Celery: Celery Worker crashing after first task with TypeError: 'NoneType' object is not callable

Created on 24 Nov 2016  ·  54 Comments  ·  Source: celery/celery

Checklist

  • [X] I have included the output of celery -A proj report in the issue.
    (if you are not able to do this, then at least specify the Celery
    version affected).
software -> celery:4.0.0 (latentcall) kombu:4.0.0 py:3.4.3
            billiard:3.5.0.2 py-amqp:2.1.1
platform -> system:Linux arch:64bit, ELF imp:CPython
loader   -> celery.loaders.default.Loader
settings -> transport:amqp results:disabled
  • [X] I have verified that the issue exists against the master branch of Celery.
    Yes I've tested and it behaves the same using master.

Steps to reproduce

Not exactly sure, because other machines with the same specs and requirements are working.

Expected behavior

Should consume tasks.

Actual behavior

A task is accepted, then a traceback is logged, then the worker reconnects to the broker for some reason. This repeats forever:

[2016-11-23 23:09:00,468: INFO/MainProcess] Connected to amqp://user:**@10.136.131.6:5672//
[2016-11-23 23:09:00,484: INFO/MainProcess] mingle: searching for neighbors
[2016-11-23 23:09:01,921: INFO/MainProcess] mingle: sync with 1 nodes
[2016-11-23 23:09:01,922: INFO/MainProcess] mingle: sync complete
[2016-11-23 23:09:01,970: INFO/MainProcess] Received task: tasks.calculate_user_running_total[ddd103af-d527-4564-83f8-96b747767a0c]
[2016-11-23 23:09:01,972: CRITICAL/MainProcess] Unrecoverable error: TypeError("'NoneType' object is not callable",)
Traceback (most recent call last):
  File "./venv/lib/python3.4/site-packages/celery/worker/worker.py", line 203, in start
    self.blueprint.start(self)
  File "./venv/lib/python3.4/site-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "./venv/lib/python3.4/site-packages/celery/bootsteps.py", line 370, in start
    return self.obj.start()
  File "./venv/lib/python3.4/site-packages/celery/worker/consumer/consumer.py", line 318, in start
    blueprint.start(self)
  File "./venv/lib/python3.4/site-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "./venv/lib/python3.4/site-packages/celery/worker/consumer/consumer.py", line 584, in start
    c.loop(*c.loop_args())
  File "./venv/lib/python3.4/site-packages/celery/worker/loops.py", line 47, in asynloop
    consumer.consume()
  File "./venv/lib/python3.4/site-packages/kombu/messaging.py", line 470, in consume
    self._basic_consume(T, no_ack=no_ack, nowait=False)
  File "./venv/lib/python3.4/site-packages/kombu/messaging.py", line 591, in _basic_consume
    no_ack=no_ack, nowait=nowait)
  File "./venv/lib/python3.4/site-packages/kombu/entity.py", line 737, in consume
    arguments=self.consumer_arguments)
  File "./venv/lib/python3.4/site-packages/amqp/channel.py", line 1578, in basic_consume
    wait=None if nowait else spec.Basic.ConsumeOk,
  File "./venv/lib/python3.4/site-packages/amqp/abstract_channel.py", line 73, in send_method
    return self.wait(wait, returns_tuple=returns_tuple)
  File "./venv/lib/python3.4/site-packages/amqp/abstract_channel.py", line 93, in wait
    self.connection.drain_events(timeout=timeout)
  File "./venv/lib/python3.4/site-packages/amqp/connection.py", line 464, in drain_events
    return self.blocking_read(timeout)
  File "./venv/lib/python3.4/site-packages/amqp/connection.py", line 469, in blocking_read
    return self.on_inbound_frame(frame)
  File "./venv/lib/python3.4/site-packages/amqp/method_framing.py", line 88, in on_frame
    callback(channel, msg.frame_method, msg.frame_args, msg)
  File "./venv/lib/python3.4/site-packages/amqp/connection.py", line 473, in on_inbound_method
    method_sig, payload, content,
  File "./venv/lib/python3.4/site-packages/amqp/abstract_channel.py", line 142, in dispatch_method
    listener(*args)
  File "./venv/lib/python3.4/site-packages/amqp/channel.py", line 1613, in _on_basic_deliver
    fun(msg)
  File "./venv/lib/python3.4/site-packages/kombu/messaging.py", line 617, in _receive_callback
    return on_m(message) if on_m else self.receive(decoded, message)
  File "./venv/lib/python3.4/site-packages/celery/worker/consumer/consumer.py", line 558, in on_task_received
    callbacks,
  File "./venv/lib/python3.4/site-packages/celery/worker/strategy.py", line 145, in task_message_handler
    handle(req)
  File "./venv/lib/python3.4/site-packages/celery/worker/worker.py", line 221, in _process_task_sem
    return self._quick_acquire(self._process_task, req)
  File "./venv/lib/python3.4/site-packages/kombu/async/semaphore.py", line 62, in acquire
    callback(*partial_args, **partial_kwargs)
  File "./venv/lib/python3.4/site-packages/celery/worker/worker.py", line 226, in _process_task
    req.execute_using_pool(self.pool)
  File "./venv/lib/python3.4/site-packages/celery/worker/request.py", line 532, in execute_using_pool
    correlation_id=task_id,
  File "./venv/lib/python3.4/site-packages/celery/concurrency/base.py", line 155, in apply_async
    **options)
  File "./venv/lib/python3.4/site-packages/billiard/pool.py", line 1487, in apply_async
    self._quick_put((TASK, (result._job, None, func, args, kwds)))
TypeError: 'NoneType' object is not callable

The above lines keep repeating every few seconds, and no tasks are consumed from the queue.

Most helpful comment

👍 for this. Getting the very same issue, even on 4.0.1

All 54 comments

The error is repeating in the log because the Celery worker daemon is crashing, so systemd restarts it.

@ask, self._quick_put is somehow not defined. Should billiard check for a None value before calling, catch the exception, or should self._quick_put never be None?

When I change billiard/pool.py#L1483 to if self.threads or self._quick_put is None: Celery no longer crashes, but for some reason the workers never process any tasks.
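
For context, the dispatch being patched looks roughly like this. This is a paraphrase of billiard's Pool.apply_async around the cited lines: the fast path is taken verbatim from the traceback above, while the taskqueue fallback branch is an assumption about billiard's internals, not verified code:

if self.threads or self._quick_put is None:
    # Fallback: hand the work to the TaskHandler thread via the task queue.
    self._taskqueue.put(([(TASK, (result._job, None, func, args, kwds))], None))
else:
    # Fast path: write directly to the pool; crashes while _quick_put is None.
    self._quick_put((TASK, (result._job, None, func, args, kwds)))

With the guard in place the crash is avoided, but tasks still never run, which would be consistent with nothing consuming the fallback queue in the async (prefork + hub) setup.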

More verbose output with logging level DEBUG:

[2016-11-27 14:48:09,875: DEBUG/MainProcess] | Worker: Preparing bootsteps.
[2016-11-27 14:48:09,877: DEBUG/MainProcess] | Worker: Building graph...
[2016-11-27 14:48:09,878: DEBUG/MainProcess] | Worker: New boot order: {Timer, Hub, Pool, Autoscaler, StateDB, Beat, Consumer}
[2016-11-27 14:48:09,889: DEBUG/MainProcess] | Consumer: Preparing bootsteps.
[2016-11-27 14:48:09,889: DEBUG/MainProcess] | Consumer: Building graph...
[2016-11-27 14:48:09,898: DEBUG/MainProcess] | Consumer: New boot order: {Connection, Agent, Events, Mingle, Tasks, Control, Gossip, Heart, event loop}
[2016-11-27 14:48:09,908: DEBUG/MainProcess] | Worker: Starting Hub
[2016-11-27 14:48:09,908: DEBUG/MainProcess] ^-- substep ok
[2016-11-27 14:48:09,908: DEBUG/MainProcess] | Worker: Starting Pool
[2016-11-27 14:48:09,998: DEBUG/MainProcess] ^-- substep ok
[2016-11-27 14:48:09,999: DEBUG/MainProcess] | Worker: Starting Consumer
[2016-11-27 14:48:10,000: DEBUG/MainProcess] | Consumer: Starting Connection
[2016-11-27 14:48:10,016: DEBUG/MainProcess] Start from server, version: 0.9, properties: {'cluster_name': 'rabbit@rabbitmq', 'product': 'RabbitMQ', 'version': '3.5.6', 'information': 'Licensed under the MPL.
  See http://www.rabbitmq.com/', 'capabilities': {'authentication_failure_close': True, 'consumer_priorities': True, 'consumer_cancel_notify': True, 'per_consumer_qos': True, 'basic.nack': True, 'publisher_confirms': True, 'connection.blocked': True, 'exchange_exchange_bindings': True}, 'copyright': 'Copyright (C) 2007-2015 Pivotal Software, Inc.', 'platform': 'Erlang/OTP'}, mechanisms: ['AMQPLAIN', 'PLAIN'], locales: ['en_US']
[2016-11-27 14:48:10,018: INFO/MainProcess] Connected to amqp://user:**@10.136.131.6:5672//
[2016-11-27 14:48:10,018: DEBUG/MainProcess] ^-- substep ok
[2016-11-27 14:48:10,019: DEBUG/MainProcess] | Consumer: Starting Events
[2016-11-27 14:48:10,031: DEBUG/MainProcess] Start from server, version: 0.9, properties: {'cluster_name': 'rabbit@rabbitmq', 'product': 'RabbitMQ', 'version': '3.5.6', 'information': 'Licensed under the MPL.  See http://www.rabbitmq.com/', 'capabilities': {'authentication_failure_close': True, 'consumer_priorities': True, 'consumer_cancel_notify': True, 'per_consumer_qos': True, 'basic.nack': True, 'publisher_confirms': True, 'connection.blocked': True, 'exchange_exchange_bindings': True}, 'copyright': 'Copyright (C) 2007-2015 Pivotal Software, Inc.', 'platform': 'Erlang/OTP'}, mechanisms: ['AMQPLAIN', 'PLAIN'], locales: ['en_US']
[2016-11-27 14:48:10,034: DEBUG/MainProcess] ^-- substep ok
[2016-11-27 14:48:10,034: DEBUG/MainProcess] | Consumer: Starting Mingle
[2016-11-27 14:48:10,035: INFO/MainProcess] mingle: searching for neighbors
[2016-11-27 14:48:10,036: DEBUG/MainProcess] using channel_id: 1
[2016-11-27 14:48:10,041: DEBUG/MainProcess] Channel open
[2016-11-27 14:48:10,061: DEBUG/MainProcess] Start from server, version: 0.9, properties: {'cluster_name': 'rabbit@rabbitmq', 'product': 'RabbitMQ', 'version': '3.5.6', 'information': 'Licensed under the MPL.  See http://www.rabbitmq.com/', 'capabilities': {'authentication_failure_close': True, 'consumer_priorities': True, 'consumer_cancel_notify': True, 'per_consumer_qos': True, 'basic.nack': True, 'publisher_confirms': True, 'connection.blocked': True, 'exchange_exchange_bindings': True}, 'copyright': 'Copyright (C) 2007-2015 Pivotal Software, Inc.', 'platform': 'Erlang/OTP'}, mechanisms: ['AMQPLAIN', 'PLAIN'], locales: ['en_US']
[2016-11-27 14:48:10,063: DEBUG/MainProcess] using channel_id: 1
[2016-11-27 14:48:10,064: DEBUG/MainProcess] Channel open
[2016-11-27 14:48:11,189: INFO/MainProcess] mingle: sync with 3 nodes
[2016-11-27 14:48:11,190: DEBUG/MainProcess] mingle: processing reply from celery@worker03
[2016-11-27 14:48:11,190: DEBUG/MainProcess] mingle: processing reply from celery@worker02
[2016-11-27 14:48:11,190: DEBUG/MainProcess] mingle: processing reply from celery@worker01
[2016-11-27 14:48:11,190: INFO/MainProcess] mingle: sync complete
[2016-11-27 14:48:11,191: DEBUG/MainProcess] ^-- substep ok
[2016-11-27 14:48:11,191: DEBUG/MainProcess] | Consumer: Starting Tasks
[2016-11-27 14:48:11,244: DEBUG/MainProcess] ^-- substep ok
[2016-11-27 14:48:11,244: DEBUG/MainProcess] | Consumer: Starting Control
[2016-11-27 14:48:11,244: DEBUG/MainProcess] using channel_id: 2
[2016-11-27 14:48:11,246: DEBUG/MainProcess] Channel open
[2016-11-27 14:48:11,251: DEBUG/MainProcess] ^-- substep ok
[2016-11-27 14:48:11,251: DEBUG/MainProcess] | Consumer: Starting Gossip
[2016-11-27 14:48:11,252: DEBUG/MainProcess] using channel_id: 3
[2016-11-27 14:48:11,253: DEBUG/MainProcess] Channel open
[2016-11-27 14:48:11,257: DEBUG/MainProcess] ^-- substep ok
[2016-11-27 14:48:11,258: DEBUG/MainProcess] | Consumer: Starting Heart
[2016-11-27 14:48:11,259: DEBUG/MainProcess] using channel_id: 1
[2016-11-27 14:48:11,260: DEBUG/MainProcess] Channel open
[2016-11-27 14:48:11,261: DEBUG/MainProcess] ^-- substep ok
[2016-11-27 14:48:11,261: DEBUG/MainProcess] | Consumer: Starting event loop
[2016-11-27 14:48:11,264: INFO/MainProcess] Received task: wakatime.tasks.cache_coding_activity[0eba267c-72e4-40ea-91dd-a1a7ab17c514]
[2016-11-27 14:48:11,265: DEBUG/MainProcess] TaskPool: Apply <function _fast_trace_task at 0x7ff469300950> (args:('wakatime.tasks.cache_coding_activity', '0eba267c-72e4-40ea-91dd-a1a7ab17c514', {'argsrepr': '()', 'task': 'wakatime.tasks.cache_coding_activity', 'lang': 'py', 'parent_id': '81f0c7ce-1396-496f-bf64-ae243736c845', 'timelimit': [None, None], 'root_id': '128647cc-f558-4b7d-bafc-338d186b5cfa', 'reply_to': 'e3c2b067-a058-3aa0-a3a1-384d4b917bbf', 'retries': 0, 'expires': None, 'delivery_info': {'exchange': '', 'priority': None, 'routing_key': 'cache', 'redelivered': True}, 'id': '0eba267c-72e4-40ea-91dd-a1a7ab17c514', 'correlation_id': '0eba267c-72e4-40ea-91dd-a1a7ab17c514', 'group': None, 'eta': None, 'kwargsrepr': "{'cache_projects': True, 'timeout': 15, 'user_id': UUID('d9c69ce0-f194-45a6-83cf-98f931fca8aa'), 'writes_only': False}", 'origin': 'gen3021@worker02'}, '[[], {"cache_projects": true, "timeout": 15, "user_id": "d9c69ce0-f194-45a6-83cf-98f931fca8aa", "writes_only": false}, {"callbacks": null, "chain": null, "chord": null, "errbacks": null}]', 'application/json', 'utf-8') kwargs:{})
[2016-11-27 14:48:11,266: CRITICAL/MainProcess] Unrecoverable error: TypeError("'NoneType' object is not callable",)
Traceback (most recent call last):
  File "./venv/src/celery/celery/worker/worker.py", line 203, in start
    self.blueprint.start(self)
  File "./venv/src/celery/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "./venv/src/celery/celery/bootsteps.py", line 370, in start
    return self.obj.start()
  File "./venv/src/celery/celery/worker/consumer/consumer.py", line 318, in start
    blueprint.start(self)
  File "./venv/src/celery/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "./venv/src/celery/celery/worker/consumer/consumer.py", line 593, in start
    c.loop(*c.loop_args())
  File "./venv/src/celery/celery/worker/loops.py", line 47, in asynloop
    consumer.consume()
  File "./venv/lib/python3.4/site-packages/kombu/messaging.py", line 470, in consume
    self._basic_consume(T, no_ack=no_ack, nowait=False)
  File "./venv/lib/python3.4/site-packages/kombu/messaging.py", line 591, in _basic_consume
    no_ack=no_ack, nowait=nowait)
  File "./venv/lib/python3.4/site-packages/kombu/entity.py", line 737, in consume
    arguments=self.consumer_arguments)
  File "./venv/lib/python3.4/site-packages/amqp/channel.py", line 1578, in basic_consume
    wait=None if nowait else spec.Basic.ConsumeOk,
  File "./venv/lib/python3.4/site-packages/amqp/abstract_channel.py", line 73, in send_method
    return self.wait(wait, returns_tuple=returns_tuple)
  File "./venv/lib/python3.4/site-packages/amqp/abstract_channel.py", line 93, in wait
    self.connection.drain_events(timeout=timeout)
  File "./venv/lib/python3.4/site-packages/amqp/connection.py", line 464, in drain_events
    return self.blocking_read(timeout)
  File "./venv/lib/python3.4/site-packages/amqp/connection.py", line 469, in blocking_read
    return self.on_inbound_frame(frame)
  File "./venv/lib/python3.4/site-packages/amqp/method_framing.py", line 88, in on_frame
    callback(channel, msg.frame_method, msg.frame_args, msg)
  File "./venv/lib/python3.4/site-packages/amqp/connection.py", line 473, in on_inbound_method
    method_sig, payload, content,
  File "./venv/lib/python3.4/site-packages/amqp/abstract_channel.py", line 142, in dispatch_method
    listener(*args)
  File "./venv/lib/python3.4/site-packages/amqp/channel.py", line 1613, in _on_basic_deliver
    fun(msg)
  File "./venv/lib/python3.4/site-packages/kombu/messaging.py", line 617, in _receive_callback
    return on_m(message) if on_m else self.receive(decoded, message)
  File "./venv/src/celery/celery/worker/consumer/consumer.py", line 567, in on_task_received
    callbacks,
  File "./venv/src/celery/celery/worker/strategy.py", line 145, in task_message_handler
    handle(req)
  File "./venv/src/celery/celery/worker/worker.py", line 221, in _process_task_sem
    return self._quick_acquire(self._process_task, req)
  File "./venv/lib/python3.4/site-packages/kombu/async/semaphore.py", line 62, in acquire
    callback(*partial_args, **partial_kwargs)
  File "./venv/src/celery/celery/worker/worker.py", line 226, in _process_task
    req.execute_using_pool(self.pool)
  File "./venv/src/celery/celery/worker/request.py", line 532, in execute_using_pool
    correlation_id=task_id,
  File "./venv/src/celery/celery/concurrency/base.py", line 155, in apply_async
    **options)
  File "./venv/lib/python3.4/site-packages/billiard/pool.py", line 1487, in apply_async
    self._quick_put((TASK, (result._job, None, func, args, kwds)))
TypeError: 'NoneType' object is not callable
[2016-11-27 14:48:11,273: DEBUG/MainProcess] | Worker: Closing Hub...
[2016-11-27 14:48:11,274: DEBUG/MainProcess] | Worker: Closing Pool...
[2016-11-27 14:48:11,274: DEBUG/MainProcess] | Worker: Closing Consumer...
[2016-11-27 14:48:11,274: DEBUG/MainProcess] | Worker: Stopping Consumer...
[2016-11-27 14:48:11,274: DEBUG/MainProcess] | Consumer: Closing Connection...
[2016-11-27 14:48:11,275: DEBUG/MainProcess] | Consumer: Closing Events...
[2016-11-27 14:48:11,275: DEBUG/MainProcess] | Consumer: Closing Mingle...
[2016-11-27 14:48:11,275: DEBUG/MainProcess] | Consumer: Closing Tasks...
[2016-11-27 14:48:11,275: DEBUG/MainProcess] | Consumer: Closing Control...
[2016-11-27 14:48:11,275: DEBUG/MainProcess] | Consumer: Closing Gossip...
[2016-11-27 14:48:11,276: DEBUG/MainProcess] | Consumer: Closing Heart...
[2016-11-27 14:48:11,276: DEBUG/MainProcess] | Consumer: Closing event loop...
[2016-11-27 14:48:11,276: DEBUG/MainProcess] | Consumer: Stopping event loop...
[2016-11-27 14:48:11,276: DEBUG/MainProcess] | Consumer: Stopping Heart...
[2016-11-27 14:48:11,277: DEBUG/MainProcess] | Consumer: Stopping Gossip...
[2016-11-27 14:48:11,278: INFO/MainProcess] Received task: wakatime.tasks.cache_coding_activity[f786fc75-0518-4893-8988-ff7f063edd12]
[2016-11-27 14:48:11,278: DEBUG/MainProcess] TaskPool: Apply <function _fast_trace_task at 0x7ff469300950> (args:('wakatime.tasks.cache_coding_activity', 'f786fc75-0518-4893-8988-ff7f063edd12', {'argsrepr': '()', 'task': 'wakatime.tasks.cache_coding_activity', 'lang': 'py', 'parent_id': '81f0c7ce-1396-496f-bf64-ae243736c845', 'timelimit': [None, None], 'root_id': '128647cc-f558-4b7d-bafc-338d186b5cfa', 'reply_to': 'e3c2b067-a058-3aa0-a3a1-384d4b917bbf', 'retries': 0, 'expires': None, 'delivery_info': {'exchange': '', 'priority': None, 'routing_key': 'cache', 'redelivered': True}, 'id': 'f786fc75-0518-4893-8988-ff7f063edd12', 'correlation_id': 'f786fc75-0518-4893-8988-ff7f063edd12', 'group': None, 'eta': None, 'kwargsrepr': "{'cache_projects': True, 'timeout': 15, 'user_id': UUID('7056644f-2564-4074-b89e-631973879f44'), 'writes_only': False}", 'origin': 'gen3021@worker02'}, '[[], {"cache_projects": true, "timeout": 15, "user_id": "7056644f-2564-4074-b89e-631973879f44", "writes_only": false}, {"callbacks": null, "chain": null, "chord": null, "errbacks": null}]', 'application/json', 'utf-8') kwargs:{})
[2016-11-27 14:48:11,279: INFO/MainProcess] Received task: wakatime.tasks.cache_coding_activity[d5c8dc57-116c-467d-9924-e2999280c2f8]
[2016-11-27 14:48:11,280: INFO/MainProcess] Received task: wakatime.tasks.cache_coding_activity[460ef864-e482-4b0f-8580-d0095750bae6]
[2016-11-27 14:48:11,281: DEBUG/MainProcess] Closed channel #3
[2016-11-27 14:48:11,281: DEBUG/MainProcess] | Consumer: Stopping Control...
[2016-11-27 14:48:11,283: DEBUG/MainProcess] Closed channel #2
[2016-11-27 14:48:11,283: DEBUG/MainProcess] | Consumer: Stopping Tasks...
[2016-11-27 14:48:11,284: DEBUG/MainProcess] Canceling task consumer...
[2016-11-27 14:48:11,286: DEBUG/MainProcess] | Consumer: Stopping Mingle...
[2016-11-27 14:48:11,286: DEBUG/MainProcess] | Consumer: Stopping Events...
[2016-11-27 14:48:11,286: DEBUG/MainProcess] | Consumer: Stopping Connection...
[2016-11-27 14:48:11,286: DEBUG/MainProcess] | Worker: Stopping Pool...
[2016-11-27 14:48:12,800: DEBUG/MainProcess] result handler: all workers terminated
[2016-11-27 14:48:12,801: DEBUG/MainProcess] | Worker: Stopping Hub...
[2016-11-27 14:48:12,801: DEBUG/MainProcess] | Consumer: Shutdown Heart...
[2016-11-27 14:48:12,802: DEBUG/MainProcess] | Consumer: Shutdown Gossip...
[2016-11-27 14:48:12,802: DEBUG/MainProcess] | Consumer: Shutdown Control...
[2016-11-27 14:48:12,802: DEBUG/MainProcess] | Consumer: Shutdown Tasks...
[2016-11-27 14:48:12,803: DEBUG/MainProcess] Canceling task consumer...
[2016-11-27 14:48:12,803: DEBUG/MainProcess] Closing consumer channel...
[2016-11-27 14:48:12,803: DEBUG/MainProcess] | Consumer: Shutdown Events...
[2016-11-27 14:48:12,804: DEBUG/MainProcess] Closed channel #1
[2016-11-27 14:48:12,805: DEBUG/MainProcess] | Consumer: Shutdown Connection...
[2016-11-27 14:48:12,806: DEBUG/MainProcess] Closed channel #1
[2016-11-27 14:48:12,807: DEBUG/MainProcess] removing tasks from inqueue until task handler finished

This was introduced with Celery 4.x; downgrading to 3.1.24 prevents the traceback.

Doesn't happen for me here on Linux with Python 3.4. What arguments do you use to start the worker?

_quick_put should never be None btw. Does this happen at startup or always after a connection failure?

I've been trying to reproduce by stopping the broker while executing tasks, and still no luck at reproducing.

Always at startup. The worker arguments are:

/opt/app/venv/bin/python /opt/app/venv/bin/celery worker --app=wakatime.celery --workdir=/opt/app --logfile=/var/log/celery/worker.log --loglevel=INFO --concurrency=50 --exclude-queues=medium,low,cache

👍 for this. Getting the very same issue, even on 4.0.1

@ask I can reproduce it every time there are messages on the broker waiting to be processed when the worker comes up. This is often the case when using beat, as in my setup: if the beat service comes online before the worker, you won't be able to start the worker due to the issue mentioned above. I'm using Python 2.7, for what it's worth, and can reproduce it consistently.
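
A minimal reproduction sketch along those lines (the module name, task, and broker URL are illustrative, not from this thread):

# repro.py -- queue messages while the worker is stopped, then start it.
from celery import Celery

app = Celery('repro', broker='amqp://guest:guest@localhost//')

@app.task
def ping():
    return 'pong'

if __name__ == '__main__':
    for _ in range(10):
        ping.delay()  # messages pile up on the broker, no worker running

With messages already waiting, starting the worker with celery -A repro worker --loglevel=INFO should then hit the TypeError at startup on 4.0.x, while it starts cleanly against an empty queue.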

This is the same error as the one mentioned in #3539.

@jmesquita that's consistent with my scenario, since my queues always have pending messages on the broker when starting the workers.

@alanhamlett I'm trying to get this fixed and reading the code, but I'm new to Celery, so it might take me some time. What is strange to me is that, with so many people using Celery and messages being queued to workers by default, this hasn't exploded within the community. It makes me wonder if I'm misusing it somehow.

I dug into the code a bit, _quick_put gets assigned by AsyncPool._create_write_handlers, which gets called by AsyncPool.register_with_event_loop, which gets called by celery.worker.loops.asynloop. Superficially, the problem seems to be that asynloop first calls consumer.consume() and only then calls obj.register_with_event_loop, which causes _quick_put to be None when it gets called from within consume().

This would explain why the problem does not occur when there are no messages in the queue when the event loop starts up: in that case consume() does nothing, and by the next time it is called, register_with_event_loop will already have run.

I could fix this by moving

obj.controller.register_with_event_loop(hub)
obj.register_with_event_loop(hub)

before consumer.consume(), though this is of course only a very naive (and possibly wrong) fix.
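
To make the ordering concrete, here is a self-contained toy (not Celery code; all names are illustrative) that reproduces the failure mode:

class ToyPool:
    def __init__(self):
        self._quick_put = None  # like AsyncPool: unset until registration

    def register_with_event_loop(self, hub):
        # Mirrors AsyncPool._create_write_handlers assigning _quick_put.
        self._quick_put = lambda payload: print('dispatched:', payload)

    def apply_async(self, payload):
        self._quick_put(payload)  # TypeError if registration hasn't run yet

def loop(pool, prefetched, register_first):
    # register_first=False mimics 4.0.0's asynloop, where consume()
    # (which delivers any prefetched messages) runs before registration.
    if register_first:
        pool.register_with_event_loop(hub=None)
    for message in prefetched:
        pool.apply_async(message)   # stands in for consume() delivering
    if not register_first:
        pool.register_with_event_loop(hub=None)

loop(ToyPool(), ['task-1'], register_first=True)   # prints: dispatched: task-1
loop(ToyPool(), ['task-1'], register_first=False)  # TypeError: 'NoneType' object is not callable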

So I worked around my problem by making celery beat messages transient, which is actually my intended behaviour anyway. I'll revisit this as soon as I have a bit more experience with Celery and its codebase.

@jmesquita:

So I worked around my problem by making celery beat messages transient

That prevents the error from occurring. Thank you for the suggestion.
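
For reference, a sketch of that workaround at the app level (task_default_delivery_mode is the Celery 4 setting name; note transient messages are kept in broker memory only, so they are lost on a broker restart):

from celery import Celery

app = Celery('proj', broker='amqp://guest:guest@localhost//')

# 'transient' (AMQP delivery mode 1) means RabbitMQ does not persist the
# messages, so no stored backlog awaits the worker when it starts up.
app.conf.task_default_delivery_mode = 'transient'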

I can only reproduce this bug when consuming from multiple queues. If everything is consumed from a single queue, startup works as expected (messages on the queue are properly consumed).

@adewes I tested your proposed solution and at least on the surface it seems to solve the problem.

@adewes Can you issue a pull request so we can discuss your proposed change?

Are there any updates on this issue? This is causing us major problems now, including in production. I can't even test locally because I get the TypeError issue. We may have to downgrade back to Celery 3.

I have not been able to resolve it so far and have also downgraded to version 3 for now; I hope the problem will be fixed soon. @thedrow my "quick fix" did not yield a complete resolution of the problem, so I'm not opening a pull request. I'm not fully versed in the dataflow of the components involved (there are several libraries in play here), so unfortunately I'm not able to debug this further right now.

I'm actually not even sure we can downgrade because it's possible we might be relying on the new usage of message headers in the v2 task design.

@ask, I'm happy to screen-share or whatever with you so you can see my exact environment to help debug, and maybe even open up a remote debugging session if we need to. We're in a bit of a bind because we went all in on Celery 4 and now can't start our workers in production.

For now, you can install my fork to get things running in your prod environment:

pip install -e git://github.com/alanhamlett/celery.git@73147a9da31f2932eb4778e9474fbe72f23d21c2#egg=Celery

I just opened #3752 to fix this, but need to figure out a good test to cover the bug first.

Thanks so much.

Starting to think I'm going crazy... I tried upgrading to 4.0.2 (currently on 4.0.0), and with that upgrade self.retry() suddenly stopped working as well.

Specifying queues explicitly in the CLI looks like a workaround (on 4.0.0).

We do specify the queues explicitly in the CLI; we still had the issue.

@jdotjdot I am seeing the same thing, and I am specifying queues on the command line. I only see the issue when there is data in the queue; if the queues are empty, I can start workers and they run fine.

+1

Having this exact issue. The only solution seems to be resetting my broker (which is not good for a queue).

+1

After this, #3773, the lack of basic features, and the overall complexity, I'll probably be replacing Celery with https://github.com/closeio/tasktiger. Does anyone want to take over #3752? All that's needed is a test to cover the issue.

@alanhamlett I wasn't aware of tasktiger, but it looks promising. Do you know how mature/stable/reliable it is? I'm usually a big fan of micro-frameworks, and Celery has now grown huge, with several external dependencies plus the switchover to Python 3... things seem to be going out of control (at least on my deployments).

Considering there is already a fix in PR #3752, what are the plans to release it? Please note this is a critical bug, because it prevents any production environment from using Celery 4.x: you can't have workers die just because there are pending messages coming in.

Would like to see this merged as well

With the ephemeral nature of workers being so fundamental to Celery, I can’t imagine there are too many other issues that deserve a higher priority than this one. This is the only thing preventing us from going to Prod. Does anybody out there have an estimate on when it will be addressed?

@johnatron I thought the same thing, but ran into multiple other newly introduced bugs in Prod. Had to downgrade, which is difficult because the messaging spec is not compatible between 3.x and 4.x. Also made me look at alternatives to Celery, like tasktiger. Be careful with Celery 4.x.x in Prod.

The message spec is cross-compatible with the latest version of 3.x.

I will admit I'm pretty astonished this hasn't been addressed. I'm using Alan's fork right now in production.

In light of this unfortunate delay, I decided to merge some fixes for known Celery 4 issues together in git+http://github.com/Eugeny/celery#egg=celery==4.0.2.1
This includes:

  • @alanhamlett 's fix for this issue
  • @HealthTechDevelopers 's fix for django-celery-beat schedule not getting reloaded

Working fine in production so far

The message spec may be cross-compatible, but there are (subtle) incompatible differences between the versions when it comes to Celery Canvas and chaining (a feature we use heavily). A considerable amount of effort went into porting our app from v3 to v4. Having to go back would be a bad time.

O frabjous day! Thank you.

I have this same problem using Python 3.6 and celery 4.0.2.

Shut down Celery, create a task, start Celery back up, and I immediately get the error TypeError: 'NoneType' object is not callable. @ask could you please consider merging one of the proposed fixes? This is preventing previously happy users of Celery from continuing to use it :(

Thank you for your time, @ask!

"Microsoft Windows is no longer supported.

The test suite is passing, and Celery seems to be working with Windows, but we make no guarantees as we are unable to diagnose issues on this platform. If you are a company requiring support on this platform, please get in touch."
Probably not going to be fixed.

@tiptoettt this is not a Windows-specific issue; I'm on Mac. Can you please look closely at everyone's comments? Developers will ditch Celery because of this. I've used Celery for 5 years and this is a major issue. I am going to use an earlier version, 3.1.25, if it doesn't have this issue.

Also having this issue

lpiner@host:~$ uname -a
Linux host 4.4.0-72-generic #93-Ubuntu SMP Fri Mar 31 14:07:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
lpiner@host:~$ celery --version
4.0.2 (latentcall)

This is the configuration that seems to cause the issue:

from kombu import Exchange, Queue

task_queues = (
    Queue('queue1_name', Exchange('default'), routing_key='default'),
    Queue('queue2_name', Exchange('default'), routing_key='general.routing_key'),
)

Removing the second queue causes the issue to go away.
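
For completeness, the two invocations sketched below use the standard -Q flag and a hypothetical proj app:

# Reportedly crashes at startup on 4.0.2 when the queues have a backlog:
celery -A proj worker -Q queue1_name,queue2_name --loglevel=INFO

# Works: consume from a single queue only.
celery -A proj worker -Q queue1_name --loglevel=INFO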

From what I understand from @ChillarAnand 's post in celery/kombu/issues/675, this issue should be solved by 4.0.3, right?

I haven't seen this issue since I started building from the master branch.

Thanks, it worked!

I had the same problem on Celery 3.1.23 (Cipater) using RabbitMQ. After a long debugging session I finally found this thread and fixed the problem by moving to version 3.1.25. It was like banging my head against a wall, really.

Same issue on v4.0.2 (latentcall) with multiple queues + heartbeat.

Same issue for me on v4.0.2; the workaround is downgrading to v3.x.

v4.0.2
For me this issue is a showstopper. I am not considering downgrading to 3.x. I hope the new version will be released soon.

@ba1dr build from master. That fixes it for me.
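
For anyone else trying that, a sketch of the install command (pinning a known-good commit hash instead of master is safer for production):

pip install git+https://github.com/celery/celery.git@master#egg=celery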

@LPiner, thanks, this fixed the issue for me. But this is not a production-ready solution, eh?

@ba1dr not much of a choice, TBH. It's either this or downgrading back to 3.x.
We use it in production with no issues, but our scale is only a few hundred jobs a day, so your mileage may vary.

We're gonna release soon. See #4109

Affected here as well. Just hit us. Ready for the release!
