Celery: Same task runs multiple times at once?

Created on 20 Nov 2017 · 43 comments · Source: celery/celery

The issue is a repost of an unanswered Google Groups post, _Same task runs multiple times?_

> ./bin/celery -A celery_app report

software -> celery:4.1.0 (latentcall) kombu:4.1.0 py:3.6.1
            billiard:3.5.0.3 redis:2.10.6
platform -> system:Linux arch:64bit, ELF imp:CPython
loader   -> celery.loaders.app.AppLoader
settings -> transport:redis results:redis://localhost:6379/2

broker_url: 'redis://localhost:6379/2'
result_backend: 'redis://localhost:6379/2'
task_serializer: 'json'
result_serializer: 'json'
accept_content: ['json']
timezone: 'Europe/Berlin'
enable_utc: True
imports: 'tasks'
task_routes: {
 'tasks': {'queue': 'celery-test-queue'}}

My application schedules a single group of two, sometimes three tasks, each with its own ETA within one hour. When the ETA arrives, I see the following in my celery log:

[2017-11-20 09:55:34,470: INFO/ForkPoolWorker-2] Task tasks._test_exec[bd08ab85-28a8-488f-ba03-c2befde10054] succeeded in 33.81780316866934s: None
[2017-11-20 09:55:34,481: INFO/ForkPoolWorker-2] Task tasks._test_exec[bd08ab85-28a8-488f-ba03-c2befde10054] succeeded in 0.009824380278587341s: None
[2017-11-20 09:55:34,622: INFO/ForkPoolWorker-2] Task tasks._test_exec[bd08ab85-28a8-488f-ba03-c2befde10054] succeeded in 0.14010038413107395s: None
…
[2017-11-20 09:55:37,890: INFO/ForkPoolWorker-8] Task tasks._test_exec[bd08ab85-28a8-488f-ba03-c2befde10054] succeeded in 0.012678759172558784s: None
[2017-11-20 09:55:37,891: INFO/ForkPoolWorker-2] Task tasks._test_exec[bd08ab85-28a8-488f-ba03-c2befde10054] succeeded in 0.01177949644625187s: None
[2017-11-20 09:55:37,899: INFO/ForkPoolWorker-8] Task tasks._test_exec[bd08ab85-28a8-488f-ba03-c2befde10054] succeeded in 0.008250340819358826s: None
…

This can repeat dozens of times. Note the first task's 33-second execution time, and the use of different workers!

I have no explanation for this behavior, and would like to understand what's going on here.
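For context, a minimal sketch of how such a group might be scheduled (the task name _test_exec is taken from the log above; the delays and imports are assumptions):

from datetime import datetime, timedelta, timezone

from celery import group
from tasks import _test_exec  # assumed import path, matching `imports: 'tasks'` above

# Two tasks, each with its own ETA within the next hour.
now = datetime.now(timezone.utc)
group(
    _test_exec.s().set(eta=now + timedelta(minutes=20)),
    _test_exec.s().set(eta=now + timedelta(minutes=50)),
).apply_async()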

Labels: Redis Broker, Redis Results Backend, Feedback Needed ✘, Needs Testcase ✘


All 43 comments

Maybe this is related to visibility timeout?

@georgepsarakis Could you please elaborate on your suspicion?

As far as I know, this is a known issue for broker transports that do not have built-in acknowledgement characteristics of AMQP. The task will be assigned to a new worker if the task completion time exceeds the visibility timeout, thus you may see tasks being executed in parallel.

@georgepsarakis So if the task is scheduled far ahead in the future, then I might see the above? The "visibility timeout" addresses that? From the documentation you linked:

The default visibility timeout for Redis is 1 hour.

Meaning that if within the hour the worker does not ack the task (i.e. run it?) that task is being sent to another worker which wouldn't ack, and so on… Indeed this seems to be the case looking at the caveats section of the documentation; this related issue https://github.com/celery/kombu/issues/337; or quoting from this blog:

But when developers just start using it, they regularly face abnormal behaviour of workers, specifically multiple execution of the same task by several workers. The reason which causes it is a visibility timeout setting.

Looks like setting the visibility_timeout to 31,540,000 seconds (one year) might be a quick fix.
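For reference, the setting being discussed would look roughly like this (the one-year value is the quick-fix number from above, not a recommendation):

from celery import Celery

app = Celery('tasks', broker='redis://localhost:6379/2')

# Raise the Redis transport's visibility timeout well above the largest ETA,
# so unacked messages are not redelivered before the task actually runs.
app.conf.broker_transport_options = {'visibility_timeout': 31540000}  # ~1 year; quick fix only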

I would say that if you increase the visibility timeout to 2 hours, your tasks will be executed only once.

So if you combine:

  • Redis broker
  • Late acknowledgement
  • ETA equal/above the visibility timeout
    you get multiple executions of the task.

I think what happens is:

  • After one hour passes one worker process starts processing the task.
  • A second worker will see that this message has been in the queue for longer than the visibility timeout and is being processed by another worker.
  • Message is restored in the queue.
  • Another worker starts processing the same message.
  • The above happens for all worker processes.

Looking into the Redis transport implementation, you will notice that it uses Sorted Sets, passing the queued time as a score to zadd. The message is restored based on that timestamp and comparing to an interval equal to the visibility timeout.
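Not the actual kombu code, but a rough sketch of that restore idea (key names and interval handling are made up for illustration):

import time

import redis

r = redis.Redis()
VISIBILITY_TIMEOUT = 3600  # seconds

def restore_visible(unacked_index='unacked_index_key', queue='celery-test-queue'):
    # Delivered-but-unacked messages are tracked in a sorted set, scored by delivery time.
    cutoff = time.time() - VISIBILITY_TIMEOUT
    # Anything delivered earlier than the cutoff is treated as lost and pushed back onto the queue.
    for message_id in r.zrangebyscore(unacked_index, '-inf', cutoff):
        payload = r.hget('unacked_key', message_id)  # made-up hash holding the raw message
        if payload:
            r.rpush(queue, payload)
            r.hdel('unacked_key', message_id)
        r.zrem(unacked_index, message_id)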

Hope this explains a bit the internals of the Redis transport.

@georgepsarakis, I'm now thoroughly confused. If a task's ETA is set for two months from now, why would a worker pick it up one hour after the task has been scheduled? Am I missing something?

My (incorrect?) assumption is that:

  • I schedule a task with an ETA at any time in the future; then
  • the task (i.e. its marshaled arguments) and ETA will sit in the queue until ETA arrives; then
  • a worker begins processing the task at ETA.

Your "_I think what happens is:_" above is quite different from my assumption.

I also encountered the same problem; have you solved it? @jenstroeger

Thanks!

@jenstroeger that does not sound like a feasible flow, I think the worker just continuously requeues the message in order to postpone execution until the ETA condition is finally met. The concept of the queue is to distribute messages as soon as they arrive, so the worker examines the message and just requeues.

Please note that this is my guess, I am not really aware of the internals of the ETA implementation.
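One way to see where ETA messages actually sit: they show up as held by the worker, not as waiting in the broker queue. A quick check, assuming the app module from the report above:

from celery_app import app  # assumed module, matching `celery -A celery_app` in the report

# Tasks with an ETA/countdown appear under "scheduled" (held by the worker until the ETA),
# while already-fetched, ready-to-run tasks appear under "reserved".
inspector = app.control.inspect()
print(inspector.scheduled())
print(inspector.reserved())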

@zivsu, as mentioned above I've set the visibility_timeout to a _very_ large number and that seems to have resolved the symptoms. However, as @georgepsarakis points out, that seems to be a poor approach.

I do not know the cause of the original problem nor how to address it properly.

@jenstroeger I read some blog posts; changing visibility_timeout cannot solve the problem completely, so I changed my broker to RabbitMQ.

@zivsu, can you please share the link to the blog? Did you use Redis before?

@jenstroeger I can't find the blog anymore. I used Redis as the broker before; for scheduled tasks I chose RabbitMQ to avoid the error happening again.

I have exactly the same issue; my config is:
django==1.11.6
celery==4.2rc2
django-celery-beat==1.0.1

settings:

CELERY_ENABLE_UTC = True
# CELERY_TIMEZONE = 'America/Los_Angeles'

And that is the only working combination of these settings. Also, I have to schedule my periodic tasks in the UTC timezone.
If you enable CELERY_TIMEZONE or disable CELERY_ENABLE_UTC, it starts running periodic tasks multiple times.

I have the same problem: the ETA task executes multiple times when using Redis as a broker.
Any way to solve this?
It looks like changing the broker from Redis to RabbitMQ solves this problem.

With Redis, there is a well-known issue when you specify a timezone other than UTC. To work around the issue, subclass the default app and add your own timezone handling function:

from celery import Celery


class MyAppCelery(Celery):
    def now(self):
        """Return the current time and date as a datetime."""
        from datetime import datetime
        return datetime.now(self.timezone)

Hope that helps anyone else that is running into this problem.
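Usage would look something like the sketch below (broker URL and timezone are placeholders, not part of the original comment):

app = MyAppCelery('tasks', broker='redis://localhost:6379/2')  # placeholder broker URL
app.conf.enable_utc = True
app.conf.timezone = 'America/Los_Angeles'                      # placeholder timezone

@app.task
def ping():
    return 'pong'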

I get this problem sometimes when frequently restarting celery jobs with beat on multicore machines. I've gotten in the habit of running ps aux | grep celery then kill <each_pid> to resolve it.

Best advice I have is to always make sure you see the "restart DONE" message before disconnecting from the machine.

{"log":"INFO 2018-10-09 17:41:08,468 strategy celery.worker.strategy 1 140031597243208 Received task: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f]  ETA:[2018-10-10 04:00:00+00:00] \n","stream":"stderr","time":"2018-10-09T17:41:08.468912644Z"}
{"log":"INFO 2018-10-09 17:41:08,468 strategy celery.worker.strategy 1 140031597243208 Received task: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f]  ETA:[2018-10-10 04:00:00+00:00] \n","stream":"stderr","time":"2018-10-09T17:41:08.468955918Z"}
{"log":"INFO 2018-10-09 19:46:04,293 strategy celery.worker.strategy 1 140031597243208 Received task: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f]  ETA:[2018-10-10 04:00:00+00:00] \n","stream":"stderr","time":"2018-10-09T19:46:04.293780045Z"}
{"log":"INFO 2018-10-09 19:46:04,293 strategy celery.worker.strategy 1 140031597243208 Received task: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f]  ETA:[2018-10-10 04:00:00+00:00] \n","stream":"stderr","time":"2018-10-09T19:46:04.293953621Z"}
{"log":"INFO 2018-10-09 20:46:04,802 strategy celery.worker.strategy 1 140031597243208 Received task: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f]  ETA:[2018-10-10 04:00:00+00:00] \n","stream":"stderr","time":"2018-10-09T20:46:04.802819711Z"}
{"log":"INFO 2018-10-09 20:46:04,802 strategy celery.worker.strategy 1 140031597243208 Received task: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f]  ETA:[2018-10-10 04:00:00+00:00] \n","stream":"stderr","time":"2018-10-09T20:46:04.802974829Z"}
{"log":"INFO 2018-10-09 21:46:05,335 strategy celery.worker.strategy 1 140031597243208 Received task: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f]  ETA:[2018-10-10 04:00:00+00:00] \n","stream":"stderr","time":"2018-10-09T21:46:05.336081133Z"}
{"log":"INFO 2018-10-09 21:46:05,335 strategy celery.worker.strategy 1 140031597243208 Received task: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f]  ETA:[2018-10-10 04:00:00+00:00] \n","stream":"stderr","time":"2018-10-09T21:46:05.336107517Z"}
{"log":"INFO 2018-10-09 22:46:05,900 strategy celery.worker.strategy 1 140031597243208 Received task: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f]  ETA:[2018-10-10 04:00:00+00:00] \n","stream":"stderr","time":"2018-10-09T22:46:05.901078395Z"}
{"log":"INFO 2018-10-09 22:46:05,900 strategy celery.worker.strategy 1 140031597243208 Received task: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f]  ETA:[2018-10-10 04:00:00+00:00] \n","stream":"stderr","time":"2018-10-09T22:46:05.901173663Z"}
{"log":"INFO 2018-10-09 23:46:06,484 strategy celery.worker.strategy 1 140031597243208 Received task: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f]  ETA:[2018-10-10 04:00:00+00:00] \n","stream":"stderr","time":"2018-10-09T23:46:06.485276904Z"}
{"log":"INFO 2018-10-09 23:46:06,484 strategy celery.worker.strategy 1 140031597243208 Received task: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f]  ETA:[2018-10-10 04:00:00+00:00] \n","stream":"stderr","time":"2018-10-09T23:46:06.485415253Z"}
{"log":"INFO 2018-10-10 00:46:07,072 strategy celery.worker.strategy 1 140031597243208 Received task: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f]  ETA:[2018-10-10 04:00:00+00:00] \n","stream":"stderr","time":"2018-10-10T00:46:07.072529828Z"}
{"log":"INFO 2018-10-10 00:46:07,072 strategy celery.worker.strategy 1 140031597243208 Received task: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f]  ETA:[2018-10-10 04:00:00+00:00] \n","stream":"stderr","time":"2018-10-10T00:46:07.072587887Z"}
{"log":"INFO 2018-10-10 01:46:07,602 strategy celery.worker.strategy 1 140031597243208 Received task: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f]  ETA:[2018-10-10 04:00:00+00:00] \n","stream":"stderr","time":"2018-10-10T01:46:07.60325321Z"}
{"log":"INFO 2018-10-10 01:46:07,602 strategy celery.worker.strategy 1 140031597243208 Received task: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f]  ETA:[2018-10-10 04:00:00+00:00] \n","stream":"stderr","time":"2018-10-10T01:46:07.603327426Z"}
{"log":"INFO 2018-10-10 02:46:08,155 strategy celery.worker.strategy 1 140031597243208 Received task: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f]  ETA:[2018-10-10 04:00:00+00:00] \n","stream":"stderr","time":"2018-10-10T02:46:08.155868992Z"}
{"log":"INFO 2018-10-10 02:46:08,155 strategy celery.worker.strategy 1 140031597243208 Received task: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f]  ETA:[2018-10-10 04:00:00+00:00] \n","stream":"stderr","time":"2018-10-10T02:46:08.155921893Z"}
{"log":"INFO 2018-10-10 03:46:08,753 strategy celery.worker.strategy 1 140031597243208 Received task: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f]  ETA:[2018-10-10 04:00:00+00:00] \n","stream":"stderr","time":"2018-10-10T03:46:08.75401387Z"}
{"log":"INFO 2018-10-10 03:46:08,753 strategy celery.worker.strategy 1 140031597243208 Received task: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f]  ETA:[2018-10-10 04:00:00+00:00] \n","stream":"stderr","time":"2018-10-10T03:46:08.754056891Z"}
{"log":"DEBUG 2018-10-10 04:00:00,013 request celery.worker.request 1 140031597243208 Task accepted: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f] pid:70\n","stream":"stderr","time":"2018-10-10T04:00:00.013548928Z"}
{"log":"DEBUG 2018-10-10 04:00:00,013 request celery.worker.request 1 140031597243208 Task accepted: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f] pid:70\n","stream":"stderr","time":"2018-10-10T04:00:00.013592318Z"}
{"log":"DEBUG 2018-10-10 04:00:00,013 request celery.worker.request 1 140031597243208 Task accepted: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f] pid:71\n","stream":"stderr","time":"2018-10-10T04:00:00.014000106Z"}
{"log":"DEBUG 2018-10-10 04:00:00,013 request celery.worker.request 1 140031597243208 Task accepted: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f] pid:71\n","stream":"stderr","time":"2018-10-10T04:00:00.014167558Z"}
{"log":"DEBUG 2018-10-10 04:00:00,014 request celery.worker.request 1 140031597243208 Task accepted: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f] pid:64\n","stream":"stderr","time":"2018-10-10T04:00:00.014661348Z"}
{"log":"DEBUG 2018-10-10 04:00:00,014 request celery.worker.request 1 140031597243208 Task accepted: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f] pid:64\n","stream":"stderr","time":"2018-10-10T04:00:00.014684354Z"}
{"log":"DEBUG 2018-10-10 04:00:00,014 request celery.worker.request 1 140031597243208 Task accepted: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f] pid:65\n","stream":"stderr","time":"2018-10-10T04:00:00.01514884Z"}
{"log":"DEBUG 2018-10-10 04:00:00,014 request celery.worker.request 1 140031597243208 Task accepted: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f] pid:65\n","stream":"stderr","time":"2018-10-10T04:00:00.015249646Z"}
{"log":"DEBUG 2018-10-10 04:00:00,015 request celery.worker.request 1 140031597243208 Task accepted: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f] pid:66\n","stream":"stderr","time":"2018-10-10T04:00:00.01571124Z"}
{"log":"DEBUG 2018-10-10 04:00:00,015 request celery.worker.request 1 140031597243208 Task accepted: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f] pid:66\n","stream":"stderr","time":"2018-10-10T04:00:00.01580249Z"}
{"log":"DEBUG 2018-10-10 04:00:00,019 request celery.worker.request 1 140031597243208 Task accepted: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f] pid:68\n","stream":"stderr","time":"2018-10-10T04:00:00.019260948Z"}
{"log":"DEBUG 2018-10-10 04:00:00,019 request celery.worker.request 1 140031597243208 Task accepted: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f] pid:68\n","stream":"stderr","time":"2018-10-10T04:00:00.019322151Z"}
{"log":"DEBUG 2018-10-10 04:00:00,245 request celery.worker.request 1 140031597243208 Task accepted: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f] pid:70\n","stream":"stderr","time":"2018-10-10T04:00:00.245159563Z"}
{"log":"DEBUG 2018-10-10 04:00:00,245 request celery.worker.request 1 140031597243208 Task accepted: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f] pid:70\n","stream":"stderr","time":"2018-10-10T04:00:00.245177267Z"}
{"log":"DEBUG 2018-10-10 04:00:00,245 request celery.worker.request 1 140031597243208 Task accepted: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f] pid:67\n","stream":"stderr","time":"2018-10-10T04:00:00.245338722Z"}
{"log":"DEBUG 2018-10-10 04:00:00,245 request celery.worker.request 1 140031597243208 Task accepted: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f] pid:67\n","stream":"stderr","time":"2018-10-10T04:00:00.245351289Z"}
{"log":"DEBUG 2018-10-10 04:00:00,256 request celery.worker.request 1 140031597243208 Task accepted: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f] pid:65\n","stream":"stderr","time":"2018-10-10T04:00:00.256770035Z"}
{"log":"DEBUG 2018-10-10 04:00:00,256 request celery.worker.request 1 140031597243208 Task accepted: main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f] pid:65\n","stream":"stderr","time":"2018-10-10T04:00:00.256788689Z"}
{"log":"INFO 2018-10-10 04:00:00,371 trace celery.app.trace 68 140031597243208 Task main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f] succeeded in 0.35710329699213617s: None\n","stream":"stderr","time":"2018-10-10T04:00:00.371967002Z"}
{"log":"INFO 2018-10-10 04:00:00,371 trace celery.app.trace 68 140031597243208 Task main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f] succeeded in 0.35710329699213617s: None\n","stream":"stderr","time":"2018-10-10T04:00:00.371983293Z"}
{"log":"INFO 2018-10-10 04:00:00,387 trace celery.app.trace 69 140031597243208 Task main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f] succeeded in 0.10637873200175818s: None\n","stream":"stderr","time":"2018-10-10T04:00:00.388119538Z"}
{"log":"INFO 2018-10-10 04:00:00,387 trace celery.app.trace 69 140031597243208 Task main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f] succeeded in 0.10637873200175818s: None\n","stream":"stderr","time":"2018-10-10T04:00:00.388166317Z"}
{"log":"INFO 2018-10-10 04:00:00,404 trace celery.app.trace 70 140031597243208 Task main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f] succeeded in 0.16254851799749304s: None\n","stream":"stderr","time":"2018-10-10T04:00:00.404834545Z"}
{"log":"INFO 2018-10-10 04:00:00,404 trace celery.app.trace 70 140031597243208 Task main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f] succeeded in 0.16254851799749304s: None\n","stream":"stderr","time":"2018-10-10T04:00:00.404862208Z"}
{"log":"INFO 2018-10-10 04:00:00,421 trace celery.app.trace 65 140031597243208 Task main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f] succeeded in 0.1654666289978195s: None\n","stream":"stderr","time":"2018-10-10T04:00:00.421607856Z"}
{"log":"INFO 2018-10-10 04:00:00,421 trace celery.app.trace 65 140031597243208 Task main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f] succeeded in 0.1654666289978195s: None\n","stream":"stderr","time":"2018-10-10T04:00:00.421674687Z"}
{"log":"INFO 2018-10-10 04:00:00,438 trace celery.app.trace 67 140031597243208 Task main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f] succeeded in 0.19588526099687442s: None\n","stream":"stderr","time":"2018-10-10T04:00:00.438295459Z"}
{"log":"INFO 2018-10-10 04:00:00,438 trace celery.app.trace 67 140031597243208 Task main.batch.sendspam[2a6e5dc8-5fd2-40bd-8f65-7e7334a14b3f] succeeded in 0.19588526099687442s: None\n","stream":"stderr","time":"2018-10-10T04:00:00.438311386Z"}
...

If we check the Received task timestamps, every hour the worker receives a new delivery of the same task with the same id. The result is that each ETA message is delivered more than 10 times. It looks like RabbitMQ is the only option if we want to use ETA.

Recently met a similar bug. ps aux | grep celery also showed more processes than workers started, twice as many. Appending the parameter --pool gevent to the command launching the Celery workers lowered the number of processes to the exact number of started workers plus celery beat. Now I'm watching my tasks' execution.

Might another solution be disabling ack emulation entirely? i.e. "broker_transport_options": {"ack_emulation": False}. Any drawbacks for short-running tasks / countdowns?
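If I understand the Redis transport options correctly, that would look like the sketch below; whether it is acceptable is exactly the open question, since with ack emulation off a message is effectively gone once delivered, so a worker crash loses it:

from celery import Celery

app = Celery('tasks', broker='redis://localhost:6379/2')

# Turn off the Redis transport's ack emulation (the visibility-timeout-based redelivery).
# Trade-off: no redelivery at all if a worker dies mid-task.
app.conf.broker_transport_options = {'ack_emulation': False}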

Did anyone get the fix?

Have been facing the same problem, Any solution for it?

Bumping, the same problem.

Same issue here, using Redis as broker.

$ pipenv graph --bare | grep -i "redis\|celery"                                                                                                                                                                                                                              
---
channels-redis==2.4.0
  - aioredis [required: ~=1.0, installed: 1.3.0]
    - hiredis [required: Any, installed: 1.0.0]
django-celery-beat==1.5.0
django-celery-results==1.1.2
  - celery [required: >=4.3,<5.0, installed: 4.3.0]
  - celery [required: >=3.1.0, installed: 4.3.0]
  - pylint-celery [required: ==0.3, installed: 0.3]
redis==3.2.1

Same problem here. Celery version 4.3.0
celery = Celery('tasks', broker='pyamqp://nosd:sdsd@rabbit//', config_from_object={"broker_transport_options":{'visibility_timeout': 18000}})

The command I use to run my worker
celery -A tasks worker --pidfile=/tmp/celery_worker.pid -f=/tmp/celery_worker.log -Q celery_queue --loglevel=info --pool=solo --concurrency=1 -n worker_celery --detach --without-gossip --without-mingle --without-heartbeat

can you give celery==4.4.0rc4 a try?

Celery is receiving the same task twice, with the same task id, at the same time.
Here are the logs:

[2019-11-29 08:07:35,464: INFO/MainProcess] Received task: app.jobs.booking.bookFlightTask[657985d5-c3a3-438d-a524-dbb129529443]  
[2019-11-29 08:07:35,465: INFO/MainProcess] Received task: app.jobs.booking.bookFlightTask[657985d5-c3a3-438d-a524-dbb129529443]  
[2019-11-29 08:07:35,471: WARNING/ForkPoolWorker-4] in booking funtion1
[2019-11-29 08:07:35,473: WARNING/ForkPoolWorker-3] in booking funtion1
[2019-11-29 08:07:35,537: WARNING/ForkPoolWorker-3] book_request_pp
[2019-11-29 08:07:35,543: WARNING/ForkPoolWorker-4] book_request_pp

Both are running simultaneously.

Using celery==4.4.0rc4, boto3==1.9.232, kombu==4.6.6 with SQS in Python Flask.
In SQS, the default visibility timeout is 30 minutes, and my task has no ETA and no ack.

Running the worker like:
celery worker -A app.jobs.run -l info --pidfile=/var/run/celery/celery.pid --logfile=/var/log/celery/celery.log --time-limit=7200 --concurrency=8

@auvipy , any help would be great.

which broker and result back end are you using? can you try switching to another back end?

which broker and result back end are you using? can you try switching to another back end?

Using SQS; the result backend is MySQL, with SQLAlchemy.
Details are here on SO: https://stackoverflow.com/questions/59123536/celery-is-receiving-same-task-twice-with-same-task-id-at-same-time
@auvipy, can you please have a look?

@thedrow do you face this issue in bloomberg?

@nitish-itilite : what timezone are you using for celery?

@nitish-itilite : what timezone are you using for celery?

It is the default, UTC. In SQS, the region is US East (N. Virginia).

I had a similar case running Celery with SQS. I ran a dummy task with countdown=60, while the visibility timeout in SQS is 30 seconds. Here's what I got:

NOTE: I've started celery with --concurrency=1, so there are two threads, right?

[2020-02-18 14:46:32 +0000] [INFO] Received task: notification[b483a22f-31cc-4335-9709-86041baa8f05]  ETA:[2020-02-18 14:47:31.898563+00:00] 
[2020-02-18 14:47:02 +0000] [INFO] Received task: notification[b483a22f-31cc-4335-9709-86041baa8f05]  ETA:[2020-02-18 14:47:31.898563+00:00] 
[2020-02-18 14:47:32 +0000] [INFO] Task notification[b483a22f-31cc-4335-9709-86041baa8f05] succeeded in 0.012232275999849662s: None
[2020-02-18 14:47:32 +0000] [INFO] Task notification[b483a22f-31cc-4335-9709-86041baa8f05] succeeded in 0.012890915997559205s: None

What happened in chronological order:

  1. 14:46:32 task was received, and SQS put it in inflight mode for 30 seconds
  2. 14:47:02 same task was received, because the visibility timeout expired
  3. 14:47:32 both tasks are executed at the same time

My guess is that this is a bug in Celery (?); I think it should have checked whether the message id (b483a22f-31cc-4335-9709-86041baa8f05) has already been taken by that worker.

Maybe there could be a hash list with all message ids, so that celery could decide if a received task is valid for processing. Can celery do that?

NOTE 2:
We can't set the visibility timeout too high, because if the worker does actually die, the message would take too long to be picked up by another worker; setting it too low exposes this condition.
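The mitigation mentioned earlier in the thread applies here too: keep the visibility timeout comfortably above the longest countdown/ETA in use, accepting the slower-redelivery trade-off just described. A sketch with placeholder values:

from celery import Celery

app = Celery('tasks', broker='sqs://')  # credentials assumed to come from the environment

# Keep the SQS visibility timeout above the longest countdown/ETA in use.
app.conf.broker_transport_options = {'visibility_timeout': 3600}  # placeholder: 1 hour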

This seems to be happening to me too.

[2020-05-11 15:31:23,673: INFO/MainProcess] Received task: ee_external_attributes.tasks. recreate_specific_values[53046bd7-2a19-4f72-808f-d712eaecb0e8]
[2020-05-11 15:31:28,673: INFO/MainProcess] Received task: ee_external_attributes.tasks.recreate_specific_values[53046bd7-2a19-4f72-808f-d712eaecb0e8]

(I tweaked the task name in the logs for public posting.)

Due to uniqueness constraints, one of my workers throws an error partway through the task, and the other one succeeds.

I tried setting CELERY_WORKER_PREFETCH_MULTIPLIER = 1, but that turned out not to help.

I'm using
celery==4.4.1
django-celery-results==1.2.1

And I'm using AWS SQS for the queue.

I do have a theory. Apparently my "Default Visibility Timeout" setting on my queue was only set to 5 seconds. It may be that the second worker pulled the job while the first was working on it, because it assumed the first worker had died. I upped the visibility timeout to 2 minutes, and it seems to be doing better. I had plenty of tasks that took 8-12 seconds, so 2 minutes may be overkill. But hopefully that solves it.

It may be that the second worker pulled the job while the first was working on it, because it assumed the first worker had died.

@JulieGoldberg, that would be a crummy way for Celery to handle jobs. A Celery worker should never start a job that another worker has pulled off the queue and is actively processing; I think that would be seriously broken. (But it's Celery, I'm surprised by nothing anymore 😒)

I have a similar problem with an application that is running in Kubernetes. In the Kubernetes instance, we have 10 workers (Celery app instances) that consume tasks from Redis.

Symptoms:
The Celery worker schedules an ETA task which is planned to run after 30 minutes. If the Kubernetes pod is rotated (the worker is killed by Kubernetes) or a newer version of the application is deployed (all workers are killed and new workers are created), all workers pick up the scheduled task and start executing it at the defined time.
For the worker, I tried setting different values of visibility_timeout, from several hours up to one year, but the result was still the same. The same behavior occurred with the setting enable_utc = True, or with a reduction to worker_prefetch_multiplier = 1.

I don't know if this will help anyone but this was my issue:

I had tasks (report generation) that were being run when a page was loaded via GET. For some reason (something to do with favicons), Chrome would send 2 GET requests on every page load, triggering the task twice.
GET requests are supposed to be side-effect free, so I turned them all into forms that you submit, and the issue was resolved.
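A minimal sketch of that fix, assuming a Django view and a hypothetical generate_report task (all names are made up):

from django.shortcuts import redirect, render

from .tasks import generate_report  # hypothetical task module


def report_view(request):
    # Only an explicit form submit (POST) enqueues the task, so stray GETs
    # from the browser can no longer trigger it twice.
    if request.method == 'POST':
        generate_report.delay(request.user.pk)
        return redirect('report-status')  # hypothetical URL name
    return render(request, 'report_form.html')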

It may be that the second worker pulled the job while the first was working on it, because it assumed the first worker had died.

@JulieGoldberg, that would be a crummy way for Celery to handle jobs. A Celery worker should never start a job that another worker has pulled off the queue and is actively processing; I think that would be seriously broken. (But it's Celery, I'm surprised by nothing anymore 😒)

Instead of complaining, you can help us fix the issue by coming up with a solution and a PR.

I have a similar problem with an application that is running in Kubernetes. In the Kubernetes instance, we have 10 workers (Celery app instances) that consume tasks from Redis.

Symptoms:
The Celery worker schedules an ETA task which is planned to run after 30 minutes. If the Kubernetes pod is rotated (the worker is killed by Kubernetes) or a newer version of the application is deployed (all workers are killed and new workers are created), all workers pick up the scheduled task and start executing it at the defined time.

@elMateso I faced similar issues with an Airflow deployment on k8s (consumers on pods and Redis as the queue). But I was able to make the deployment stable and working as expected; maybe these tips will help you:
https://www.polidea.com/blog/application-scalability-kubernetes/#tips-for-hpa

Facing the same here.

It doesn't seem to be a problem with any timing configuration (visibility timeout, ETA, etc.), at least for me. In my case it happens with only microseconds between executions. I didn't find how Celery actually ACKs a message, but since it works perfectly in RabbitMQ, it seems to be a problem with concurrency and ACK handling in Redis.

I am seeing the same issue and we are using Redis as the broker too. Changing to RabbitMQ is not an option for us.

Has anyone tried using a lock to ensure the task is executed only once? Could that work?

e.g. https://docs.celeryproject.org/en/latest/tutorials/task-cookbook.html#ensuring-a-task-is-only-executed-one-at-a-time

@ErikKalkoken we ended up doing exactly that.

from functools import wraps


def semaphore(fn):
    @wraps(fn)
    def wrapper(self_origin, *args, **kwargs):
        # Lock key derived from the request id of the call being wrapped.
        cache_name = f"{UID}-{args[0].request.id}-semaphore"
        agreement_redis = AgreementsRedis()
        # SET NX with a TTL: only the first caller acquires the lock.
        if not agreement_redis.redis.set(cache_name, "", ex=30, nx=True):
            raise Exception("...")
        try:
            return fn(self_origin, *args, **kwargs)
        finally:
            agreement_redis.redis.delete(cache_name)

    return wrapper

The code above is not used for Celery, but handling Celery's multiple executions follows the same logic: you just need to get the task_id and set the cache key. So far it is working fine.
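For a Celery task the same idea can be keyed on self.request.id of a bound task, so a redelivered message with the same task id is skipped instead of raising. A rough sketch; the app, Redis URL and TTL are placeholders:

from functools import wraps

import redis
from celery import Celery

app = Celery('tasks', broker='redis://localhost:6379/2')   # placeholder app
locks = redis.Redis.from_url('redis://localhost:6379/3')   # placeholder lock store


def run_once(fn):
    @wraps(fn)
    def wrapper(self, *args, **kwargs):
        # SET NX + TTL: only the first delivery of a given task id acquires the key.
        if not locks.set(f'once:{self.request.id}', '1', nx=True, ex=3600):
            return None  # duplicate delivery, skip silently
        return fn(self, *args, **kwargs)
    return wrapper


@app.task(bind=True)
@run_once
def _test_exec(self):
    ...

The key is deliberately left to expire rather than deleted, so a redelivery that arrives within the TTL is still skipped.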
