Celery: Feature: Beat should avoid concurrent invocations

Created on 17 Nov 2010  ·  48 Comments  ·  Source: celery/celery

Requiring the user to ensure that only one instance of celerybeat exists across their cluster creates a substantial implementation burden (either creating a single point-of-failure or encouraging users to roll their own distributed mutex).

celerybeat should either provide a mechanism to prevent inadvertent concurrency, or the documentation should suggest a best-practice approach.

Celerybeat

All 48 comments

This could be solved by using kombu.pidbox, this is also how celeryd detects that there is already a node with the same name running. Since celerybeat is centralized it could use a fixed node name.

As a side effect we will be able to control celerybeat with remote control commands (there could be a command to reload the schedule for example, or see what tasks are due in the near future). That is a pretty awesome side effect if you ask me.

Needs more planning, as there is a use case for running multiple instances in the same cluster. E.g. for "sharding" the schedule in multiple pieces. There must be at least the possibility to select a node name for each instance. Postponing to 2.3.0.

We had an issue where the box running celerybeat went offline without a good fallback in place to start a new celerybeat instance to take its place. What's the recommended HA way to run celerybeat?

Would the kombu.pidbox approach allow us to run multiple instances of celerybeat that would just sleep if it detected that an instance was already running with the fixed node name and poll to promote itself to the active if that instance goes down?

Running multiple active instances sounds interesting - what other benefits might there be in addition to sharing the schedule?

+1

+1

This is something that is a real concern for large deploys where resilience of scheduling is important.

+9999 ;)
Is there something wrong with using kombu.pidbox solution? Even without sharding and fancy features this would be great and very handy. Right now I need to manually start celerybeat on another host.

Pidbox could be used, but the problem is that beat is not a consumer. To respond to broadcast messages like 'any beat instances here?' it would have to constantly listen for messages on its broadcast queue, and it currently can't because it's busy scheduling messages.

Technically it could be using a second thread, but that may drag performance down and is a lot of overhead just for this feature.

A second solution could be to use a lock, but with the downside of having to release it. I.e. if the beat process is killed, the stale lock would require manual intervention to start a new instance.

It could also have a 2 second timeout on the lock, and update the lock every second. Then that means a new instance would have to wait for 2 seconds if the lock is held.

A lock in amqp could be created by declaring a queue, e.g. `queue_declare('celerybeat.lock', arguments={'x-expires': 2000})`
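To make the "2 second timeout, refreshed every second" idea concrete, here is a minimal in-memory sketch of an expiring lock. `LeaseLock`, `FakeClock` and the node names are hypothetical, made up for illustration; a real deployment would keep the lock in the broker or another shared store rather than in process memory.

```python
import time


class LeaseLock:
    """In-memory stand-in for an expiring lock (hypothetical, illustration only)."""

    def __init__(self, ttl=2.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self.holder = None
        self.expires_at = 0.0

    def acquire_or_refresh(self, node):
        """Return True if `node` holds the lock after this call."""
        now = self.clock()
        if self.holder in (None, node) or now >= self.expires_at:
            # Free, already ours, or expired: take it and extend the lease.
            self.holder = node
            self.expires_at = now + self.ttl
            return True
        return False


class FakeClock:
    """Manually advanced clock so the demo does not need to sleep."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


def demo():
    clock = FakeClock()
    lock = LeaseLock(ttl=2.0, clock=clock)
    results = [lock.acquire_or_refresh("beat-a")]      # beat-a takes the lock
    clock.advance(1.0)
    results.append(lock.acquire_or_refresh("beat-b"))  # still held by beat-a
    results.append(lock.acquire_or_refresh("beat-a"))  # beat-a refreshes
    clock.advance(1.5)                                 # beat-a is killed here
    results.append(lock.acquire_or_refresh("beat-b"))  # lease not yet expired
    clock.advance(0.5)
    results.append(lock.acquire_or_refresh("beat-b"))  # lease expired, takeover
    return results
```

A killed holder simply stops refreshing, so a standby takes over within one TTL and no manual cleanup of a stale lock is needed.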

+1

I'd love to see this

+1

+1

+1 as well

+1

Has anyone actually implemented the kombu.pidbox solution or any other mechanism which solves this problem? If so, please share it. There are a lot of people still wondering what the best practice is.

Has anyone moved away from celery altogether due to this? I'd be interested to know that as well.

EDIT:

I found this gist (https://gist.github.com/winhamwr/2719812) through a google discussion (https://www.google.co.in/search?q=celerybeat+lock&aq=f&oq=celerybeat+lock&aqs=chrome.0.57j62l3.2125j0&sourceid=chrome&ie=UTF-8).

I'm also wondering if anyone has just used a shared pidfile for celerybeat directly, maybe with an EBS on AWS or maybe in an S3 bucket… celerybeat --pidfile=/path/to/shared/volume.

I noticed that the current master (3.1 dev) has a gossip step for the consumer. Would it be possible to leverage the gossip queue and leader election to coordinate the embedded beat processes? That is each worker would run the embedded beat process but only the leader would queue the periodic task. This would likely assume a shared schedule storage.

@mlavin This could work, but only for broker transports that support broadcast.

The problem with the pidbox solution is that the celerybeat program must then be rewritten to use Async I/O.
It cannot currently both consume tasks and produce them, since the scheduler is blocking.

In most cases this isn't a needed feature at all since most production deployments have a dedicated host for the beat process, and then using a --pidfile is enough to make sure you don't start multiple instances.

I have found that the people affected by this problem are often those who use the -B option in a daemonization script and then duplicate that setup to another host.

So I get that it's annoying, but I don't think it's critical. If anyone truly wants a solution then they can contribute it, or hire me/donate to get it implemented.

One can use uWSGI to have single beat process with fallback to other node(s)

+1, we launch identical Amazon EC2 instances and it will be nice to have periodic tasks that execute only in one node. Meanwhile I will try to use uWSGI thanks for the suggestion.

+1

+1

I have been making a case at work to use Celerybeat for scheduling but not having HA out of the box is making it very difficult. In fact, it looks like we will be dropping it altogether because of this. Quite simply, running only 1 Celerybeat instance makes this a single point of failure and therefore not production ready for us.

@junaidch I don't think you should be dropping celery because of this. You can always simply run scheduler on every server and for periodic tasks use some kind of locking mechanism to make sure they don't overlap in any way and also don't run too often. Additionally, you can subclass scheduler and do a lock there as well or skip task-level lock and just do everything in scheduler instead.

It would be nicer to have some built-in functionality in celery as this is kind of a workaround, but still it's usable in production just fine.
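One way to read the task-level locking suggestion above is to let every beat instance enqueue the task and have the task itself claim the current schedule slot before doing any work. The sketch below assumes a shared store with an atomic set-if-absent operation; `InMemoryStore` and `run_once_per_slot` are hypothetical names, and the in-memory store only stands in for Redis or a database (its keys never expire here).

```python
import time


class InMemoryStore:
    """Stand-in for a shared store with atomic set-if-absent (hypothetical)."""

    def __init__(self):
        self._keys = set()

    def add_if_absent(self, key):
        if key in self._keys:
            return False
        self._keys.add(key)
        return True


def run_once_per_slot(store, name, interval, func, now=None):
    """Run func() only if nobody has claimed the current interval slot yet."""
    now = time.time() if now is None else now
    slot = int(now // interval)  # every call within one interval shares a slot
    if store.add_if_absent(f"{name}:{slot}"):
        return func()
    return None  # another beat's copy of the task already ran in this slot
```

With a 300 second interval, calls at t=600 and t=601 fall into the same slot, so only the first one runs; a call at t=900 starts a new slot and runs again.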

Thanks @23doors.

My tasks already maintain a Redis lock to prevent another instance of the task from running. If I run 2 beats on 2 different machines and my tasks are scheduled for 5 minute intervals, I think that will work even though both beats will be pushing tasks to the queue. Making a case for adoption just gets harder when you have to implement a workaround for your core functionality.

I will investigate the sub-classing recommendation. That might be a cleaner approach.

Thanks for the suggestions!

At Lulu we solved this by writing a simple cluster singleton manager (named BeatCop). It uses an expiring Redis lock to ensure there's only one Celerybeat running in an autoscaling pool of Celery workers. If anything happens to that Celerybeat (like the instance gets scaled out or dies or Celerybeat crashes), another node automatically spawns a new Celerybeat. We've open sourced BeatCop.

@ingmar we wrote this https://github.com/ybrs/single-beat for the same reasons; the last time I checked I hadn't seen your comment. We also released it as open source, so it might be useful for others. It does more or less the same thing.

As far as I can see, the main differences from BeatCop are: we use pyuv (so BeatCop is more portable, I think, with fewer dependencies), and we redirect the child's stderr and stdout to the parent's, exit with the same code if the child dies, and configure it with env variables, so it's a bit easier to add to supervisor.

Hope it might be useful to someone else.

+1

+1

I am considering using Consul key-values as the lock controller. Has anyone tried this approach? While one instance is working, the other ones would "sleep" until the lock stops being updated; then the Consul election mechanism would decide which one gets to update the timestamped key value. Whoever updates the lock is the one working.

@ingmar Thanks for this! I'm going to give this a shot on my worker cluster.

+10, as the current implementation means a single point of failure, which defeats the purpose of using a distributed queue in the first place.

+1

+1

Looks like this is going to be in v5.0.0 https://github.com/celery/celery/milestones/v5.0.0

+1

Closing this, as with current resources it will take 10 years to complete.

Sorry, but this is a serious problem for a so-called "distributed" queue. However long this will take to implement, it should eventually be fixed. Closing a perfectly valid issue because you don't have the resources _right now_ does not seem right. Could you perhaps re-open it and apply a label that indicates that it's low priority at the moment?

I know my reason for closing was absurdly abrupt, so as a user of software I can understand your sentiment, but technically Beat is more like an add on feature. It's completely decoupled from the rest of Celery, and it was intentionally designed to be non-distributed to keep the implementation simple. It started as a neat way of defining cronjobs from Python as a bonus for users already using Celery, then more and more people used Celery as a cron replacement.

The issue has been open for SIX years now, and even though it's requested often, and countless companies are depending on it, none have ever offered to pay for an implementation of it.

It was actually one of the issues I figured would be interesting for companies to sponsor. Granted it's not like it's common for companies to offer to pay for any feature, bug fix, or even for helping to solve a production issue. I can probably count them on one hand (you are awesome), so now I know how utterly naive that idea was :)

I closed a duplicate of this issue today also, see #1495. There have been pull requests trying to solve the issue, and several are promising, but given the dedication required to prove that a given implementation works I still have not had the time to review them properly. Maybe this springs someone into action, even if not I think it's better than keeping a feature request open for six years, when nobody is working on it. That's also a sort of disservice to users who want to see this fixed.

@ask Fair enough. It is true that distributed cron is a big complicated problem, like you say in the other thread. And it does sound like something that should live outside of Celery.

Thank you for taking the time to explain your reasoning in detail.

@ask I was wondering if this problem might be circumvented by locating the celerybeat-schedule file (used by celery.beat.PersistentScheduler) inside a NFS volume that's shared across all nodes in the cluster?

The PersistentScheduler class uses shelve as a database module, so concurrent writes to the celerybeat-schedule file should be prevented by design. This is an excerpt from the shelve documentation:

The shelve module does not support concurrent read/write access to shelved objects. (Multiple simultaneous read accesses are safe.) When a program has a shelf open for writing, no other program should have it open for reading or writing.

Assuming we start celery beat like this:

celery -A project-name beat -l info -s /nfs_shared_volume/celerybeat-schedule

where /nfs_shared_volume is the shared volume (e.g., managed by AWS Elastic File System), can we expect that schedules will not be messed up even if there is one celery beat process running on every node in the cluster?

@mikeschaekermann If I am reading the docs correctly shelve makes no effort to prevent concurrent write access. It just tells you not to let it happen. The section you quoted goes on to say "Unix file locking can be used to solve this, but this differs across Unix versions and requires knowledge about the database implementation used."

@ze-phyr-us I think you're right, and I misinterpreted the shelve docs. Still, I'm wondering if the problem would be solved assuming the Scheduler backend ensures atomic operations on the schedule? @ask does the django-celery-beat package support atomicity to solve the problem? I saw that it does use transactions to do some of the updates.

For anyone else who ends up here while searching for a distributed / auto-scaling friendly celery beat, and is happy to use Redis as a backend; I tried both BeatCop and single-beat mentioned above, but ultimately picked RedBeat.

Hi @ddevlin
I am having similar issues. What problems did you face while using single-beat? Also, if it's not too much, could you please share a sample of how you configured redbeat for multiple servers?

@ankur11 single-beat ensures that only one instance of celery beat is running, but doesn't synchronise the schedule state between instances.

If I used the default scheduler with a periodic task intended to run every 15 minutes, and had a failover with single-beat 14 minutes after the last time the task ran, the task wouldn't run until 15 minutes after the new celery beat instance started, resulting in a 29 minute gap.

To share the schedule state between instances I needed to use an alternative scheduler. django-celery-beat is the alternative mentioned in the Celery docs, but my preference was to use Redis as the backend for schedule syncing, since I was already using Redis as my Celery backend.

Redbeat includes both Redis backed shared schedule state and locking to ensure only one instance is scheduling tasks, so I didn't need single-beat or BeatCop once I started using that.

In our implementation celery beat is started by supervisord on all instances, with Redbeat as the scheduler (e.g. exec celery beat --scheduler=redbeat.RedBeatScheduler --app=myproject.celery:app). Unfortunately I can't share work-related code, but I'm happy to answer any additional questions about implementation in general.
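The 29 minute gap described above is plain arithmetic, and it can be reproduced with a small sketch. `next_run` is a hypothetical helper that models the behaviour as described in this comment, not the scheduler's actual code, and the timestamps are arbitrary.

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(minutes=15)


def next_run(last_run, started_at, shared_state):
    """When a freshly started beat instance would next send the task."""
    if shared_state:
        # The new instance knows when the task last ran.
        return max(last_run + INTERVAL, started_at)
    # No shared state: the interval is counted from the new instance's start.
    return started_at + INTERVAL


last_run = datetime(2020, 1, 1, 12, 0)
failover = last_run + timedelta(minutes=14)  # new beat starts 14 minutes later

gap_without = next_run(last_run, failover, shared_state=False) - last_run
gap_with = next_run(last_run, failover, shared_state=True) - last_run

print(gap_without)  # 0:29:00
print(gap_with)     # 0:15:00
```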

@mikeschaekermann you could try wrapping your celery beat with /usr/bin/flock to lock the access ...

flock /nfs/lock.file celery beat ...

Assuming you trust your NFS lock implementation :)

This would ensure only one actually runs and the others block until the locker dies.
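For anyone who would rather take the lock from Python than from the shell, here is a rough equivalent using the standard library's `fcntl.flock` (Unix only). `try_lock` is a hypothetical helper, and unlike the `flock` command above it returns immediately instead of blocking when the lock is already held.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Return an open file holding an exclusive flock on path, or None if taken."""
    f = open(path, "w")
    try:
        # LOCK_NB makes the call fail right away instead of waiting.
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()
        return None
    return f  # keep this object alive; closing it releases the lock


# Demo on a local temporary file (not NFS).
lock_path = os.path.join(tempfile.mkdtemp(), "beat.lock")
first = try_lock(lock_path)   # acquires the lock
second = try_lock(lock_path)  # a second open of the same file is refused
first.close()                 # releases the lock
third = try_lock(lock_path)   # now succeeds
```

The same caveat applies as for the command-line version: the guarantee is only as good as the file system's lock implementation.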

I tried this method. Unfortunately, if the client holding the NFS lock loses connectivity to the NFS server, the lock can be revoked by the NFS server and given to another client. When the original lock holder regains connectivity, flock doesn't realize that the lock has been revoked so now there are two nodes believing they are the 'leader.'

I ended up using an advisory lock in Postgres. I made a Django management command that uses the django_pglocks module and runs celery beat in a subprocess.

This seems like it could be susceptible to the same issues that I saw with using NFS. What happens if the client holding the lock loses connection with the Postgres server, or if the Postgres server is restarted?

@swt2c Argh, of course you're right! There needs to be a keep alive of some sort.

Right now I'm doing:

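import logging
import signal
import subprocess
import sys

import prctl  # from the python-prctl package
from django_pglocks import advisory_lock

def _pre_exec():
    # Have the kernel send SIGTERM to the child if this parent process dies.
    prctl.set_pdeathsig(signal.SIGTERM)

# LOCK_ID and celery (the beat command line) are defined elsewhere.
with advisory_lock(LOCK_ID) as acquired:
    assert acquired
    logging.info("Lock acquired: %s", acquired)
    p = subprocess.Popen(
        celery,
        shell=False,
        close_fds=True,
        preexec_fn=_pre_exec,
    )
    sys.exit(p.wait())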

advisory_lock does support recursion, but I don't know if it is actually checking the db:

In [8]:  with advisory_lock('foo') as acquired:
   ...:     print acquired
   ...:     while True:
   ...:        with advisory_lock('foo') as acquired:
   ...:           print acquired
   ...:        time.sleep(1)
   ...:       

# Yes, it does:

True
True
True
<shutdown the instance>
InterfaceError: cursor already closed

So ... I could modify it to keep sub-reacquiring the lock/polling and kill beat if it fails. Doesn't guarantee mutual exclusion, but it might be good enough for my purposes.

For my case concurrent beats is a wasteful annoyance, but not an integrity issue. If it were, I could also wrap the task in an advisory lock which if the db goes down, the task fails anyway.

I'm also having beat store the schedule in the DB, but haven't tested what beat does when the db goes down.
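The "keep polling the lock and kill beat if it fails" idea from this comment could look roughly like the sketch below. `run_while_healthy` and `still_holds_lock` are hypothetical names; the check callable is assumed to wrap whatever re-acquire or ping against the database is appropriate. As noted above, this narrows the window for two concurrent beats but does not guarantee mutual exclusion.

```python
import subprocess
import sys
import time


def run_while_healthy(cmd, still_holds_lock, poll_interval=1.0):
    """Run cmd and terminate it once still_holds_lock() returns False or raises."""
    proc = subprocess.Popen(cmd)
    while proc.poll() is None:
        try:
            healthy = still_holds_lock()
        except Exception:
            # E.g. the database connection dropped: treat the lock as lost.
            healthy = False
        if not healthy:
            proc.terminate()
            break
        time.sleep(poll_interval)
    return proc.wait()


# Demo: a stand-in child process, with the check failing on the third poll.
checks = iter([True, True, False])
child = [sys.executable, "-c", "import time; time.sleep(60)"]
returncode = run_while_healthy(child, lambda: next(checks), poll_interval=0.05)
```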

@ddevlin I was glad to see your comment since that was the solution I was thinking of implementing as well.

However, if you could share the logic of how supervisor autorestarts redbeat-1 when redbeat-2 goes down, that'd be a great help.

This might be due to my lack of understanding regarding supervisor, but it seems the autorestart=True is effective only for programs that at least get into the RUNNING state once.

My problem is:

  1. I have two programs in my supervisor.conf, each running celery beat with redbeat.RedBeatScheduler.
  2. Starting supervisor, one beat (beat-1) gets the lock and runs, while the other (beat-2) tries to start a couple of times and enters into the FATAL state (with the Seems we're already running? error).
  3. Ideally, if beat-1 stops, then I want supervisor to start beat-2.
  4. However, that doesn't happen since it was never in a RUNNING state to begin with. Which means if I stop beat-1, then it stops and then nothing happens.

Off the top of my head, the solution would be to have a cron that keeps doing a supervisorctl restart all every 5 seconds or so, but just wanted to get your thoughts on how you were able to achieve that redundancy with supervisor.

Hi @harisibrahimkv, your issue is that you're starting two identical instances of celery beat on the same host; I expect you're seeing ERROR: Pidfile (celerybeat.pid) already exists. in your logs for beat-2? I can see that having two instances of celery beat running on the same host would be useful for testing failover between them, but for real redundancy you probably want celery beat running on multiple hosts.

To get multiple instances running on the same host, have supervisor start them with the --pidfile argument and give them separate pidfiles: e.g.

# beat-1 
celery beat --scheduler=redbeat.RedBeatScheduler --pidfile="beat-1.pid" ...
# beat-2
celery beat --scheduler=redbeat.RedBeatScheduler --pidfile="beat-2.pid" ...

Both instances should start successfully under supervisor, but if you check the log files only one of them should be scheduling tasks. If you stop that instance you should see the other instance take over task scheduling.

Our goal was to have an auto-scaling pool of identical hosts running celery workers and celery beat under supervisor. Each host has a single celery beat instance. In this configuration, celery beat should successfully start on all hosts, but any instances of celery beat that don't acquire the lock will effectively be hot standbys and not schedule tasks (although all hosts in the pool will process tasks). If the instance with the lock is stopped (for example when the pool is scaled down, or when we're doing a rolling upgrade of hosts in the pool), then one of the standby instances will acquire the lock and take over scheduling tasks.

@ddevlin Thank you so much for getting back to me and making the Internet such a wonderful place! Sincerely appreciate it! (was running around telling my entire family about your reply :D )

  1. The pidfile bit worked and I was so so happy seeing beat-2 take up the tasks when the other one stopped. Could configure the time of the beat with CELERYBEAT_MAX_LOOP_INTERVAL = 25 (on celery 3.x).

  2. Yes, for real redundancy, we plan to have this setup on different hosts altogether. Thank you for explaining the setup that you were using. Going to work on that now. The "multiple instances on the same host" setup, as you rightly understood, was just to initially validate whether the concept of failover works with this supervisor setup.

Warm thanks,
From a little village at the southernmost tip of the Indian subcontinent. :)
