Celery: Revoking/Aborting tasks on worker shutdown

Created on 24 Feb 2016  ·  3 Comments  ·  Source: celery/celery

I am using Celery 3.1.20 (Redis broker and backend) and I would like a way to abort/revoke the currently running tasks when the worker is being shut down.
The point is to mark the tasks as FAILED if possible, and not rerun them the next time the worker starts.

I am running one task at a time, and since the task has side effects (which I cannot change), killing the worker is what a user would be expected to do when something goes wrong. I don't want the task to be rerun the next time I start the worker (which I believe is the default signal handler behavior).

I have tried http://stackoverflow.com/a/8230470 without success.
I also tried a few things using the control interface or the worker from a bootstep:

import logging

from celery import Celery, bootsteps
from celery.task.control import revoke

# TODO : configuration for tests...
class BootPyrosNode(bootsteps.StartStopStep):

    def __init__(self, worker, **kwargs):
        logging.warning('{0!r} is starting from {1}'.format(worker, __file__))

        [...]

    def create(self, worker):
        return self

    def start(self, worker):
        # our step is started together with all other Worker/Consumer
        # bootsteps.
        pass  # not sure in which process this is run.

    def stop(self, worker):
        # the Consumer calls stop every time the consumer is restarted
        # (i.e. connection is lost) and also at shutdown.  The Worker
        # will call stop at shutdown only.
        logging.warning('{0!r} is stopping. Attempting abort of current tasks...'.format(worker))
        for req in worker.state.active_requests:
            # Neither of these attempts revoked the task:
            # worker.app.control.revoke(req.id, terminate=True)
            # revoke(req.id, terminate=True)
            pass
        self.node_proc.shutdown()

It is installed this way:

from celery.bin import Option  # Option is re-exported here in Celery 3.1

celeros_app = Celery()

# setting up custom bootstep to start ROS node and pass ROS arguments to it
celeros_app.steps['worker'].add(BootPyrosNode)
celeros_app.user_options['worker'].add(Option('-R', '--ros-arg', action="append", help='Arguments for ros initialisation'))

However, it seems that my task cannot be revoked/aborted (maybe because the worker no longer processes control messages once it is stopping?), and I am running out of ideas.

If you want to see more, the code comes from https://github.com/asmodehn/celeros.

Is there a way to do this, or is this a customization that is not possible yet?

Feature Request

All 3 comments

You cannot send remote control commands to yourself during shutdown; you need to revoke the tasks using worker internals (see how the remote control command is implemented in celery/worker/control.py).

You should probably also make sure your bootstep depends on the Pool, so that your stop() method is called first during shutdown:

class BootPyrosNode(bootsteps.StartStopStep):
    requires = ('celery.worker.components:Pool',)
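
The ordering argument can be illustrated without Celery. The following is a toy model only, not Celery's bootstep implementation: `ToyStep`, `Pool`, `AbortTasks`, `boot_order` and `run` are hypothetical names. It assumes, as the advice above implies, that steps are started in dependency order and stopped in the reverse order, so a step that requires the pool has its stop() called while the pool is still alive.

```python
class ToyStep:
    """Stand-in for a bootstep; this is NOT the celery.bootsteps API."""
    requires = ()

    def __init__(self, log):
        self.log = log

    def start(self):
        self.log.append("start " + type(self).__name__)

    def stop(self):
        self.log.append("stop " + type(self).__name__)


class Pool(ToyStep):
    pass


class AbortTasks(ToyStep):
    # Depending on Pool means Pool is started before this step ...
    requires = (Pool,)


def boot_order(step_classes):
    """Return the step classes sorted so that dependencies come first."""
    ordered = []

    def visit(cls):
        for dep in cls.requires:
            visit(dep)
        if cls not in ordered:
            ordered.append(cls)

    for cls in step_classes:
        visit(cls)
    return ordered


def run(step_classes):
    """Start every step, then stop them in reverse; return the event log."""
    log = []
    steps = [cls(log) for cls in boot_order(step_classes)]
    for step in steps:
        step.start()
    # ... and shutdown walks the list backwards, so AbortTasks.stop()
    # runs before Pool.stop().
    for step in reversed(steps):
        step.stop()
    return log


events = run([AbortTasks, Pool])
print(events)
```

In this model the log ends with "stop AbortTasks" followed by "stop Pool", regardless of the order in which the two classes are registered.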

Thanks!
I managed to revoke running tasks on worker shutdown:

import logging

from celery import bootsteps
# Assumed imports, mirroring celery/worker/control.py:
from celery.platforms import signals as _signals
from celery.utils.log import get_logger

logger = get_logger(__name__)


class BootPyrosNode(bootsteps.StartStopStep):
    # __init__, create and start are unchanged from the question.

    def stop(self, worker):
        # the Consumer calls stop every time the consumer is restarted
        # (i.e. connection is lost) and also at shutdown.  The Worker
        # will call stop at shutdown only.
        logging.warning('{0!r} is stopping. Attempting termination of current tasks...'.format(worker))

        # Following code from worker.control.revoke

        task_ids = []
        terminated = set()

        # cleaning all reserved tasks since we are shutting down
        signum = _signals.signum('TERM')
        for request in list(worker.state.reserved_requests):
            if request.id not in terminated:
                task_ids.append(request.id)
                terminated.add(request.id)
                logger.info('Terminating %s (%s)', request.id, signum)
                request.terminate(worker.pool, signal=signum)

        # Aborting currently running tasks, and triggering the soft time
        # limit exception to allow the task to clean up.
        signum = _signals.signum('USR1')
        for request in list(worker.state.active_requests):
            if request.id not in terminated:
                task_ids.append(request.id)
                terminated.add(request.id)
                logger.info('Terminating %s (%s)', request.id, signum)
                # triggers SoftTimeLimitExceeded in the task
                request.terminate(worker.pool, signal=signum)

        if terminated:
            terminatedstr = ', '.join(task_ids)
            logger.info('Tasks flagged as revoked: %s', terminatedstr)

        self.node_proc.shutdown()

First I revoke the tasks in the reserved_requests list, to prevent any waiting task from taking over just before shutdown.

Then I revoke the active requests by sending USR1, which raises the SoftTimeLimitExceeded exception inside the task so that it can run its cleanup code. Since I am using acks_late (to run only one task at a time), I need to return or raise from the task so that it is properly acknowledged and does not restart the next time I launch the worker.
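
What the task sees on its side can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not Celery code: the local `SoftTimeLimitExceeded` class stands in for `celery.exceptions.SoftTimeLimitExceeded`, `_raise_soft_limit` stands in for the handler the pool child process installs for SIGUSR1, and `side_effect_task` is a hypothetical task body. It assumes a Unix platform and that it runs in the main thread.

```python
import os
import signal
import time


class SoftTimeLimitExceeded(Exception):
    """Local stand-in for celery.exceptions.SoftTimeLimitExceeded."""


def _raise_soft_limit(signum, frame):
    # Stand-in for the SIGUSR1 handler of the pool child process.
    raise SoftTimeLimitExceeded()


def side_effect_task(state):
    """Hypothetical task body: clean up and fail fast when interrupted."""
    state["started"] = True
    try:
        # Simulate the worker signalling this process in the middle of the task.
        os.kill(os.getpid(), signal.SIGUSR1)
        for _ in range(50):
            time.sleep(0.1)  # the "long-running work" that gets interrupted
        state["finished"] = True
    except SoftTimeLimitExceeded:
        state["cleaned_up"] = True  # undo side effects here
        raise  # leave the task body so the task ends as failed


previous = signal.signal(signal.SIGUSR1, _raise_soft_limit)
state = {}
try:
    side_effect_task(state)
    outcome = "finished"
except SoftTimeLimitExceeded:
    outcome = "failed"
finally:
    # Restore whatever handler was installed before the demonstration.
    signal.signal(signal.SIGUSR1, previous)

print(outcome, state)
```

Re-raising after the cleanup matches the goal described here: the task body exits quickly with an error instead of running to completion.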

I am using abortable tasks, but the abort behavior is meant for when a user knowingly aborts a task and the system needs to do some complex or long cleanup. This case is different: the worker is shutting down, and I want the current task to fail quickly.

Would this be a proper way of doing things in Celery?

In a future version it would be nice to be able to simply redefine this behavior in the Task class, for example by overriding an on_revoke() or on_abort() method, and maybe to have different behaviors on revoke depending on some conditions.
Just my 2 cents; I haven't checked Celery v4.0 yet.

Any update on this?
