Ansible: support for "serial" on an individual task

Created on 31 Aug 2015  ·  65 Comments  ·  Source: ansible/ansible

(Feature Idea)

The natural way to handle configuration or other updates that require a rolling restart would be to perform updates in parallel, then notify a handler, which performs a restart with serial. But this is not possible, requiring either manual rolling restart or ugly hacks. See https://groups.google.com/forum/#!topic/ansible-project/rBcWzXjt-Xc
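For context, serial is today only a play-level keyword, so the closest supported pattern forces the whole play into batches, not just the restart. A minimal sketch of that limitation (paths and service name are placeholders):

- hosts: webservers
  serial: 1                      # batches the ENTIRE play, including the update
  tasks:
    - name: update config        # would ideally run on all hosts in parallel
      template:
        src: app.conf.j2
        dest: /etc/app/app.conf
      notify: restart app
  handlers:
    - name: restart app
      service:
        name: app
        state: restarted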

affects_2.1 feature

Most helpful comment

@bcoca, I think this is necessary; you could leave this open as something to implement when it becomes easier.

Software features can't be stopped by technical difficulties.

All 65 comments

:+1:

+1
This would be great to have for our ansible-ceph deployment scripts!

:+1:

+1

+1

+1; however, serial must not fail the entire play for the remaining hosts when all hosts in the current batch fail. I often have to do rolling restarts, one server at a time across 50+ servers, and it sucks when the play dies on server 3 because server 3 hit some strange, unexpected condition that made its restart fail. Setting max_fail_percentage high enough should force Ansible to continue the play for the remaining hosts.
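For reference, both knobs already exist at play level; a sketch of the tolerant rolling restart described above, assuming a dedicated restart play (the service name is a placeholder):

- hosts: appservers
  serial: 1
  max_fail_percentage: 100   # a batch aborts the play only when MORE than this
                             # percentage fails, so 100 keeps the play rolling
  tasks:
    - name: restart service
      service:
        name: app
        state: restarted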

+1

+1!

+1

+1

:+1:

+1

+1

+1

That's a good idea, but it should extend not only from plays to individual tasks but also to blocks, including all dependent options like max_fail_percentage and run_once.

The update-reboot example explains that easily (see the sketch after the list):

  • Task: Update Server (all parallel)
  • optional Task: Check if server needs restart
  • Block: (serial)

    • Restart Server

    • Wait for Server to come back

  • Additional Steps (all in parallel)
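In today's syntax there is no direct way to express that flow. A purely hypothetical sketch of what block-level serial could look like (serial is NOT a valid block keyword in any Ansible release; this is just illustration):

- hosts: all
  tasks:
    - name: update server            # all hosts in parallel
      yum:
        name: '*'
        state: latest

    - block:
        - name: restart server
          reboot:
        - name: wait for server to come back
          wait_for_connection:
            timeout: 600
      serial: 1                      # hypothetical keyword, not valid Ansible

    - name: additional steps         # all hosts in parallel again
      debug:
        msg: carry on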

+1

+1 for this.

+1

+1, rolling restarts are useful for distributed systems that are all chained together. For Ceph, we don't want to restart all the storage daemons at the same time just because the configuration file changed.

+1

+1

+1

workaround:

- name: service restart
  # serial: 1 would be the proper solution here, but that can only be set on play level
  # upstream issue: https://github.com/ansible/ansible/issues/12170
  run_once: true
  with_items: "[{% for h in play_hosts  %}'{{ h }}'{% if not loop.last %} ,{% endif %}{% endfor %}]"
  delegate_to: "{{ item }}"
  command: "/bin/service restart"

@alvaroaleman thanks for your suggestion; however, it seems to lead to this bug: https://github.com/ansible/ansible/issues/15103
At least for me it did; I applied your workaround like so: with_items: "[{% for h in groups[mon_group_name] %}'{{ h }}'{% if not loop.last %}, {% endif %}{% endfor %}]".

Am I missing something?

Nifty idea. Though with Ansible 1.9, this can be simplified by doing:

with_items: '{{play_hosts}}'
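Put together, the workaround with the simplified loop would read roughly like this (a sketch; the restart command is a placeholder):

- name: service restart, one host at a time
  # serial: 1 would be the proper solution, but it is play-level only
  command: /bin/service restart
  run_once: true
  delegate_to: "{{ item }}"
  with_items: "{{ play_hosts }}"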

@alvaroaleman thanks for the workaround.

Is there any solution for serial: 30%?

Cool idea.

Assumes that a task knows which hosts are running in it OR targets all hosts from the play.

If it's a task in a role (which could have been targeted at anything) this doesn't make sense.

You could make a temporary group, but then you can't run that task twice without targeting some hosts twice.

Considering the way 'serial' and plays work, this is not really something we see adding to Ansible. You have several ways of implementing something similar with existing functionality; many have been discussed above and elsewhere, so I'm only going to detail the two that cover most cases.

  • Use an intermediate play (allows for many different serials, but requires previous tasks to end on all hosts):

- hosts: all
  tasks:
    - anything: ...
    # ...

- hosts: all
  serial: 1
  tasks:
    - singletask: ...

- hosts: all
  tasks:
    - morestuff: ...
    # ...
  • run once + loop + delegate (limited to serial=1 behavior):

- mytask: ...
  delegate_to: "{{ item }}"
  run_once: true
  # many different ways to make the loop
  with_inventory_hostnames: all

I've been trying to play with a strategy plugin that does this ... but it would have to recreate 1/2 of Ansible core utility to accomplish it and would still have many problems when dealing with other parts of the system (like callbacks).

At this point I don't see any use case not covered by any of the workarounds above, and I very much doubt that anyone in core will tackle this, so I'm going to close this issue.

Of course, if anyone submits code that can add this feature in a sane manner it WILL be considered for inclusion, but I doubt that this is currently possible (or I'm not smart enough to see a way).

@bcoca, I think this is necessary; you could leave this open as something to implement when it becomes easier.

Software features can't be stopped by technical difficulties.

@pando85 You could probably build something close to this with a custom strategy plugin.

@detiber thanks for the info.

Indeed I solved my problem with the workaround posted here.

I have a proxy cache, and when I install packages for my cluster I need to install on one machine first; then, with all packages cached, I install on the other machines.

- name: Update all packages
  # serial: 1 would be the proper solution here, but that can only be set on play level
  # upstream issue: https://github.com/ansible/ansible/issues/12170
  run_once: true
  delegate_to: "{{ play_hosts[0] }}"
  yum: name=* state=latest

- yum: name=* state=latest

But I think this feature could be interesting for more purposes, and serial: is the simplest and most logical way to accomplish it.

@bcoca We use your 2nd suggestion in our project to accomplish serial=1 behaviour. The only negative aspect is that the summary at the end of the play gets skewed, because every ok or changed counts towards the first host and not the delegate. Can you think of any solution to that?

We really don't want to use the first suggestion because we have some dependencies, and we would need to include roles in both the beginning and ending plays to support the use of tags.

thanks

@kami8607 you can try this, not tested with latest ansible though:

- hosts: all
  tasks:

    - name: set fact
      set_fact:
        marker: marker

    - name: group by marker
      # puts every host in the play into an ad-hoc group named 'marker'
      group_by: key=marker
      changed_when: no

    - name: target task
      # every host iterates over the full group in the same order, but the
      # when clause lets each host act only on its own entry
      debug: msg="Performing task on {{ inventory_hostname }}, item is {{ item }}"
      with_items: "{{ groups['marker'] }}"
      when: "hostvars[item].inventory_hostname == inventory_hostname"

@hryamzik thank you very much! Works like a charm.
(FYI we use ansible 2.0.2.0)

@kami8607 you can also try to replace marker hack with play_hosts.
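Presumably something like the following (an untested sketch), which drops the set_fact/group_by setup entirely, since play_hosts already contains the inventory hostnames of the play:

- name: target task
  debug: msg="Performing task on {{ inventory_hostname }}, item is {{ item }}"
  with_items: "{{ play_hosts }}"
  when: item == inventory_hostname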

@hryamzik Very nice, perfect solution for us now. Thank you again :)

Hi. We discovered one problem with the solution provided by @hryamzik.
If the task fails on one of these "pseudo-serial" hosts, it still gets executed on the other hosts instead of the play failing immediately. No matter what we tried, we could not abort the playbook right after the failed host.

Maybe someone has a solution for us. Thank you.

Hi, I believe this really is a crucial feature to safely utilize role dependencies.

In our project, we model most of our plays with the help of role dependencies as it makes for short and easily readable playbooks and avoids duplication.

If I have a role A that depends on role B, and it is dangerous to update B on multiple hosts simultaneously, then as far as I understand the only supported way to handle this is to set serial: 1 for the play that uses role A. This can be unacceptable when you have lots of hosts and lots of roles that depend on B.

We're also using @hryamzik's solution right now, but as @kami8607 said, ansible does not stop when encountering a failure.

Also @bcoca, I don't believe these kinds of workarounds should be the goal when designing a tool such as ansible. There seem to be lots of people with similar use cases that require some solution. As @pando85 said, technical difficulties should not be a reason to close this issue.
It would be really nice if this ticket could be reopened or some other solution could be considered.

A big ol' +1 from me... the way things are set up, it seems like I should be able to enable rolling restarts by adding serial: 1 to any of my restart handlers. Then, any config change, version update, etc. would result in a rolling restart.

+1

+1

+1

Roles are effectively USELESS at large scale without this.

+1 really good idea

Also in support of this. We use roles and in order to use any of the workarounds, we would have to extract just the serialized tasks into the play or a task file included from it, thus breaking role encapsulation.

+1

+1

If the task fails on one of these "pseudo-serial" hosts, it still gets executed on the other hosts instead of the play failing immediately. No matter what we tried, we could not abort the playbook right after the failed host.

@kami8607 I've faced the same issue with failures, as rolling updates and restarts require the whole playbook to fail on any error. Solved it with any_errors_fatal: true.

I can also confirm that this solution works with include_tasks; however, check mode is executed in parallel.

- name: install and configure alive servers
  include_tasks: "install_configure.yml"
  with_items: "{{ healthy_servers }}"
  when: "hostvars[host_item].inventory_hostname == inventory_hostname"
  loop_control:
      loop_var: host_item
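any_errors_fatal can be set at play level, so the enclosing play would look roughly like this (a sketch built from the snippet above):

- hosts: all
  any_errors_fatal: true    # abort the whole play as soon as any host fails
  tasks:
    - name: install and configure alive servers
      include_tasks: "install_configure.yml"
      with_items: "{{ healthy_servers }}"
      when: "hostvars[host_item].inventory_hostname == inventory_hostname"
      loop_control:
        loop_var: host_item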

If you look at comments getting 👎, it's because they're pointless, as their entire content is saying "+1".

There's also a working solution in this thread.

there is a workaround that works for the most part, but no real solution

There is not a working solution in this thread. While it may work for some, it's not a solution.
register doesn't work properly (I'll have a gist to back this up later); my guess is it's not the only feature that won't behave correctly.

@jonhatalla I don't have any issues with register, can you share a gist or a repo that doesn't work?

I would like to throttle only an artifact download task (due to some constraints) but execute the rest of the tasks in parallel.

I have come up with a proposal after reading the comments, which still is not working as desired for the download case. Note that 2 is the maximum number of concurrent task executions desired.

    - name: Download at ratio three at most
      win_get_url:
        url: http://ipv4.download.thinkbroadband.com/100MB.zip
        dest: c:/ansible/100MB.zip
        force: yes
      with_sequence: start=0 end={{ (( play_hosts | length ) / 2 ) | round (0, 'floor') | int }}
      when: "(( ansible_play_batch.index(inventory_hostname) % (( play_hosts | length ) / 2 )) | round) == (item | int)"

While the when will match on each iteration only for certain hosts, I can still see all the servers performing the download at the same time.

Another way of testing it is with a debug message and a delay between iterations. This way it is clear that only two are matched at each iteration.

    - debug:
        msg: "Item {{ item }} with modulus {{ (( ansible_play_batch.index(inventory_hostname) % (( play_hosts | length ) / 2 )) | round) }}"
      with_sequence: start=0 end={{ (( play_hosts | length ) / 2 ) | round (0, 'floor') | int }}
      loop_control:
        pause: 2
      when: "(( ansible_play_batch.index(inventory_hostname) % (( play_hosts | length ) / 2 )) | round) == (item | int)"

I discovered this issue thread thanks to this SO question

Any idea why the download doesn't seem to work as the debug message does?

At this point I don't see any use case not covered by any of the workarounds above

As previous commenters said, I also see no workaround for handlers restarting services in a cluster where you don't want to restart all nodes at the same time. So there is at least one use case where there doesn't seem to be a solution... This renders handlers totally useless in this case, as handlers exist precisely to restart services only WHEN needed.

And all the other workarounds (handling concurrent writes to a local hosts file, for example) do work, but they are so ugly...

Finally, I concur: closing an issue because it's too big a problem to solve is a bit depressing...

@zwindler you can use tasks instead of handlers. I actually do rolling restarts with API checks, implemented with include_tasks, and it works as expected. You can even try include_tasks directly in handlers, but I've no idea whether that works or not.

Not really sure I understand what you suggest.

  • Do you mean you use include_tasks to restart the services, and only do so with a when: clause to check whether or not a restart has to occur on this node?
  • How can you ensure that only one node has its services restarted at a time (that's the whole issue here)? Do you mean you can add serial with include_tasks?
- name: install and configure alive servers
  include_tasks: "install_configure.yml"
  with_items: "{{ healthy_servers }}"
  when: "hostvars[host_item].inventory_hostname == inventory_hostname"
  loop_control:
      loop_var: host_item

in this case serial=1 is simulated for all the tasks inside install_configure.yml.
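The thread never shows install_configure.yml itself; a minimal sketch of what such a file might contain (the service name and port are placeholder assumptions):

# install_configure.yml (hypothetical contents)
- name: restart service
  service:
    name: myapp
    state: restarted

- name: wait until the node serves traffic again before the loop moves on
  wait_for:
    port: 8080
    delay: 5
    timeout: 300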

How has healthy_servers been defined? How is it used in the workaround? I don't see it being referenced. I want a serial task to apply to all hosts that the playbook is being run against.

@erpadmin why don't you use play_hosts in this case?

Hi all,
we have detected a similar issue: we cannot pass the argument and use it in the playbook as
serial: ${serial_mode}
but it fails with:
ValueError: invalid literal for int() with base 10: 'serial_mode'

It seems to point to this bug, but we would like to clarify:

  • is there any workaround for this issue?
  • what is the official version with this fix?

thanks for your help and please keep us posted.

best regards, Pablo.

On 01/08/2018 11:02, Johannes Najjar wrote:

have you tried {{ serial_mode }}

yes, also tried with this and same issue.

thanks, Pablo.



It seems to me that the "run once + loop + delegate (limited to serial=1 behavior)" approach does not work on an include_tasks statement when the inventory has two "inventory hosts" with the same value for ansible_host.

Given two inventory hosts with the same ansible_host, the approach _does run twice_; however, both iterations are against the same host.

A major problem with most of the proposed workarounds is CPU and memory usage, as well as massive deployment slowdowns. The method of checking inventory_hostname == item in a with_items loop is O(n^2) across hosts, which, combined with a large number of hosts, can balloon memory and CPU load greatly.

With 200 hosts, I've seen Ansible use 20 GB of RAM and a load average of 70 just to serialize an include_tasks block. That particular task took several minutes just to decide which hosts to include.

+1

Anyone is invited to test #42528 for their use cases, and add a :+1: to the PR if you approve.
