Ansible: support for "serial" on an individual task

Created on 31 Aug 2015  ·  65 Comments  ·  Source: ansible/ansible

(Feature Idea)

The natural way to handle configuration or other updates that require a rolling restart would be to perform updates in parallel, then notify a handler, which performs a restart with serial. But this is not possible, requiring either manual rolling restart or ugly hacks. See https://groups.google.com/forum/#!topic/ansible-project/rBcWzXjt-Xc
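For context, serial is today only a play-level keyword, so the closest supported pattern forces the whole play into batches, not just the restart. A minimal sketch of that limitation (paths and service name are placeholders):

- hosts: webservers
  serial: 1                      # batches the ENTIRE play, including the update
  tasks:
    - name: update config        # would ideally run on all hosts in parallel
      template:
        src: app.conf.j2
        dest: /etc/app/app.conf
      notify: restart app
  handlers:
    - name: restart app
      service:
        name: app
        state: restarted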

affects_2.1 feature

Most helpful comment

@bcoca, I think this is necessary; you could leave this open as something to implement when it becomes easier.

Software features can't be stopped by technical difficulties.

All 65 comments

:+1:

+1
This would be great to have for our ansible-ceph deployment scripts!

:+1:

+1

+1

+1; however, serial must not fail the entire play for the remaining hosts when all hosts in the current batch fail. I often have to do rolling restarts, one server at a time across 50+ servers, and it sucks when the play dies on server 3 because server 3 hit some strange, unexpected condition that made its restart fail. Setting max_fail_percentage high enough should force Ansible to continue the play for the remaining hosts.
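For reference, both knobs already exist at play level; a sketch of the tolerant rolling restart described above, assuming a dedicated restart play (the service name is a placeholder):

- hosts: appservers
  serial: 1
  max_fail_percentage: 100   # a batch aborts the play only when MORE than this
                             # percentage fails, so 100 keeps the play rolling
  tasks:
    - name: restart service
      service:
        name: app
        state: restarted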

+1

+1!

+1

+1

:+1:

+1

+1

+1

That's a good idea, but it should extend not only from plays to individual tasks but also to blocks, including all dependent options like max_fail_percentage and run_once.

The update-reboot example explains that easily (see the sketch after the list):

  • Task: Update Server (all parallel)
  • optional Task: Check if server needs restart
  • Block: (serial)

    • Restart Server

    • Wait for Server to come back

  • Additional Steps (all in parallel)
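In today's syntax there is no direct way to express that flow. A purely hypothetical sketch of what block-level serial could look like (serial is NOT a valid block keyword in any Ansible release; this is just illustration):

- hosts: all
  tasks:
    - name: update server            # all hosts in parallel
      yum:
        name: '*'
        state: latest

    - block:
        - name: restart server
          reboot:
        - name: wait for server to come back
          wait_for_connection:
            timeout: 600
      serial: 1                      # hypothetical keyword, not valid Ansible

    - name: additional steps         # all hosts in parallel again
      debug:
        msg: carry on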

+1

+1 for this.

+1

+1, rolling restarts are useful for distributed systems that are all chained together. For Ceph, we don't want to restart all the storage daemons at the same time just because the configuration file changed.

+1

+1

+1

workaround:

- name: service restart
  # serial: 1 would be the proper solution here, but that can only be set on play level
  # upstream issue: https://github.com/ansible/ansible/issues/12170
  run_once: true
  with_items: "[{% for h in play_hosts  %}'{{ h }}'{% if not loop.last %} ,{% endif %}{% endfor %}]"
  delegate_to: "{{ item }}"
  command: "/bin/service restart"

@alvaroaleman thanks for your suggestion; however, it seems to lead to this bug: https://github.com/ansible/ansible/issues/15103
At least for me it did; I applied your workaround like so: with_items: "[{% for h in groups[mon_group_name] %}'{{ h }}'{% if not loop.last %}, {% endif %}{% endfor %}]".

Am I missing something?

Nifty idea. Though with Ansible 1.9, this can be simplified by doing:

with_items: '{{play_hosts}}'
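Put together, the workaround with the simplified loop would read roughly like this (a sketch; the restart command is a placeholder):

- name: service restart, one host at a time
  # serial: 1 would be the proper solution, but it is play-level only
  command: /bin/service restart
  run_once: true
  delegate_to: "{{ item }}"
  with_items: "{{ play_hosts }}"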

@alvaroaleman thanks for the workaround.

Is there any solution for serial: 30%?

Cool idea.

Assumes that a task knows which hosts are running in it OR targets all hosts from the play.

If it's a task in a role (which could have been targeted at anything) this doesn't make sense.

You could make a temporary group, but then you can't run that task twice without targeting some hosts twice.

Considering the way 'serial' and plays work, this is not really something we see adding to Ansible. You have several ways of implementing something similar with existing functionality; many have been discussed above and elsewhere, so I'm only going to detail the two that cover most cases.

  • Use an intermediate play (allows for many different serials, but requires previous tasks to end on all hosts):

- hosts: all
  tasks:
    - anything: ...
    # ...

- hosts: all
  serial: 1
  tasks:
    - singletask: ...

- hosts: all
  tasks:
    - morestuff: ...
    # ...
  • run once + loop + delegate (limited to serial=1 behavior):

- mytask: ...
  delegate_to: "{{ item }}"
  run_once: true
  # many different ways to make the loop
  with_inventory_hostnames: all

I've been trying to play with a strategy plugin that does this ... but it would have to recreate 1/2 of Ansible core utility to accomplish it and would still have many problems when dealing with other parts of the system (like callbacks).

At this point I don't see any use case not covered by any of the workarounds above, and I very much doubt that anyone in core will tackle this, so I'm going to close this issue.

Of course, if anyone submits code that can add this feature in a sane manner it WILL be considered for inclusion, but I doubt that this is currently possible (or I'm not smart enough to see a way).

@bcoca, I think this is necessary; you could leave this open as something to implement when it becomes easier.

Software features can't be stopped by technical difficulties.

@pando85 You could probably build something close to this with a custom strategy plugin.

@detiber thanks for the info.

Indeed I solved my problem with the workaround posted here.

I have a proxy cache, and when I install packages for my cluster I need to install on one machine first; then, with all packages cached, I install on the other machines.

- name: Update all packages
  # serial: 1 would be the proper solution here, but that can only be set on play level
  # upstream issue: https://github.com/ansible/ansible/issues/12170
  run_once: true
  delegate_to: "{{ play_hosts[0] }}"
  yum: name=* state=latest

- yum: name=* state=latest

But I think this feature could be interesting for more purposes, and serial: is the simplest and most logical way to accomplish it.

@bcoca We use your 2nd suggestion in our project to accomplish serial=1 behaviour. The only negative aspect is that the summary at the end of the play gets skewed, because every ok or changed counts towards the first host and not the delegate. Can you think of any solution to that?

We really don't want to use the first suggestion because we have some dependencies, and we would need to include roles in both the beginning and ending plays to support the use of tags.

thanks

@kami8607 you can try this, not tested with latest ansible though:

- hosts: all
  tasks:

    - name: set fact
      set_fact:
        marker: marker

    - name: group by marker
      # puts every host in the play into an ad-hoc group named 'marker'
      group_by: key=marker
      changed_when: no

    - name: target task
      # every host iterates over the full group in the same order, but the
      # when clause lets each host act only on its own entry
      debug: msg="Performing task on {{ inventory_hostname }}, item is {{ item }}"
      with_items: "{{ groups['marker'] }}"
      when: "hostvars[item].inventory_hostname == inventory_hostname"

@hryamzik thank you very much! Works like a charm.
(FYI we use ansible 2.0.2.0)

@kami8607 you can also try to replace marker hack with play_hosts.
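Presumably something like the following (an untested sketch), which drops the set_fact/group_by setup entirely, since play_hosts already contains the inventory hostnames of the play:

- name: target task
  debug: msg="Performing task on {{ inventory_hostname }}, item is {{ item }}"
  with_items: "{{ play_hosts }}"
  when: item == inventory_hostname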

@hryamzik Very nice, perfect solution for us now. Thank you again :)

Hi. We discovered one problem with the solution provided by @hryamzik.
If the task fails on one of these "pseudo-serial" hosts, it still gets executed on the other hosts instead of the play failing immediately. No matter what we tried, we could not abort the playbook right after the failed host.

Maybe someone has a solution for us. Thank you.

Hi, I believe this really is a crucial feature to safely utilize role dependencies.

In our project, we model most of our plays with the help of role dependencies as it makes for short and easily readable playbooks and avoids duplication.

If I have a role A that depends on role B, and it is dangerous to update B on multiple hosts simultaneously, then as far as I understand the only supported way to handle this is to set serial: 1 for the play that uses role A. This can be unacceptable when you have lots of hosts and lots of roles that depend on B.

We're also using @hryamzik's solution right now, but as @kami8607 said, ansible does not stop when encountering a failure.

Also @bcoca, I don't believe these kinds of workarounds should be the goal when designing a tool such as ansible. There seem to be lots of people with similar use cases that require some solution. As @pando85 said, technical difficulties should not be a reason to close this issue.
It would be really nice if this ticket could be reopened or some other solution could be considered.

A big ol' +1 from me... the way things are set up, it seems like I should be able to enable rolling restarts by adding serial: 1 to any of my restart handlers. Then, any config change, version update, etc. would result in a rolling restart.

+1

+1

+1

Roles are effectively USELESS at large scale without this.

+1 really good idea

Also in support of this. We use roles and in order to use any of the workarounds, we would have to extract just the serialized tasks into the play or a task file included from it, thus breaking role encapsulation.

+1

+1

If the task fails on one of these "pseudo-serial" hosts, it still gets executed on the other hosts instead of the play failing immediately. No matter what we tried, we could not abort the playbook right after the failed host.

@kami8607 I've faced the same issue with failures, as rolling updates and restarts require the whole playbook to fail on any error. Solved it with any_errors_fatal: true.

I can also confirm that this solution works with include_tasks; however, check mode is executed in parallel.

- name: install and configure alive servers
  include_tasks: "install_configure.yml"
  with_items: "{{ healthy_servers }}"
  when: "hostvars[host_item].inventory_hostname == inventory_hostname"
  loop_control:
      loop_var: host_item
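any_errors_fatal can be set at play level, so the enclosing play would look roughly like this (a sketch built from the snippet above):

- hosts: all
  any_errors_fatal: true    # abort the whole play as soon as any host fails
  tasks:
    - name: install and configure alive servers
      include_tasks: "install_configure.yml"
      with_items: "{{ healthy_servers }}"
      when: "hostvars[host_item].inventory_hostname == inventory_hostname"
      loop_control:
        loop_var: host_item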

If you look at comments getting 👎, it's because they're pointless, as their entire content is saying "+1".

There's also a working solution in this thread.

there is a workaround that works for the most part, but no real solution

There is not a working solution in this thread. While it may work for some, it's not a solution.
register doesn't work properly (I'll have a gist to back this up later); my guess is it's not the only feature that won't behave correctly.

@jonhatalla I don't have any issues with register, can you share a gist or a repo that doesn't work?

I would like to throttle only an artifact download task (due to some constraints) but execute the rest of the tasks in parallel.

I have come up with a proposal after reading the comments, which still is not working as desired for the download case. Note that 2 is the maximum number of concurrent task executions desired.

    - name: Download at ratio three at most
      win_get_url:
        url: http://ipv4.download.thinkbroadband.com/100MB.zip
        dest: c:/ansible/100MB.zip
        force: yes
      with_sequence: start=0 end={{ (( play_hosts | length ) / 2 ) | round (0, 'floor') | int }}
      when: "(( ansible_play_batch.index(inventory_hostname) % (( play_hosts | length ) / 2 )) | round) == (item | int)"

While the when will match on each iteration only for certain hosts, I can still see all the servers performing the download at the same time.

Another way of testing it is with a debug message and a delay between iterations. This way it is clear that only two are matched at each iteration.

    - debug:
        msg: "Item {{ item }} with modulus {{ (( ansible_play_batch.index(inventory_hostname) % (( play_hosts | length ) / 2 )) | round) }}"
      with_sequence: start=0 end={{ (( play_hosts | length ) / 2 ) | round (0, 'floor') | int }}
      loop_control:
        pause: 2
      when: "(( ansible_play_batch.index(inventory_hostname) % (( play_hosts | length ) / 2 )) | round) == (item | int)"

I discovered this issue thread thanks to this SO question

Any idea why the download doesn't seem to work as the debug message does?

At this point I don't see any use case not covered by any of the workarounds above

As previous commenters said, I also see no workaround for handlers restarting services in a cluster where you don't want to restart all nodes at the same time. So there is at least one use case where there doesn't seem to be a solution... This renders handlers totally useless in this case, as handlers exist precisely to restart services only WHEN needed.

And all the other workarounds (handling concurrent writes to a local hosts file, for example) do work, but they are so ugly...

Finally, I concur: closing an issue because it's too big a problem to solve is a bit depressing...

@zwindler you can use tasks instead of handlers. I actually do rolling restarts with API checks, implemented with include_tasks, and it works as expected. You can even try include_tasks directly in handlers, but I've no idea whether that works or not.

Not really sure I understand what you suggest.

  • Do you mean you use include_tasks to restart the services, and only do so with a when: clause to check whether or not a restart has to occur on this node?
  • How can you ensure that only one node has its services restarted at a time (that's the whole issue here)? Do you mean you can add serial with include_tasks?
- name: install and configure alive servers
  include_tasks: "install_configure.yml"
  with_items: "{{ healthy_servers }}"
  when: "hostvars[host_item].inventory_hostname == inventory_hostname"
  loop_control:
      loop_var: host_item

in this case serial=1 is simulated for all the tasks inside install_configure.yml.
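The thread never shows install_configure.yml itself; a minimal sketch of what such a file might contain (the service name and port are placeholder assumptions):

# install_configure.yml (hypothetical contents)
- name: restart service
  service:
    name: myapp
    state: restarted

- name: wait until the node serves traffic again before the loop moves on
  wait_for:
    port: 8080
    delay: 5
    timeout: 300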

How has healthy_servers been defined? How is it used in the workaround? I don't see it being referenced. I want a serial task to apply to all hosts that the playbook is being run against.

@erpadmin why don't you use play_hosts in this case?

Hi all,
we have detected a similar issue: we cannot pass the argument and use it in the playbook as
serial: ${serial_mode}
but it fails with:
ValueError: invalid literal for int() with base 10: 'serial_mode'

It seems to point to this bug, but we would like to clarify:

  • is there any workaround for this issue?
  • what is the official version with this fix?

thanks for your help and please keep us posted.

best regards, Pablo.

On 01/08/2018 11:02, Johannes Najjar wrote:

have you tried {{ serial_mode }}

yes, also tried with this and same issue.

thanks, Pablo.



It seems to me that the "run once + loop + delegate (limited to serial=1 behavior)" approach does not work on an include_tasks statement when the inventory has two "inventory hosts" with the same value for ansible_host.

Given two inventory hosts with the same ansible_host, the approach _does run twice_; however, both iterations are against the same host.

A major problem with most of the proposed workarounds is CPU and memory usage, as well as massive deployment slowdowns. The method of checking inventory_hostname == item in a with_items loop is O(n^2) across hosts, which, combined with a large number of hosts, can balloon memory and CPU load greatly.

With 200 hosts, I've seen Ansible use 20 GB of RAM and a load average of 70 just to serialize an include_tasks block. That particular task took several minutes just to decide which hosts to include.

+1

Anyone is invited to test #42528 for their use cases, and add a :+1: to the PR if you approve.
