Ansible: Reboot and Wait for

Created on 10 Feb 2016  ·  58 Comments  ·  Source: ansible/ansible

ISSUE TYPE
  • Bug Report
COMPONENT NAME

wait_for

ANSIBLE VERSION

v2.2

SUMMARY

Hi,

I have the tasks below as part of my playbook to upgrade all system packages, reboot the machine, and wait for it to come back. The playbook exits when the machine reboots instead of waiting for the host to come back online and running the remaining tasks. Can you please suggest a fix?

  - name: reboot the system when package is upgraded
    command: /sbin/shutdown -r now "Ansible system package upgraded"
    when: latest_state.changed
    tags: upgrade_packages_all

  - name: waiting for server to come back
    local_action: wait_for host={{ ansible_default_ipv4.address }} port=22 state=started delay=30 timeout=60
    sudo: false
    tags: upgrade_packages_all
TASK [vmsetup : reboot the system when package is upgraded] ********************
fatal: [96.119.246.13]: FAILED! => {"changed": false, "failed": true, "module_stderr": "", "module_stdout": "PolicyKit daemon disconnected from the bus.\r\nWe are no longer a registered authentication agent.\r\n", "msg": "MODULE FAILURE", "parsed": false}

The reboot works, but the playbook is unusable because it loses its connection, as shown in the error above.

#tail -f /var/log/messages
Feb 10 16:25:30  nrpe[872]: Daemon shutdown
Connection to xx.xxx.xxx.xx closed by remote host.
Connection to xx.xxx.xxx.xx closed.

Let me know if any details are required. Thanks.

Thanks,
Govind

affects_2.2 affects_2.3 bug module core

All 58 comments

add && sleep 1

shell: /sbin/shutdown -r now "Ansible system package upgraded" && sleep 1

as a workaround to avoid the connection shutting before Ansible can 'reap' the temp files and close the connection.

Just as info, this is documented at https://support.ansible.com/hc/en-us/articles/201958037-Reboot-a-server-and-wait-for-it-to-come-back although it is not maintained in this repo's docs and does not show up at docs.ansible.com

@bcoca Added as you said but still ran into same error. I have to use ignore_errors: true to skip that error.

  - name: reboot the system when package is upgraded
    command: /sbin/shutdown -r now "Ansible system package upgraded" && sleep 1
    when: latest_state.changed
    ignore_errors: true
    tags: upgrade_packages_all

Error:
fatal: [96.119.246.13]: FAILED! => {"changed": false, "failed": true, "module_stderr": "", "module_stdout": "PolicyKit daemon disconnected from the bus.\r\nWe are no longer a registered authentication agent.\r\n", "msg": "MODULE FAILURE", "parsed": false}

I had same problem with 2.0.0.2, this workaround helped me:

- name: Wait for server come back
  wait_for: >
    host="{{ inventory_hostname }}"
    port=22
    delay=15
    timeout=60
  delegate_to: localhost

@gvenka008c:

You may want to try:

shell: sleep 2 && /sbin/shutdown -r now

@andyhky it worked!! :) thanks! Final solution in Ansible 2.1 that works is as follows

- name: Restart server
  become: yes
  shell: sleep 2 && /sbin/shutdown -r now "Ansible system package upgraded"


- name: waiting 30 secs for server to come back
  local_action: wait_for host={{ ansible_default_ipv4.address }} port=22 state=started delay=30 timeout=60
  become: false

@sayantandas does the solution still work for you? I am using ansible 2.1.1.0 and get the following:
UNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to the remote host. Make sure this host can be reached over ssh", "unreachable": true}

I found this answer that solved the problem for me: http://stackoverflow.com/a/39174307

- name: Restart server
  become: yes
  shell: sleep 2 && /sbin/shutdown -r now "Ansible system package upgraded"
  async: 1
  poll: 0

The local_action following the shell reboot always skips for me. A peek at the -vvv output only indicates that it was skipped because of a conditional. Anyone else experiencing this? I can open a new ticket if it's seemingly unrelated.

I can confirm this broke completely on 2.1 in our install. We had it working on 1.9 in the "1.9" way, upgraded Ansible to 2.1, modified the task to the "2.1" way, and it breaks every time.

This solution kinda works with my install. However, the local_action waits the whole timeout every time: my host can restart in ~30 seconds, but if I set the wait_for timeout to 3600, Ansible will wait one hour before proceeding with the playbook... As some reboots may take longer than others (updates), I really need a high timeout, but can't afford to waste 15 minutes waiting for my hosts to come back (this happens 5 times in my main playbook :( )

After a bit of trial and error with various solutions posted for various versions, the following is working for me on 2.1.2 with an Ubuntu 16.04 guest VM and OS X host using Vagrant (1.8.6) and VirtualBox (5.1.8).

- name: "Reboot if required"
  shell: sleep 2 && shutdown -r now 'Reboot required' removes=/var/run/reboot-required
  become: true
  async: 1
  poll: 0
  ignore_errors: true

- name: "Wait for reboot"
  local_action: wait_for host={{ ansible_default_ipv4.address }} port=22 delay=10 state=started
  become: false

@Furiml: Not sure if this applies to what you're trying to do, but this second task will poll every 10 seconds (default) after a 10 second delay to see if port 22 on the guest machine is open before continuing i.e. it won't take the full allocated timeout value.

An update of the docs and/or the support article to use the preferred full YAML format for tasks would also be nice. This works for me:

    - name: reboot nodes
      shell: sleep 2 && shutdown -r now "Ansible reboot"
      async: 1
      poll: 0
      ignore_errors: true

    - name: wait for server to come back
      local_action: wait_for
      args:
        host: "{{ inventory_hostname }}"
        port: 22
        state: started
        delay: 30
        timeout: 300

I wrote something else to test this. Instead of waiting for a host to be up, I want to wait for it to be down.

    - name: "Wait for the machine to be down"
      local_action: wait_for
      args:
        host={{target}}
        port=22
        state=stopped
        delay=1
        timeout=3600
      become: false

If I understood correctly, this will poll port 22 of my target every second and will only continue once it is closed. I shut down the machine myself, but Ansible has been stuck for 5 minutes now :(

@martineg that works great! It's now included in the Galaxy role jmcvetta.debian-upgrade-reboot.

On Ansible 2.2 this does not reboot my computer. It simply says that the job is started, and then waits for port 22. But the node does not reboot!

I have the same issue as @sashgorokhov on Ubuntu 16.04/ansible 2.2.1.0.

Just says "OK" and doesn't reboot.
ok: [IP] => { "ansible_job_id": "575686775528.32762", "changed": false, "finished": 0, "results_file": "/root/.ansible_async/575686775528.32762", "started": 1 }

@NoahO maybe this tiny code snippet could help you:

tasks:
    - shell: shutdown -r now

This simply reboots the node without waiting for it (in my case I really don't need to wait for it to reboot).

@sashgorokhov unfortunately I need it to reboot. I worked around it with the at command, but that wastes 1 minute before taking any action, so I'd prefer to have this working.
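The commenter did not post their tasks, so the following is only a minimal sketch of what an at-based workaround might look like, using Ansible's `at` module. It assumes the `at` package is installed on the target; the task name and shutdown message are illustrative. Because `at` schedules in whole minutes, the one-minute wait mentioned above is inherent to this approach.

```yaml
# Sketch only: schedule the reboot through the at daemon so that it runs
# outside the SSH session Ansible is using.
- name: Schedule a reboot one minute from now (assumes at is installed)
  at:
    command: /sbin/shutdown -r now "Ansible reboot"
    count: 1          # at works in whole minutes, hence the 1-minute wait
    units: minutes
  become: true
```

A wait task such as the `wait_for` examples elsewhere in this thread would still be needed afterwards, with a `delay` longer than one minute.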

Trying to get this working on CentOS 7.3 servers from an F25 workstation with Ansible 2.2.1, but it doesn't seem to be working. Any workaround?
At this point I am seriously considering creating a separate role for the task I need to execute after the reboot and calling the 2 roles from a shell script with a long enough sleep in between, or, if I want to be fancy, adding an ssh-keyscan as well to make sure the server is up. But I would rather rely on Ansible, you know, as it is a real automation tool ;)

EDIT:
OK, I was an utter idiot, and it's a bit close to midnight here, I'm not afraid to admit it... WRONG INVENTORY FFS. Works! Sorry...

Having the same problem. Any updates on this matter? Looks like many folks are facing this issue

This is still a problem (Mac version 2.3.0.0); the target is a Fedora instance in AWS. None of the above workarounds worked for me (they wouldn't error, but also didn't reboot it), so I did the following (where delayed_reboot is just a shell script that sleeps and reboots):

- copy:
    src: files/delayed_reboot
    dest: /tmp/delayed_reboot
    owner: root
    group: root
    mode: 0700

- name: Restart machine
  shell: nohup /tmp/delayed_reboot &
  async: 1
  poll: 0
  ignore_errors: true
  become: true
  become_method: sudo
  when: new_kernel.changed or new_kernel_headers.changed

- name: Wait for machine to restart
  local_action:
    module: wait_for
      host={{ inventory_hostname }}
      port=22
      delay=20
      timeout=300
      state=started
  become: false
  when: new_kernel.changed or new_kernel_headers.changed
ISSUE TYPE

  • Bug Report
COMPONENT NAME


Shared connection

ANSIBLE VERSION
ansible 2.3.0.0
  config file = /Users/ebalduf/PD-git/LabOnDemand/ansible.cfg
  configured module search path = Default w/o overrides
  python version = 2.7.13 (default, Dec 18 2016, 07:03:39) [GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)]
CONFIGURATION
grep '^[^#]' ansible.cfg
[defaults]
host_key_checking = False
timeout = 15
[privilege_escalation]
[paramiko_connection]
[ssh_connection]
control_path = %(directory)s/%%h-%%r
[persistent_connection]
[accelerate]
[selinux]
[colors]
[diff]
OS / ENVIRONMENT


Ansible host: macOS Sierra 10.12.4
target: Fedora 25 instance in AWS.

SUMMARY
STEPS TO REPRODUCE

- name: install python and deps for ansible modules
  raw: dnf install -y python2 python2-dnf libselinux-python

- name: gather facts
  setup:

- name: Install new Kernel
  dnf:
    name: https://kojipkgs.fedoraproject.org//packages/kernel/4.9.13/201.fc25/x86_64/kernel-core-4.9.13-201.fc25.x86_64.rpm
  register: new_kernel

- name: Install new Kernel headers
  dnf:
    name: https://kojipkgs.fedoraproject.org//packages/kernel/4.9.13/201.fc25/x86_64/kernel-headers-4.9.13-201.fc25.x86_64.rpm
  register: new_kernel_headers

- name: Restart machine
  command: reboot
  async: 1
  poll: 0
  ignore_errors: true
  become: true
  become_method: sudo
  when: new_kernel.changed or new_kernel_headers.changed

- name: Wait for machine to restart
  local_action:
    module: wait_for
      host={{ inventory_hostname }}
      port=22
      delay=20
      timeout=300
      state=started
  become: false
  when: new_kernel.changed or new_kernel_headers.changed
EXPECTED RESULTS


The target should reboot properly and ansible continue the playbook.

ACTUAL RESULTS


See output below with -vvv

Using module file /usr/local/lib/python2.7/site-packages/ansible/modules/commands/command.py
<34.209.10.206> ESTABLISH SSH CONNECTION FOR USER: fedora
<34.209.10.206> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=fedora -o ConnectTimeout=15 -o ControlPath=/Users/ebalduf/.ansible/cp/%h-%r 34.209.10.206 '/bin/sh -c '"'"'echo ~ && sleep 0'"'"''
<34.209.10.206> (0, '/home/fedora\n', '')
<34.209.10.206> ESTABLISH SSH CONNECTION FOR USER: fedora
<34.209.10.206> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=fedora -o ConnectTimeout=15 -o ControlPath=/Users/ebalduf/.ansible/cp/%h-%r 34.209.10.206 '/bin/sh -c '"'"'( umask 77 && mkdir -p "` echo /home/fedora/.ansible/tmp/ansible-tmp-1493487050.48-176600574616672 `" && echo ansible-tmp-1493487050.48-176600574616672="` echo /home/fedora/.ansible/tmp/ansible-tmp-1493487050.48-176600574616672 `" ) && sleep 0'"'"''
<34.209.10.206> (0, 'ansible-tmp-1493487050.48-176600574616672=/home/fedora/.ansible/tmp/ansible-tmp-1493487050.48-176600574616672\n', '')
<34.209.10.206> PUT /var/folders/sd/5jlrqcms5qg3bjc0g5mp5r1r0000gn/T/tmpeV4QiT TO /home/fedora/.ansible/tmp/ansible-tmp-1493487050.48-176600574616672/command.py
<34.209.10.206> SSH: EXEC sftp -b - -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=fedora -o ConnectTimeout=15 -o ControlPath=/Users/ebalduf/.ansible/cp/%h-%r '[34.209.10.206]'
<34.209.10.206> (0, 'sftp> put /var/folders/sd/5jlrqcms5qg3bjc0g5mp5r1r0000gn/T/tmpeV4QiT /home/fedora/.ansible/tmp/ansible-tmp-1493487050.48-176600574616672/command.py\n', '')
<34.209.10.206> ESTABLISH SSH CONNECTION FOR USER: fedora
<34.209.10.206> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=fedora -o ConnectTimeout=15 -o ControlPath=/Users/ebalduf/.ansible/cp/%h-%r 34.209.10.206 '/bin/sh -c '"'"'chmod u+x /home/fedora/.ansible/tmp/ansible-tmp-1493487050.48-176600574616672/ /home/fedora/.ansible/tmp/ansible-tmp-1493487050.48-176600574616672/command.py && sleep 0'"'"''
<34.209.10.206> (0, '', '')
<34.209.10.206> ESTABLISH SSH CONNECTION FOR USER: fedora
<34.209.10.206> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=fedora -o ConnectTimeout=15 -o ControlPath=/Users/ebalduf/.ansible/cp/%h-%r -tt 34.209.10.206 '/bin/sh -c '"'"'sudo -H -S -n -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-jplodcrkimvnywjebybiuhwijxipglmt; /usr/bin/python /home/fedora/.ansible/tmp/ansible-tmp-1493487050.48-176600574616672/command.py; rm -rf "/home/fedora/.ansible/tmp/ansible-tmp-1493487050.48-176600574616672/" > /dev/null 2>&1'"'"'"'"'"'"'"'"' && sleep 0'"'"''
<34.209.10.206> (255, '', 'Shared connection to 34.209.10.206 closed.\r\n')
fatal: [34.209.10.206]: UNREACHABLE! => {
    "changed": false,
    "unreachable": true
}

MSG:

Failed to connect to the host via ssh: Shared connection to 34.209.10.206 closed.

Thanks @ebalduf for putting all this together! I can confirm that I am facing an identical issue with Ansible 2.2.1, with macOS/CentOS as Ansible hosts and CentOS 7 as the target host. It would be nice if this bug could be prioritized!

The below code works for me
Ansible version - 2.3
Server - Ubuntu 16.04.2 LTS

Target system - RHEL 7.3

- name: restart server
  become: yes
  shell: sleep 2 && /sbin/shutdown -r now "RedHat system package upgraded"
  async: 1
  poll: 0

- name: waiting 60 secs for server to come back
  become: false
  local_action: wait_for host={{ ansible_default_ipv4.address }} port=22 state=started delay=60 timeout=120

The solution provided by @sayantandas also works for us.

Ansible version: 2.3.0.0
Server version: CentOS Linux release 7.3.1611
Target system: CentOS Linux release 7.3.1611

The solution provided by @sayantandas works for me too

Ansible version: 2.3.0.0
Server version: RHEL 7.3
Target system: RHEL 7.3

Thank you

Ansible 2.3, Centos7, below is what I went with after I do something that updates the kernel, avoids the 'wait for host to boot' if the host isn't rebooting.

  - name: Check for reboot hint.
    shell: LAST_KERNEL=$(rpm -q --last kernel | perl -pe 's/^kernel-(\S+).*/$1/' | head -1);CURRENT_KERNEL=$(uname -r); if [ $LAST_KERNEL != $CURRENT_KERNEL ]; then echo 'reboot'; else echo 'no'; fi
    ignore_errors: true
    register: reboot_hint

  - name: Rebooting ...
    shell: sleep 2 && /usr/sbin/reboot
    async: 1
    poll: 0
    ignore_errors: true
    when: reboot_hint.stdout.find("reboot") != -1

  - name: Wait for host to boot
    become: false
    local_action: wait_for
    args:
      host: "{{ inventory_hostname }}"
      port: 22
      state: started
      delay: 30
      timeout: 180
    when: reboot_hint.stdout.find("reboot") != -1

Unable to reboot properly, even with sanity checks.

Ansible 2.2.2.0

Example playbook for Ubuntu 16.04 LTS

---
- name: Refresh apt cache
  apt:
    update_cache: yes

- name: Update all packages
  apt:
    upgrade: dist

- name: Rebooting server
  shell: >
    sleep 2 &&
    /sbin/shutdown -r now "Ansible system package upgraded"
  async: 1
  poll: 0
  ignore_errors: true

- name: Wait for host to boot
  become: false
  local_action: wait_for
  args:
    host: "{{ inventory_hostname }}"
    port: 22
    state: started
    delay: 30
    timeout: 200

- name: Sanity check
  shell: ps -ef | grep sshd | grep `whoami` | awk '{print \"kill -9\", $2}' | sh
  async: 1
  poll: 0
  ignore_errors: true

- name: Remove useless packages from the cache
  apt:
    autoclean: yes

- name: Remove dependencies that are no longer required
  apt:
    autoremove: yes

Result

TASK [apt-refresh : Remove useless packages from the cache] ********************
fatal: [xxxx]: FAILED! => {"changed": false, "failed": true, "module_stderr": "OpenSSH_7.2p2 Ubuntu-4ubuntu2.2, OpenSSL 1.0.2g  1 Mar 2016\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug1: /etc/ssh/ssh_config line 19: Applying options for *\r\ndebug1: auto-mux: Trying existing master\r\ndebug1: mux_client_request_session: master session id: 2\r\nShared connection to xxxx closed.\r\n", "module_stdout": "Traceback (most recent call last):\r\n  File \"/tmp/ansible_MJ_gDg/ansible_module_apt.py\", line 903, in <module>\r\n    main()\r\n  File \"/tmp/ansible_MJ_gDg/ansible_module_apt.py\", line 855, in main\r\n    for package in packages:\r\nTypeError: 'NoneType' object is not iterable\r\n", "msg": "MODULE FAILURE"}

ansible.cfg

[defaults]
inventory = hosts
host_key_checking = False
remote_user = ubuntu
private_key_file = id_rsa
retry_files_enabled = False

[ssh_connection]
ssh_args = -C -o ControlMaster=auto -o ControlPersist=60s
control_path = /tmp/%%h-%%p-%%r

I was having similar problems. I added a 'pause' task for 30 seconds in between the shell: shutdown now -r and the wait_for task. Now things consistently work. I also have these as handlers with listen, so they only run when needed.
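The commenter's handlers were not posted, so this is a hedged sketch of how the pause-in-between approach could be laid out as handlers sharing one `listen` topic. The topic name `reboot machine`, the reboot command, and the 300-second timeout are assumptions chosen for illustration; only the 30-second pause comes from the comment.

```yaml
# handlers/main.yml (hypothetical location)
# Handlers listening on the same topic run in the order they are defined.
- name: Send reboot command
  listen: reboot machine
  shell: sleep 2 && shutdown -r now "Ansible reboot"
  async: 1
  poll: 0
  become: true

- name: Give the machine time to actually go down
  listen: reboot machine
  pause:
    seconds: 30

- name: Wait for SSH to come back
  listen: reboot machine
  wait_for:
    host: "{{ inventory_hostname }}"
    port: 22
    state: started
    timeout: 300    # assumed upper bound, tune per environment
  delegate_to: localhost
  become: false
```

A task that requires a reboot would then carry `notify: reboot machine`, so the three handlers only run when that task reports a change.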

I had similar issues when trying to reboot our Ubuntu 16.04 hosts with ansible configured to use python3. As soon as I installed python 2.7 on the Ubuntu 16.04 hosts (apt-get install python-minimal) and configured ansible to use it on the remote system, the reboot worked fine.
Whenever I used "async" in my task, exactly nothing happened when ansible used python3, not even very basic things like "echo test > /tmp/testfile".

Addition: I'm using ansible 2.3.1.0 installed via deb package from http://ppa.launchpad.net/ansible/ansible/ubuntu

So there is a wait_for_connection action plugin that will wait until the system becomes available again and validates this by doing an end-to-end test. This is more reliable than testing if a port is responding again.

We are working on a new reboot action plugin that will perform a reboot, will wait for the connection to start working again and finally checks if the system was actually rebooted.

cc @AnderEnder @gregswift @jarv @jhoekx

Target system - CentOS 7.4

- name: restart server
  become: yes
  shell: sleep 2 && /sbin/shutdown -r now "System reboot"
  #command: /usr/bin/systemd-run --on-active=10 /usr/bin/systemctl reboot
  async: 1
  poll: 0

- name: waiting 10 secs for server to come back
  become: false
  local_action: wait_for host={{ ansible_default_ipv4.address }} port={{ ansible_port }} state=started delay=10 timeout=120  

@peterwillcn IMO you'd better use wait_for_connection instead of wait_for, see: http://docs.ansible.com/ansible/latest/wait_for_connection_module.html

It's not just easier, it also works over jumphosts or proxies, using the exact same transport Ansible uses for the target node.

@dagwieers how is the reboot action plugin coming along ?

Found a couple roles out there to take care of this:

@afeld that role looks great.

So my issue with all the examples I have seen is that it all relies on some random wait time before starting the poll to see if ssh port is available. Now given different hosts require differing amounts of time to shutdown processes depending on what it’s doing - it either means you have to set a long delay to catch the worst offender or you risk false positives.

The new wait_for_connection just uses ping and another random delay factor (see above). So again huge risk of false positives (confirmed by redhat support).

The way I have made this slightly more robust is using 2 tasks - first one waits for ssh port to be absent - this starts immediately and has maximum wait of 15 mins, polls every second - this should be plenty of time for server processes to shutdown and means that you should only have to wait for regular os services to stop.

Once ssh is no longer running, task 2 starts: it waits for the ssh port to reach state started, after a 1 minute delay.

Note the wait_for port doesn’t rely on ssh it uses a python socket to determine if port is up

Andy
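The two-task approach described above was not posted as code, so here is a minimal sketch of how it might look with `wait_for`. The 15-minute limit, one-second polling, and 1-minute delay come from the description; the host expression and the second task's 600-second timeout are assumptions. The `sleep` parameter needs Ansible 2.3 or later.

```yaml
# Task 1: start immediately and wait until port 22 stops answering,
# which shows the host has really begun shutting down.
- name: Wait for SSH to go down
  wait_for:
    host: "{{ inventory_hostname }}"
    port: 22
    state: stopped
    delay: 0
    sleep: 1        # poll every second
    timeout: 900    # at most 15 minutes
  delegate_to: localhost
  become: false

# Task 2: only reached once SSH was seen down, so an open port now
# means the host has come back rather than never having rebooted.
- name: Wait for SSH to come back
  wait_for:
    host: "{{ inventory_hostname }}"
    port: 22
    state: started
    delay: 60
    timeout: 600    # assumed value, not from the comment
  delegate_to: localhost
  become: false
```

Both tasks run on the control machine and probe the port with a plain socket, so neither depends on a working SSH session to the target.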


@akcrisp And the problem with your implementation is that it fails for anything but the simple direct-connection use-case. The wait_for_connection module used to do this as well, but it fails for ssh_proxy, or other proxied transport connections, so we had to remove it.

You can make the delay-time configurable per system/group or other characteristics, but that's not ideal.

Agreed, but it's not clear how you fix this without one. A non-deterministic, finger-in-the-air random delay?


Isn't it possible to add an ssh check to wait_for_connection?
Plus add a grace_timeout kind of thing, like in AWS Auto Scaling groups: wait some more time after the "connection" is established.

this worked well for me:
ansible: 2.4.1.0
Ubuntu: 16.04.3 LTS
Linux 4.4.0-98-generic

  - name: reboot server
    become: yes
    shell: sleep 2 && /sbin/shutdown -r now "System reboot"
    async: 1
    poll: 0

  - name: Wait for restart
    local_action: wait_for port=22 host="{{ ansible_ssh_host | default(inventory_hostname) }}"  search_regex=OpenSSH delay=60

  - name: continue running script after reboot
    shell: 'sh /home/ubuntu/my_script.sh'

This worked for me on Ansible 2.4.2.0 and Ubuntu 16.04 LTS on Azure

- hosts: all
  become: yes
  become_user: root
  pre_tasks:
    - name: Patching for Spectre and Meltdown followed by a reboot
      become: yes
      shell: nohup bash -c 'sleep 2 && apt -y update && apt -y upgrade && apt -y autoremove && reboot "System reboot"' &
      async: 1
      poll: 0

    - name: Wait for 3 minutes for server to come online
      become: false
      local_action: wait_for port=22 host={{ ansible_ssh_host | default(inventory_hostname) }} search_regex='OpenSSH' delay=180 timeout=300

I guess my use case was much more complex. Here's mine written as a handler:

- name: Inform of reboot required
  listen: reboot machine
  debug:
    msg: "System {{ inventory_hostname }} needs to be rebooted for changes to take effect"

- name: Update GRUB to pick up changes to default config, if any
  command: update-grub2
  listen: reboot machine

  # Send the reboot command and let it run in the background
  # so we can disconnect...
- name: Send reboot command
  listen: reboot machine
  shell: '(sleep 5; shutdown -r now) &'

- name: Clear host errors
  listen: reboot machine
  meta: clear_host_errors
  failed_when: false

- name: Reset connection
  listen: reboot machine
  meta: reset_connection
  failed_when: false

- name: Wait for SSH to be available
  listen: reboot machine
  local_action: wait_for
  args:
    host: "{{ ansible_host }}"
    port: "{{ ansible_port | default('22') }}"
    delay: 60
    state: started

- name: Ansible ping
  listen: reboot machine
  local_action: ping
  register: result
  until: result.ping is defined and result.ping == 'pong'
  retries: 30
  delay: 10

- name: Run uptime
  listen: reboot machine
  command: uptime

  # LACP and spanning-tree take a bit of time to start working
- name: Ping default gateway
  listen: reboot machine
  command: "ping -c 1 {{ ansible_default_ipv4.gateway }}"
  register: result
  until: result.rc == 0
  retries: 30
  delay: 10

Here's my solution (Ansible 2.4.2):

- name: restart machine
  shell: nohup sh -c '(sleep 5; shutdown -r now "Ansible restart") &' &>/dev/null
  become: yes

- name: wait for machine to restart
  wait_for_connection:
    delay: 60
    sleep: 5
    timeout: 300

this worked for me:

- name: restart the system
  shell: "sleep 5 & reboot"
  async: 1
  poll: 0

- name: wait for the system to reboot
  wait_for_connection:
    connect_timeout: 20
    sleep: 5
    delay: 5
    timeout: 60

All these workarounds are interesting, but the real fix will be

We are working on a new reboot action plugin that will perform a reboot, will wait for the connection to start working again and finally checks if the system was actually rebooted.

Right? (from https://github.com/ansible/ansible/issues/14413#issuecomment-330523110)

Confirmed.

looking forward to it

I am interested to know whether any reboot module will support Unix flavours beyond Linux, i.e. AIX, Solaris, etc. I assume it works with Windows?

The point I made with my example, which most seem to have missed, is that simply waiting with a timeout for port 22 can give a false positive. If a host takes longer to shut down its processes (think of a large database) than the delay factor, it may not have actually rebooted when the check runs. I tested and proved this can happen, hence my test to ensure ssh is absent first.

Andy


@akcrisp That is the intention. The discussion was linked here before: https://github.com/ansible/ansible/issues/16186


---
- hosts: all
- name: restart the system
    shell: "sleep 5 & reboot"
    async: 1
    poll: 0

- name: wait for the system to reboot
    wait_for_connection:
      connect_timeout: 20
      sleep: 5
      delay: 5
      timeout: 60

ansible-playbook test.yaml
ERROR! Syntax Error while loading YAML.
mapping values are not allowed in this context

The error appears to have been in '/etc/ansible/test.yaml': line 4, column 10, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

- name: restart the system
    shell: "sleep 5 & reboot"
    ^ here

help please ;-)

---
- hosts: all
  tasks:
    - name: restart the system
      shell: "sleep 5 & reboot"
      async: 1
      poll: 0

    - name: wait for the system to reboot
      wait_for_connection:
        connect_timeout: 20
        sleep: 5
        delay: 5
        timeout: 60

Try this.

Grief - I wish people would read what I have done. Everyone just waiting for a timeout risks a false positive. It would only take an app a long time to shut down, and the check will think ssh is up after the reboot. I have tested it.

You are far better off checking that ssh is absent first. That check doesn't rely on ssh; it uses a Python socket connection.


You're right - my comment wasn't an endorsement of the design, I just wanted to demonstrate the correct formatting.

This related bug seems to be a cause of the async reboot task failing to run, as @pyroxde noted earlier: #37941

So we now have a reboot and win_reboot action plugin to reboot Unix and Windows servers. If you have any issues with the existing implementation, feel free to open a new issue with any specifics.
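For readers arriving from the workarounds above, a minimal sketch of the `reboot` module replacing the shell-plus-wait pair might look like this. It assumes an Ansible release that ships the module (to my knowledge 2.7 or later); the message and timeout values are illustrative, not recommendations.

```yaml
# Sketch only: one task reboots the host, waits for the connection to
# return, and runs a test command to confirm the system is usable.
- name: Reboot and wait for the host to come back
  reboot:
    msg: "Ansible system package upgraded"
    post_reboot_delay: 30   # seconds to wait before the first reconnect attempt
    reboot_timeout: 600     # give up if the host is not back within 10 minutes
    test_command: whoami    # command that must succeed after the reboot
  become: true
```

Since the module works over the regular connection plugin, it should not need `local_action`, `async`/`poll`, or `ignore_errors`.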
