Ansible: SSH Error: Shared connection to x.x.x.x closed.

Created on 4 Apr 2015  ·  55 Comments  ·  Source: ansible/ansible

While attempting to do a reboot, using a playbook containing:

- name: Reboot
  command: shutdown -r now "Ansible updates triggered"
  async: 0
  poll: 0
  ignore_errors: true

I get the following error, and the playbook ends:

fatal: [x.x.x.x] => SSH Error: Shared connection to x.x.x.x closed.
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.

FATAL: all hosts have already failed -- aborting

In previous versions (e.g. 1.8.1), the above playbook continued past this point, ignoring the error.

Most helpful comment

I know that this ticket is closed, but I thought that I'd add my solution for any Googlers out there. Instead of:

shell: sleep 2 && shutdown -r now "Ansible updates triggered"

I used:

shell: /sbin/shutdown -r -t 3

All 55 comments

I can't reproduce this as written (what version of Ansible were you using?), but I can with this:

- name: Reboot
  shell: 'shutdown -r now "Ansible updates triggered" && sleep 10'
  async: 0
  poll: 0
  ignore_errors: true

This seems to be because of a race between the ssh command finishing and the ssh server being shut down. Looking at the source, it looks like async: 0 means "run synchronously", so it's not surprising this fails. If you change it to async: 1, it works fine for me.
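Concretely, the variant that works for me is the same task with only the async value changed:

```yaml
- name: Reboot
  shell: 'shutdown -r now "Ansible updates triggered" && sleep 10'
  async: 1
  poll: 0
  ignore_errors: true
```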

I can get the same failure with 1.8.1, so probably something in your setup or Ansible has tipped the balance of the race condition for your task.

Maybe https://support.ansible.com/hc/en-us/articles/201958037-Reboot-a-server-and-wait-for-it-to-come-back just needs to be changed to say async: 1 rather than async: 0?

Thanks for trying, jder! I'm running version 1.9.0.1 on both Mac OS X 10.10.2 (installed via brew) and a CentOS 7 server (installed via pip). One thing I omitted from my post was that I only run this when a variable is set, so the full task looks like:

- name: Reboot
  command: shutdown -r now "Ansible updates triggered"
  async: 0
  poll: 0
  ignore_errors: true
  when: runUpdates

I didn't think that'd make a difference though.

I also tried setting async: 1 but that did not resolve my problem.

@darrylc Can you show a complete playbook & full output? (Maybe with the issue template?)

I'm able to run your task without error with the same OS X and Ansible versions. Do you have a task _after_ the reboot task? For example, this fails, regardless of the value of async:

- hosts: all
  vars:
    runUpdates: true
  tasks:
    - name: Reboot
      command: shutdown -r now "Ansible updates triggered"
      async: 1
      poll: 0
      ignore_errors: true
      when: runUpdates

    - name: After reboot
      shell: 'sleep 5 && echo hi'

Because even though the first task succeeds, the second one fails with the error you're seeing.

Sure. This is the full contents of the playbook, but it's included around other things. If you'd like to see some of the other plays above/below, let me know.

- name: Reboot
  command: shutdown -r now "Ansible updates triggered"
  async: 0
  poll: 0
  ignore_errors: true
  when: runUpdates

- name: Waiting for all app servers
  local_action: wait_for host={{ inventory_hostname }}
                port=22 state=started
  sudo: false
  when: runUpdates

Here's the output:

TASK: [common | Reboot] *******************************************************
fatal: [10.10.0.233] => SSH Error: Shared connection to 10.10.0.233 closed.
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.
fatal: [10.10.0.234] => SSH Error: Shared connection to 10.10.0.234 closed.
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.
fatal: [10.10.1.249] => SSH Error: Shared connection to 10.10.1.249 closed.
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.
fatal: [10.10.1.248] => SSH Error: Shared connection to 10.10.1.248 closed.
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.
fatal: [10.10.1.12] => SSH Error: Shared connection to 10.10.1.12 closed.
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.

FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/home/centos/site.retry

10.10.0.233                : ok=17   changed=16   unreachable=1    failed=0
10.10.0.234                : ok=17   changed=16   unreachable=1    failed=0
10.10.1.12                 : ok=17   changed=16   unreachable=1    failed=0
10.10.1.248                : ok=17   changed=16   unreachable=1    failed=0
10.10.1.249                : ok=17   changed=16   unreachable=1    failed=0
127.0.0.1                  : ok=25   changed=8    unreachable=0    failed=0

Previously, in 1.8.1, I got this output:

TASK: [nat | Reboot] **********************************************************
failed: [x.x.x.x] => {"failed": true, "parsed": false}
SUDO-SUCCESS-gaixedfvfwciqldgvvrcxtieejnprhbe

...ignoring

TASK: [nat | Waiting for all app servers] *************************************
ok: [x.x.x.x -> 127.0.0.1]

You might try running with -vvvv. I'd also be interested in the task & output that comes before the reboot task.

The task before the reboot is:

- name: Set SELINUX to Permissive
  selinux: state=permissive policy=targeted

Here's the verbose output of those two plays:

TASK: [common | Set SELINUX to Permissive] ************************************
<10.10.0.233> ESTABLISH CONNECTION FOR USER: centos
<10.10.0.233> REMOTE_MODULE selinux state=permissive policy=targeted
<10.10.0.233> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.0.233 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1428172087.36-264711243780015 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1428172087.36-264711243780015 && echo $HOME/.ansible/tmp/ansible-tmp-1428172087.36-264711243780015'
EXEC previous known host file not found for 10.10.0.233
<10.10.0.233> PUT /tmp/tmpk3ODZU TO /home/centos/.ansible/tmp/ansible-tmp-1428172087.36-264711243780015/selinux
<10.10.0.234> ESTABLISH CONNECTION FOR USER: centos
<10.10.0.234> REMOTE_MODULE selinux state=permissive policy=targeted
<10.10.0.234> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.0.234 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1428172087.46-124848358438427 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1428172087.46-124848358438427 && echo $HOME/.ansible/tmp/ansible-tmp-1428172087.46-124848358438427'
EXEC previous known host file not found for 10.10.0.234
<10.10.0.233> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.0.233 /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible, key=ekxspcioyrcyceijzspufsbujionstgk] password: " -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-ekxspcioyrcyceijzspufsbujionstgk; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/centos/.ansible/tmp/ansible-tmp-1428172087.36-264711243780015/selinux; rm -rf /home/centos/.ansible/tmp/ansible-tmp-1428172087.36-264711243780015/ >/dev/null 2>&1'"'"''
EXEC previous known host file not found for 10.10.0.233
<10.10.0.234> PUT /tmp/tmpXup5iQ TO /home/centos/.ansible/tmp/ansible-tmp-1428172087.46-124848358438427/selinux
<10.10.1.249> ESTABLISH CONNECTION FOR USER: centos
<10.10.1.249> REMOTE_MODULE selinux state=permissive policy=targeted
<10.10.1.249> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.1.249 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1428172087.54-222918139378988 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1428172087.54-222918139378988 && echo $HOME/.ansible/tmp/ansible-tmp-1428172087.54-222918139378988'
EXEC previous known host file not found for 10.10.1.249
<10.10.0.234> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.0.234 /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible, key=luojdoizsqkbbeebaqqjmmnqrcoisljv] password: " -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-luojdoizsqkbbeebaqqjmmnqrcoisljv; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/centos/.ansible/tmp/ansible-tmp-1428172087.46-124848358438427/selinux; rm -rf /home/centos/.ansible/tmp/ansible-tmp-1428172087.46-124848358438427/ >/dev/null 2>&1'"'"''
EXEC previous known host file not found for 10.10.0.234
<10.10.1.249> PUT /tmp/tmpaUM1Vx TO /home/centos/.ansible/tmp/ansible-tmp-1428172087.54-222918139378988/selinux
<10.10.1.248> ESTABLISH CONNECTION FOR USER: centos
<10.10.1.248> REMOTE_MODULE selinux state=permissive policy=targeted
<10.10.1.248> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.1.248 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1428172087.63-68654424535862 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1428172087.63-68654424535862 && echo $HOME/.ansible/tmp/ansible-tmp-1428172087.63-68654424535862'
EXEC previous known host file not found for 10.10.1.248
<10.10.1.249> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.1.249 /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible, key=meugdinapovzmnnawghagziymmulwrka] password: " -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-meugdinapovzmnnawghagziymmulwrka; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/centos/.ansible/tmp/ansible-tmp-1428172087.54-222918139378988/selinux; rm -rf /home/centos/.ansible/tmp/ansible-tmp-1428172087.54-222918139378988/ >/dev/null 2>&1'"'"''
EXEC previous known host file not found for 10.10.1.249
<10.10.1.248> PUT /tmp/tmp_DUedY TO /home/centos/.ansible/tmp/ansible-tmp-1428172087.63-68654424535862/selinux
<10.10.1.12> ESTABLISH CONNECTION FOR USER: centos
<10.10.1.12> REMOTE_MODULE selinux state=permissive policy=targeted
<10.10.1.12> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.1.12 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1428172087.72-168887775441330 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1428172087.72-168887775441330 && echo $HOME/.ansible/tmp/ansible-tmp-1428172087.72-168887775441330'
EXEC previous known host file not found for 10.10.1.12
<10.10.1.248> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.1.248 /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible, key=dqiddmrxmpjrqpokdjwrwlrskglpedth] password: " -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-dqiddmrxmpjrqpokdjwrwlrskglpedth; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/centos/.ansible/tmp/ansible-tmp-1428172087.63-68654424535862/selinux; rm -rf /home/centos/.ansible/tmp/ansible-tmp-1428172087.63-68654424535862/ >/dev/null 2>&1'"'"''
EXEC previous known host file not found for 10.10.1.248
<10.10.1.12> PUT /tmp/tmpqoHJx2 TO /home/centos/.ansible/tmp/ansible-tmp-1428172087.72-168887775441330/selinux
<10.10.1.12> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.1.12 /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible, key=rvubjjqnipjqybarkttszpgpfruvaudr] password: " -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-rvubjjqnipjqybarkttszpgpfruvaudr; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/centos/.ansible/tmp/ansible-tmp-1428172087.72-168887775441330/selinux; rm -rf /home/centos/.ansible/tmp/ansible-tmp-1428172087.72-168887775441330/ >/dev/null 2>&1'"'"''
EXEC previous known host file not found for 10.10.1.12
ok: [10.10.0.233] => {"changed": false, "configfile": "/etc/selinux/config", "msg": "", "policy": "targeted", "state": "permissive"}
ok: [10.10.0.234] => {"changed": false, "configfile": "/etc/selinux/config", "msg": "", "policy": "targeted", "state": "permissive"}
ok: [10.10.1.249] => {"changed": false, "configfile": "/etc/selinux/config", "msg": "", "policy": "targeted", "state": "permissive"}
ok: [10.10.1.248] => {"changed": false, "configfile": "/etc/selinux/config", "msg": "", "policy": "targeted", "state": "permissive"}
ok: [10.10.1.12] => {"changed": false, "configfile": "/etc/selinux/config", "msg": "", "policy": "targeted", "state": "permissive"}

TASK: [common | Reboot] *******************************************************
<10.10.0.233> ESTABLISH CONNECTION FOR USER: centos
<10.10.0.233> REMOTE_MODULE command shutdown -r now "Ansible updates triggered"
<10.10.0.233> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.0.233 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1428172088.59-107964187200155 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1428172088.59-107964187200155 && echo $HOME/.ansible/tmp/ansible-tmp-1428172088.59-107964187200155'
EXEC previous known host file not found for 10.10.0.233
<10.10.0.233> PUT /tmp/tmprmK6QE TO /home/centos/.ansible/tmp/ansible-tmp-1428172088.59-107964187200155/command
<10.10.0.234> ESTABLISH CONNECTION FOR USER: centos
<10.10.0.234> REMOTE_MODULE command shutdown -r now "Ansible updates triggered"
<10.10.0.234> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.0.234 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1428172088.68-234873763936405 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1428172088.68-234873763936405 && echo $HOME/.ansible/tmp/ansible-tmp-1428172088.68-234873763936405'
EXEC previous known host file not found for 10.10.0.234
<10.10.0.233> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.0.233 /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible, key=ghvlhadtwkcoouhxllakzdanqqrrhxox] password: " -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-ghvlhadtwkcoouhxllakzdanqqrrhxox; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/centos/.ansible/tmp/ansible-tmp-1428172088.59-107964187200155/command; rm -rf /home/centos/.ansible/tmp/ansible-tmp-1428172088.59-107964187200155/ >/dev/null 2>&1'"'"''
EXEC previous known host file not found for 10.10.0.233
<10.10.0.234> PUT /tmp/tmpE5WSr6 TO /home/centos/.ansible/tmp/ansible-tmp-1428172088.68-234873763936405/command
<10.10.1.249> ESTABLISH CONNECTION FOR USER: centos
<10.10.1.249> REMOTE_MODULE command shutdown -r now "Ansible updates triggered"
<10.10.1.249> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.1.249 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1428172088.77-77338967532570 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1428172088.77-77338967532570 && echo $HOME/.ansible/tmp/ansible-tmp-1428172088.77-77338967532570'
EXEC previous known host file not found for 10.10.1.249
<10.10.0.234> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.0.234 /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible, key=lfxcjguxnbvasijicgikabimychiqxqu] password: " -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-lfxcjguxnbvasijicgikabimychiqxqu; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/centos/.ansible/tmp/ansible-tmp-1428172088.68-234873763936405/command; rm -rf /home/centos/.ansible/tmp/ansible-tmp-1428172088.68-234873763936405/ >/dev/null 2>&1'"'"''
EXEC previous known host file not found for 10.10.0.234
<10.10.1.249> PUT /tmp/tmp1swEiT TO /home/centos/.ansible/tmp/ansible-tmp-1428172088.77-77338967532570/command
<10.10.1.248> ESTABLISH CONNECTION FOR USER: centos
<10.10.1.248> REMOTE_MODULE command shutdown -r now "Ansible updates triggered"
<10.10.1.248> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.1.248 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1428172088.86-10771634536214 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1428172088.86-10771634536214 && echo $HOME/.ansible/tmp/ansible-tmp-1428172088.86-10771634536214'
EXEC previous known host file not found for 10.10.1.248
<10.10.1.249> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.1.249 /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible, key=txrdvprualxegtspglqctpdduavwvjmv] password: " -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-txrdvprualxegtspglqctpdduavwvjmv; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/centos/.ansible/tmp/ansible-tmp-1428172088.77-77338967532570/command; rm -rf /home/centos/.ansible/tmp/ansible-tmp-1428172088.77-77338967532570/ >/dev/null 2>&1'"'"''
EXEC previous known host file not found for 10.10.1.249
<10.10.1.248> PUT /tmp/tmpmiLyZ6 TO /home/centos/.ansible/tmp/ansible-tmp-1428172088.86-10771634536214/command
<10.10.1.12> ESTABLISH CONNECTION FOR USER: centos
<10.10.1.12> REMOTE_MODULE command shutdown -r now "Ansible updates triggered"
<10.10.1.12> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.1.12 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1428172088.95-256559000632752 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1428172088.95-256559000632752 && echo $HOME/.ansible/tmp/ansible-tmp-1428172088.95-256559000632752'
EXEC previous known host file not found for 10.10.1.12
<10.10.1.248> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.1.248 /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible, key=ajywauuhwsziswsyfutujlqmxquugbfm] password: " -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-ajywauuhwsziswsyfutujlqmxquugbfm; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/centos/.ansible/tmp/ansible-tmp-1428172088.86-10771634536214/command; rm -rf /home/centos/.ansible/tmp/ansible-tmp-1428172088.86-10771634536214/ >/dev/null 2>&1'"'"''
EXEC previous known host file not found for 10.10.1.248
<10.10.1.12> PUT /tmp/tmpjCweZe TO /home/centos/.ansible/tmp/ansible-tmp-1428172088.95-256559000632752/command
<10.10.1.12> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/home/centos/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 10.10.1.12 /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible, key=lhxgthxuwqarmkukfrwjydenlojxxyfh] password: " -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-lhxgthxuwqarmkukfrwjydenlojxxyfh; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/centos/.ansible/tmp/ansible-tmp-1428172088.95-256559000632752/command; rm -rf /home/centos/.ansible/tmp/ansible-tmp-1428172088.95-256559000632752/ >/dev/null 2>&1'"'"''
EXEC previous known host file not found for 10.10.1.12
fatal: [10.10.0.233] => SSH Error: Shared connection to 10.10.0.233 closed.
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.
fatal: [10.10.0.234] => SSH Error: Shared connection to 10.10.0.234 closed.
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.
fatal: [10.10.1.249] => SSH Error: Shared connection to 10.10.1.249 closed.
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.
fatal: [10.10.1.248] => SSH Error: Shared connection to 10.10.1.248 closed.
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.
fatal: [10.10.1.12] => SSH Error: Shared connection to 10.10.1.12 closed.
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.

FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/home/centos/site.retry

10.10.0.233                : ok=20   changed=3    unreachable=1    failed=0
10.10.0.234                : ok=20   changed=3    unreachable=1    failed=0
10.10.1.12                 : ok=17   changed=3    unreachable=1    failed=0
10.10.1.248                : ok=17   changed=3    unreachable=1    failed=0
10.10.1.249                : ok=17   changed=3    unreachable=1    failed=0
127.0.0.1                  : ok=25   changed=7    unreachable=0    failed=0

I should also note that I have this in my ssh config during the play:

Host 10.*.*.*
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null

So, the known hosts message shouldn't be a reason to abort the play.

I was able to reproduce this with the following playbook:

- hosts: all
  sudo: true
  tasks:
    - name: kill connection
      command: killall sshd
      ignore_errors: 1
      async: 1
      poll: 0

Despite the async: 1 (or 0) and the ignore_errors, this produces the same SSH error you're seeing. I'm looking more into it.

I think what's going on is this: when you launch an async task on the remote host (with async: 1), Ansible runs a small synchronous wrapper process that sleeps for one second and returns a small amount of JSON, in addition to actually starting the async job. The problem is that if the SSH connection is torn down before that small synchronous job finishes, Ansible treats it as an unreachable error. I'm not sure what the "right" solution is (and I don't see changes here since 1.8.1), but this workaround works for me:

- name: Reboot
  shell: sleep 2 && shutdown -r now "Ansible updates triggered"
  async: 1
  poll: 0
  ignore_errors: true

Hmm, so that seems to advance my scripts, but it doesn't seem to reboot my server.

Ah, sorry, you'll probably need to ask your wait_for command to delay for a few seconds before starting to poll now, since the reboot is now running asynchronously and delayed by 2 seconds:

- name: Waiting for all app servers
  local_action: wait_for host={{ inventory_hostname }}
                port=22 state=started delay=10
  sudo: false
  when: runUpdates

Server is still not rebooting. The delay is happening, but I suspect that the 'sleep 2 && shutdown -r now "Ansible updates triggered"' command isn't actually working.

It looks like running 'sleep 2 && shutdown -r now' via sudo requires a password, while running 'shutdown -r now' via sudo does not. Running 'sleep 2 && shutdown -r now' as root doesn't require a password either. I realize this might be outside the scope of Ansible, but any ideas?

That's very strange. Perhaps your sudoers configuration is set up to only allow certain commands to be run without a password?

Nope, sudo for that user (at that time) has full access

Well, it was a long shot; I don't think Ansible does sudo $COMMAND; it runs a script which then runs your command, so it would be hard to understand why one would work and the other wouldn't. Sorry, I really don't understand how that could require a password. Does just shell: shutdown -r now have the same problem?

sleep by itself, or shutdown by itself, seems to work.

What happens if you just run it via Ansible synchronously?

- name: Reboot
  shell: sleep 2 && shutdown -r now "Ansible updates triggered"

Goes back to the original "fatal: [x.x.x.x] => SSH Error: Shared connection to x.x.x.x closed."

So, that clearly reboots. But when you add the async: 1 and poll: 0, it no longer reboots?

Hmm, might be working now. I changed it from a command: task to a shell: task. I hadn't noticed the discrepancy between our plays until now. I'll continue to test.

Yes, seems to be working! Thanks so much jder!

Here's the playbook:

- name: Reboot
  shell: sleep 2 && shutdown -r now "Ansible updates triggered"
  async: 1
  poll: 0
  sudo: true
  ignore_errors: true
  when: runUpdates

- name: Waiting for all app servers
  local_action: wait_for host={{ inventory_hostname }}
                port=22 state=started delay=10
  sudo: false
  when: runUpdates

Great! Glad to hear it.


Hi,
I have the same error. I am using roles with included handler files and get the same error; I also tried a local handler/main.yml, with the same result.

I can't understand why it is not waiting. Sometimes it works and sometimes it doesn't, and I can't figure out why.


@darrylc
As to your question about why "it looks like running 'sleep 2 && shutdown -r now' via sudo requires a password":
You can run either sleep 2 or shutdown -r now under sudo, but not both combined, because && is interpreted by the shell.
There is no single command named 'sleep 2 && shutdown -r now'; that's why sudo is asking for a password (normal behaviour if you try to run sudo <unknown_command>).

Long story short, I'd rather try (you don't really need sudo to sleep for two seconds):
sleep 2 && sudo shutdown -r now
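To see the splitting concretely, here is a minimal illustration using env in place of sudo (no privileges needed; DEMO_MSG is just a made-up variable name):

```shell
# '&&' is consumed by the invoking shell, so a command prefix (sudo, env, nice, ...)
# applies only to the FIRST command of the pair.
env DEMO_MSG=hi sh -c 'echo "first:${DEMO_MSG}"' && sh -c 'echo "second:${DEMO_MSG}"'
# prints:
#   first:hi
#   second:

# To run the whole pipeline under one prefix, hand it to a shell explicitly:
env DEMO_MSG=hi sh -c 'echo "a:${DEMO_MSG}" && echo "b:${DEMO_MSG}"'
# prints:
#   a:hi
#   b:hi
```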

Same problem in ansible-1.9.1; I don't know why there needs to be a sleep 2.

What about shutdown -r +2? Then the wait is built in.
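As an Ansible task, that suggestion might look like this (a sketch only; note that shutdown's +N argument is in minutes on most Linux systems, so +1 means one minute):

```yaml
- name: Reboot using shutdown's built-in delay
  command: shutdown -r +1 "Ansible updates triggered"
  async: 1
  poll: 0
  ignore_errors: true
```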


Brian Coca

Hi,
I used:

- name: Reboot
  shell: sleep 2 && shutdown -r now "Ansible updates triggered"
  async: 1
  poll: 0
  sudo: true
  ignore_errors: true

This is working for me. I can't understand why, but work is work :)
And I don't see the point of the sleep 2. Why do we need it?

The issue seems to be that the connection is immediately killed, which results in an error; the sleep prevents this from happening.


Brian Coca

thanks :)

For posterity, the reason I wasn't using shutdown -r +2 is that my version of shutdown treats that as "2 minutes".

No ... this is the default behaviour on Linux distributions :) It's in minutes; use sleep 2 && shutdown -r

Ah, I did not realize the +N was in minutes ... too used to the current fast-paced world!


Brian Coca

Same error here:
ansible -vvvv 192.168.1.254 -m setup
<192.168.1.254> ESTABLISH CONNECTION FOR USER: root
<192.168.1.254> REMOTE_MODULE setup
<192.168.1.254> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 192.168.1.254 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1434089342.6-168328121475741 && echo $HOME/.ansible/tmp/ansible-tmp-1434089342.6-168328121475741'
192.168.1.254 | FAILED => SSH Error: Shared connection to 192.168.1.254 closed.
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.

After rebooting the system the error went away; this is with version 1.9.1

Hi,

I am running into the same issue and I have no success in a few days. Any help is appreciated.

"fatal: [54.184.91.116]: FAILED! => {
"changed": false,
"failed": true,
"msg": "BECOME-SUCCESS-dbwbjmofefcssrvsteepiobrzztqvjsc\r\nTraceback (most recent call last):\r\n File \"/root/.ansible/tmp/ansible-tmp-1437677072.45-174432208914762/command\", line 1871, in \r\n main()\r\n File \"/root/.ansible/tmp/ansible-tmp-1437677072.45-174432208914762/command\", line 91, in main\r\n module = CommandModule(argument_spec=dict())\r\n File \"/root/.ansible/tmp/ansible-tmp-1437677072.45-174432208914762/command\", line 535, in init\r\n self._check_for_check_mode()\r\n File \"/root/.ansible/tmp/ansible-tmp-1437677072.45-174432208914762/command\", line 1071, in _check_for_check_mode\r\n for (k,v) in self.params.iteritems():\r\nAttributeError: 'tuple' object has no attribute 'iteritems'\r\nOpenSSH_6.6.1, OpenSSL 1.0.1f 6 Jan 2014\r\ndebug1: Reading configuration data /home/ubuntu/.ssh/config\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug1: /etc/ssh/ssh_config line 19: Applying options for *\r\ndebug1: auto-mux: Trying existing master\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug2: mux_client_hello_exchange: master version 4\r\ndebug3: mux_client_forwards: request forwardings: 0 local, 0 remote\r\ndebug3: mux_client_request_session: entering\r\ndebug3: mux_client_request_alive: entering\r\ndebug3: mux_client_request_alive: done pid = 30268\r\ndebug3: mux_client_request_session: session request sent\r\ndebug1: mux_client_request_session: master session id: 2\r\ndebug3: mux_client_read_packet: read header failed: Broken pipe\r\ndebug2: Received exit status from master 0\r\nShared connection to 54.184.91.116 closed.\r\n",
"parsed": false
}"

Have you tried my answer above?

- name: Reboot
  shell: sleep 2 && shutdown -r now "Ansible updates triggered"
  async: 1
  poll: 0
  ignore_errors: true

It won't work since 1.9.1 I guess.

But this one will still work:

- name: reboot server
  shell: /bin/echo "/sbin/reboot" | /usr/bin/at now + 1 min

this is a known issue; i think it will be fixed in 2.0 ... i am using
ignore_errors: true and it works :)

how could I reboot the server using this technique with an ad-hoc command line? I've tried the below command but had no success, even though it shows a success message. The VM isn't rebooted...

$ ansible tag_v3_api_update_True -B 1 -P 0 -b -v -i ec2.py -a 'sleep 2 && shutdown -r now "Ansible updates triggered"'

background launch...


54.94.167.25 | success >> {
    "ansible_job_id": "601411637602.2026",
    "results_file": "/root/.ansible_async/601411637602.2026",
    "started": 1
}

My mistake... this is the correct way:

ansible tag_v3_api_update_True -B 1 -P 0 -b -v -i ec2.py -m shell -a 'sleep 2 && shutdown -r now "Ansible updates triggered"'

Now it works

Delete the files from the ansible server from where you are running the commands:

cd $HOME/.ansible/cp/

rm -rf __

Hello,

It clearly depends on the operating system. Moving from Debian 7 (Wheezy) to 8 (Jessie) on a Vagrant box showed this issue. I've upgraded from Ansible 1.9.4 to 2.0.0.2 and I get the exact same behaviour on both versions. If I had to guess, I'd say the order the services are stopped in has been changed, and that's causing this.

Is there any way we can gather the facts again after the machine reboots and we wait for it to come back? Once the machine reboots, I need to gather the facts again before moving on with the next plays. Any thoughts?
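
For anyone hitting the same question: one common pattern (sketched here untested; the delay/timeout values are illustrative) is to wait for SSH to come back with a local wait_for and then re-run the setup module to refresh facts:

```yaml
# Sketch only: assumes the control host can reach port 22 on the target.
- name: Wait for the server to come back
  local_action: wait_for host={{ inventory_hostname }} port=22 delay=30 timeout=300
  become: false

- name: Re-gather facts after the reboot
  setup:
```

Facts gathered by the setup task overwrite the stale pre-reboot values for the remainder of the play.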

I am now seeing this with 1.9.4 but not with shell. This does not always fail, but often enough.

- name: Create the agent import file
  copy:
    content: "{{inventory_hostname}},{{ansible_hostname}}"
    dest: "{{ossec_dir}}/tmp/agent-{{ansible_hostname}}"
    group: "{{ossec_group}}"
  delegate_to: "{{ossec_server}}"
  when: _register_agent

I had a task that worked on ansible < 2.1.0.0 start to fail in this way with 2.1.0.0. The solution from @jder above worked for me.

Old task:

- name: 'Restarting host machine(s) (Shows errors - OK to ignore!)'
  command: shutdown -r now 
  async: 0
  poll: 0
  ignore_errors: true
  become: yes

new version that works with 2.1.0.0:

- name: 'Restarting host machine(s) (Shows errors - OK to ignore!)'
  shell: sleep 2 && shutdown -r now 
  async: 1
  poll: 0
  ignore_errors: true
  become: yes

I think the most graceful way to wait for a reboot to complete would be to take note of the boot time prior to issuing the reboot, and wait for the boot time to change. Some systems take a long time to complete shutdown, and assuming that shutdown takes less than a couple of minutes before SSH becomes unreachable is asking for problems.

Not sure how to code that in Ansible yet. I wonder how portable the command 'uptime --since' is in the Linux world... it doesn't appear to be present in RHEL 5 or RHEL 6, and makes its appearance in RHEL 7.

At least the following should be highly portable on Linux systems:

expr $(date +%s) - $(cut -d. -f1 /proc/uptime)

or if you want something more human readable

date --date=@$(expr $(date +%s) - $(cut -d. -f1 /proc/uptime))

or something with ISO8601 date format, as an ad-hoc shell command

ansible cf_canary -m shell -a 'date +%y-%m-%dT%H:%M:%S --date=@$(expr $(date +%s) - $(cut -d. -f1 /proc/uptime))'

Output is like: 16-06-24T14:58:05
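
Putting those pieces together, a sketch (untested; retry counts and delays are placeholders) of the boot-time-comparison idea as Ansible tasks could look like:

```yaml
- name: Record the boot time before rebooting
  shell: expr $(date +%s) - $(cut -d. -f1 /proc/uptime)
  register: boot_before

- name: Reboot
  shell: sleep 2 && shutdown -r now "Ansible updates triggered"
  async: 1
  poll: 0
  ignore_errors: true

- name: Wait until the boot time changes
  shell: expr $(date +%s) - $(cut -d. -f1 /proc/uptime)
  register: boot_after
  until: boot_after.stdout != boot_before.stdout
  retries: 60
  delay: 10
```

Note that the computed boot time can jitter by a second between samples, so a more robust check would allow a small tolerance rather than a strict inequality.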

It seems like there's enough complexity here for a reboot module, that would encapsulate all the platform specific differences in rebooting a node.

There's a windows one, contributed by @nitzmahone here: https://github.com/ansible/ansible/pull/15314, shipping with 2.1, docs here: https://github.com/ansible/ansible-modules-core/pull/3376/files - and an issue request for a cross platform module here: https://github.com/ansible/ansible/issues/16186

@cameronkerrnz I went for a combination in win_reboot- I actually added the ansible_lastboot fact to Windows for that exact purpose, but ended up not shipping the version of win_reboot that used it for various reasons (instead waiting for the port to go down, then back up, then for a "canary" command to succeed over WinRM). I might still revisit that at some point. At least on WinRM, it's expensive to wait for the connect timeouts and stuff if the port's not open and responding- ssh might be a little cheaper there, but hard to tell without just trying (plus some systems refuse connections until SSH is ready, others will accept and tarpit/block).

I think the only real question outstanding is how generic we want this to be? E.g., do we want to have a python dep on the client or just make it purely ssh/shell/command-based, so it can potentially be used on things like switches/routers/embedded devices? I'd kinda lean toward the latter (where you could override the command that gets sent for your platform of choice), but that might somewhat limit some of the other behaviors (eg, calculating "yes, we actually rebooted" by sampling an uptime cmd, etc).

@nitzmahone As a useful point of self-imposed abstraction to make later room for switches/etc., might I suggest the module be something like posix_reboot? Presumably, all such platforms would have to have Python 2.4 anyway, no?

One thing to keep in mind, which helps to limit the scope of what this module could reasonably achieve, is that noting/comparing the boot time is in itself only useful for answering the question "have we rebooted yet [or is something still taking ages to shut everything down]".

I'm not convinced there is a good way of answering the question "are we ready to resume our play yet"... at least not without some knowledge of the platform deployment; so that should reasonably be left up to the play writer...

eg. In RHEL6 or any other SysV init system, even if you can SSH into the machine while it is booting, it might still be busy running fsck

...but I suppose if you have a file: path=/stillbooting state=touch as the last action before rebooting, and put rm /stillbooting at the end of rc.local, that would be a useful measure; then you could use wait_for: path=/stillbooting state=absent. Perhaps greater hope is available in RHEL7 and the systemd world.
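
The sentinel-file idea above might be sketched as follows (untested; it assumes rc.local on the target removes /stillbooting once boot has finished):

```yaml
- name: Mark the host as still booting
  file: path=/stillbooting state=touch

- name: Reboot
  shell: sleep 2 && shutdown -r now
  async: 1
  poll: 0
  ignore_errors: true

- name: Wait for SSH to come back
  local_action: wait_for host={{ inventory_hostname }} port=22 delay=30 timeout=300

- name: Wait for boot scripts to finish
  wait_for: path=/stillbooting state=absent timeout=300
```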

But certainly you can expect to have to have something that implements multiple strategies, like the hostname module. That would be a given simply to determine the boot time.

This workaround solved the issue on my Fedora 24 with ansible-2.2.0.0 managing a CentOS 7.3 virtual machine after a yum update. Using the shell module is essential here because of the && shell operator (the command module won't interpret it).

- name: Reboot the server
  shell: sleep 2 && shutdown -r now 'Maintenance reboot'
  async: 1
  poll: 0
  ignore_errors: true

it has solved my problem! thank you!

With the raw module (when Python is not installed or usable) this sleep hack doesn't work, and async is also not usable. I tried a lot of different command lines and the only one that finally worked was:

- name: restart node
  raw: "echo -e '#!/bin/sh\n(sleep 2; sudo shutdown --reboot now) &\n' > /tmp/reboot.sh && chmod +x /tmp/reboot.sh && nohup /tmp/reboot.sh"

What is the source of this problem?
Yesterday I ran my script without any problem, and this morning, for the first time, I have the same issue.
The script is unchanged, and the VM has the same base.
My hosts file has changed, but I've no idea how / what I've broken.
