Ansible: [v2] 'Timeout (12s) waiting for privilege escalation prompt' when using with_items for commands that last longer than 12 seconds

Created on 24 Nov 2015  ·  123 Comments  ·  Source: ansible/ansible

Issue Type
Bug Report

Component Name
become

Ansible version

ansible 2.0.0 (stable-2.0 90021104d5) last updated 2015/11/24 08:59:25 (GMT +300)
  lib/ansible/modules/core: (detached HEAD 273112c56d) last updated 2015/11/24 09:00:04 (GMT +300)
  lib/ansible/modules/extras: (detached HEAD e46e2e1d6f) last updated 2015/11/24 09:00:04 (GMT +300)

(Any version starting with commit 9a8e95bff3d01cd06f193ead91997e21dc137bdb actually.)

Summary

When you have a task that uses become and with_items, and any of the items (except for the 1st one) takes longer than 12 seconds, Ansible v2 fails with

ERROR! Timeout (12s) waiting for privilege escalation prompt: BECOME-SUCCESS-nkcpdbuvzuejmtbyuzqwselaucozhqbs\r\n"

because it's still waiting for the BECOME-SUCCESS-xxx marker assigned to loop item 1 while executing the second and subsequent loop items.

(When the command finishes within the timeout, all you get is an unnecessary delay but no error.)

Steps to Reproduce

See this comment.


Original description

My usual "let's test my playbook on today's ansible 2.0 from git" failed with

TASK [fridge : git [email protected]:/git/{{ item }}.git dest=/var/www/{{ item }}] ***
fatal: [precise]: FAILED! => {"failed": true, "msg": "ERROR! Timeout (12s) waiting for privilege escalation prompt: BECOME-SUCCESS-nkcpdbuvzuejmtbyuzqwselaucozhqbs\r\n"}

The task looks like this:

- git: [email protected]:/git/{{ item }}.git dest=/var/www/{{ item }}
  with_items:
    - foo.pov.lt
    - bar.pov.lt
    - baz.pov.lt
  tags: websites

The output of ansible -vvv looks like this:

TASK [fridge : git [email protected]:/git/{{ item }}.git dest=/var/www/{{ item }}] ***
task path: /home/mg/src/deployments/provisioning/roles/fridge/tasks/websites.yml:2
<127.0.0.1> ESTABLISH SSH CONNECTION FOR USER: vagrant
<127.0.0.1> SSH: EXEC ssh -C -vvv -o ForwardAgent=yes -o ControlMaster=auto -o ControlPersist=60s -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o Port=2222 -o 'IdentityFile="/home/mg/src/deployments/provisioning/.vagrant/machines/precise/virtualbox/private_key"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=vagrant -o ConnectTimeout=10 -o ControlPath=/home/mg/.ansible/cp/ansible-ssh-%h-%p-%r -tt 127.0.0.1 (umask 22 && mkdir -p "`echo $HOME/.ansible/tmp/ansible-tmp-1448351140.8-275883201561985`" && echo "`echo $HOME/.ansible/tmp/ansible-tmp-1448351140.8-275883201561985`")
<127.0.0.1> PUT /tmp/tmpkjFrrK TO /home/vagrant/.ansible/tmp/ansible-tmp-1448351140.8-275883201561985/git
<127.0.0.1> SSH: EXEC sftp -b - -C -vvv -o ForwardAgent=yes -o ControlMaster=auto -o ControlPersist=60s -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o Port=2222 -o 'IdentityFile="/home/mg/src/deployments/provisioning/.vagrant/machines/precise/virtualbox/private_key"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=vagrant -o ConnectTimeout=10 -o ControlPath=/home/mg/.ansible/cp/ansible-ssh-%h-%p-%r [127.0.0.1]
<127.0.0.1> ESTABLISH SSH CONNECTION FOR USER: vagrant
<127.0.0.1> SSH: EXEC ssh -C -vvv -o ForwardAgent=yes -o ControlMaster=auto -o ControlPersist=60s -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o Port=2222 -o 'IdentityFile="/home/mg/src/deployments/provisioning/.vagrant/machines/precise/virtualbox/private_key"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=vagrant -o ConnectTimeout=10 -o ControlPath=/home/mg/.ansible/cp/ansible-ssh-%h-%p-%r -tt 127.0.0.1 /bin/sh -c 'sudo -H -S -n -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-bejiwblxcnlvwxavgtiyybaxriyuuxtj; LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 /usr/bin/python /home/vagrant/.ansible/tmp/ansible-tmp-1448351140.8-275883201561985/git; rm -rf "/home/vagrant/.ansible/tmp/ansible-tmp-1448351140.8-275883201561985/" > /dev/null 2>&1'"'"''
<127.0.0.1> ESTABLISH SSH CONNECTION FOR USER: vagrant
<127.0.0.1> SSH: EXEC ssh -C -vvv -o ForwardAgent=yes -o ControlMaster=auto -o ControlPersist=60s -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o Port=2222 -o 'IdentityFile="/home/mg/src/deployments/provisioning/.vagrant/machines/precise/virtualbox/private_key"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=vagrant -o ConnectTimeout=10 -o ControlPath=/home/mg/.ansible/cp/ansible-ssh-%h-%p-%r -tt 127.0.0.1 (umask 22 && mkdir -p "`echo $HOME/.ansible/tmp/ansible-tmp-1448351142.49-40339675610161`" && echo "`echo $HOME/.ansible/tmp/ansible-tmp-1448351142.49-40339675610161`")
<127.0.0.1> PUT /tmp/tmpz0oR25 TO /home/vagrant/.ansible/tmp/ansible-tmp-1448351142.49-40339675610161/git
<127.0.0.1> SSH: EXEC sftp -b - -C -vvv -o ForwardAgent=yes -o ControlMaster=auto -o ControlPersist=60s -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o Port=2222 -o 'IdentityFile="/home/mg/src/deployments/provisioning/.vagrant/machines/precise/virtualbox/private_key"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=vagrant -o ConnectTimeout=10 -o ControlPath=/home/mg/.ansible/cp/ansible-ssh-%h-%p-%r [127.0.0.1]
<127.0.0.1> ESTABLISH SSH CONNECTION FOR USER: vagrant
<127.0.0.1> SSH: EXEC ssh -C -vvv -o ForwardAgent=yes -o ControlMaster=auto -o ControlPersist=60s -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o Port=2222 -o 'IdentityFile="/home/mg/src/deployments/provisioning/.vagrant/machines/precise/virtualbox/private_key"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=vagrant -o ConnectTimeout=10 -o ControlPath=/home/mg/.ansible/cp/ansible-ssh-%h-%p-%r -tt 127.0.0.1 /bin/sh -c 'sudo -H -S -n -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-fhitkmndxtktnuxfylbdajxvwdvqlhqm; LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 /usr/bin/python /home/vagrant/.ansible/tmp/ansible-tmp-1448351142.49-40339675610161/git; rm -rf "/home/vagrant/.ansible/tmp/ansible-tmp-1448351142.49-40339675610161/" > /dev/null 2>&1'"'"''
<127.0.0.1> ESTABLISH SSH CONNECTION FOR USER: vagrant
<127.0.0.1> SSH: EXEC ssh -C -vvv -o ForwardAgent=yes -o ControlMaster=auto -o ControlPersist=60s -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o Port=2222 -o 'IdentityFile="/home/mg/src/deployments/provisioning/.vagrant/machines/precise/virtualbox/private_key"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=vagrant -o ConnectTimeout=10 -o ControlPath=/home/mg/.ansible/cp/ansible-ssh-%h-%p-%r -tt 127.0.0.1 (umask 22 && mkdir -p "`echo $HOME/.ansible/tmp/ansible-tmp-1448351144.14-196079876165362`" && echo "`echo $HOME/.ansible/tmp/ansible-tmp-1448351144.14-196079876165362`")
<127.0.0.1> PUT /tmp/tmpywejXi TO /home/vagrant/.ansible/tmp/ansible-tmp-1448351144.14-196079876165362/git
<127.0.0.1> SSH: EXEC sftp -b - -C -vvv -o ForwardAgent=yes -o ControlMaster=auto -o ControlPersist=60s -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o Port=2222 -o 'IdentityFile="/home/mg/src/deployments/provisioning/.vagrant/machines/precise/virtualbox/private_key"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=vagrant -o ConnectTimeout=10 -o ControlPath=/home/mg/.ansible/cp/ansible-ssh-%h-%p-%r [127.0.0.1]
<127.0.0.1> ESTABLISH SSH CONNECTION FOR USER: vagrant
<127.0.0.1> SSH: EXEC ssh -C -vvv -o ForwardAgent=yes -o ControlMaster=auto -o ControlPersist=60s -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o Port=2222 -o 'IdentityFile="/home/mg/src/deployments/provisioning/.vagrant/machines/precise/virtualbox/private_key"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=vagrant -o ConnectTimeout=10 -o ControlPath=/home/mg/.ansible/cp/ansible-ssh-%h-%p-%r -tt 127.0.0.1 /bin/sh -c 'sudo -H -S -n -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-kgkgdyonvybkjovhnpcbhsocdkkukikb; LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 /usr/bin/python /home/vagrant/.ansible/tmp/ansible-tmp-1448351144.14-196079876165362/git; rm -rf "/home/vagrant/.ansible/tmp/ansible-tmp-1448351144.14-196079876165362/" > /dev/null 2>&1'"'"''
fatal: [precise]: FAILED! => {"failed": true, "msg": "ERROR! Timeout (12s) waiting for privilege escalation prompt: BECOME-SUCCESS-kgkgdyonvybkjovhnpcbhsocdkkukikb\r\n"}

False Trail

_Please ignore this bit, I was led astray by misleading verbose output. Keeping this only to allow the first few comments to make sense, although you can ignore those too._

I've tried the last SSH command manually, after removing the > /dev/null 2>&1 bit and inserting echo in front of python and rm, to get this:

$ ssh -C -vvv -o ForwardAgent=yes -o ControlMaster=auto -o ControlPersist=60s -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o Port=2222 -o 'IdentityFile="/home/mg/src/deployments/provisioning/.vagrant/machines/precise/virtualbox/private_key"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=vagrant -o ConnectTimeout=10 -o ControlPath=/home/mg/.ansible/cp/ansible-ssh-%h-%p-%r -tt 127.0.0.1 /bin/sh -c 'sudo -H -S -n -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-kgkgdyonvybkjovhnpcbhsocdkkukikb; LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 echo /usr/bin/python /home/vagrant/.ansible/tmp/ansible-tmp-1448351144.14-196079876165362/git; echo rm -rf "/home/vagrant/.ansible/tmp/ansible-tmp-1448351144.14-196079876165362/" '"'"''
OpenSSH_6.9p1 Ubuntu-2, OpenSSL 1.0.2d 9 Jul 2015
debug1: Reading configuration data /home/mg/.ssh/config
debug3: kex names ok: [diffie-hellman-group1-sha1]
debug1: /home/mg/.ssh/config line 362: Applying options for *
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug1: auto-mux: Trying existing master
debug2: fd 3 setting O_NONBLOCK
debug2: mux_client_hello_exchange: master version 4
debug3: mux_client_forwards: request forwardings: 0 local, 0 remote
debug3: mux_client_request_session: entering
debug3: mux_client_request_alive: entering
debug3: mux_client_request_alive: done pid = 12223
debug3: mux_client_request_session: session request sent
debug1: mux_client_request_session: master session id: 2
usage: sudo [-D level] -h | -K | -k | -V
usage: sudo -v [-AknS] [-D level] [-g groupname|#gid] [-p prompt] [-u user name|#uid]
usage: sudo -l[l] [-AknS] [-D level] [-g groupname|#gid] [-p prompt] [-U user name] [-u user name|#uid] [-g groupname|#gid] [command]
usage: sudo [-AbEHknPS] [-C fd] [-D level] [-g groupname|#gid] [-p prompt] [-u user name|#uid] [-g groupname|#gid] [VAR=value] [-i|-s] [<command>]
usage: sudo -e [-AknS] [-C fd] [-D level] [-g groupname|#gid] [-p prompt] [-u user name|#uid] file ...
debug3: mux_client_read_packet: read header failed: Broken pipe
debug2: Received exit status from master 1
Shared connection to 127.0.0.1 closed.

The usage message from sudo is suggestive.

affects_2.0 affects_2.1 affects_2.2 bug solaris core

Most helpful comment

Hi all, I hope this helps some people:

We encountered this problem, i.e. "Timeout waiting for privilege escalation prompt". We spent time debugging it, and this is what we found. Hopefully, folks smarter and more involved with ansible can use this as a bug report.

  1. target host is resolvable by a /etc/hosts entry on ansible system
  2. no network problems or delays
  3. both nodes are rhel7
  4. ssh to target host works using keys, no prompts or delays
  5. once on target host, sudo to root works as expected, no prompts or delays
  6. ansible-playbook version 2.2.0

With an EMPTY ansible.cfg file, both in /etc and ~/.ansible.cfg, we made a test case playbook that fails 100% of the time.

testcase.yml:

---

- name: testcase
  hosts: testhost
  become: true
  tasks:

  - name: testcase
    shell:
      cmd: sleep 30

We would run this test case with the command:
/usr/bin/ansible-playbook -i ./testhost ./testcase.yml

The error we would receive at roughly 12s after starting the play was:
fatal: [testhost]: FAILED! => {"failed": true, "msg": "Timeout (12s) waiting for privilege escalation prompt: \u001b[?25h\u001b[0G\u001b[K\u001b[?25h\u001b[0G\u001b[KBECOME-SUCCESS-beunzsdqhofnfczeeceajvbxfmzldrxn\r\n"}

Note that while the play said it failed, the sleep 30 command had indeed been started and was clearly running. The running "sleep 30" command, and the ansible connection to the target host, were terminated at the 12s mark when Ansible decided it had failed to get the privilege escalation prompt.

Based on this thread, we speculated that this error occurs whenever the command being run (i.e. sleep 30) does not finish BEFORE the timeout value. To test this, we re-ran the same play with a higher timeout, and indeed it succeeded. This would explain why some posters reported success with a 30-second timeout: if their play completes within 30 seconds, there is no error, while commands taking more than 30 seconds would still fail.

We could successfully run this testcase play using the following command:
/usr/bin/ansible-playbook -i ./testhost ./testcase.yml -T35

Success output was:

TASK [testcase] ****************************************************************
changed: [testhost]

On multiple back to back trials, where the only thing modified was the -T timeout value: having a timeout < play time would always render this error, and having a timeout > play time would always render a success.

...all good... except this isn't what we understood the timeout value to do. We thought it was the timeout for an ssh connection to be established and become ready to run (connect and sudo), not the maximum time a play could take. Someone above speculated it was a 'buffering' issue - that seems possible, in that ansible does not seem to get the privilege escalation prompt before the play starts running. But in essence, if a play doesn't complete within the timeout period, this error will occur.

In any event, we took it further. We were able to discover that if we set pipelining=true in ansible.cfg, then we did NOT need to adjust the timeout value, and the play would execute as expected. Example ansible.cfg

[ssh_connection]
pipelining=true

With pipelining turned on, and the identical testcase above, the play still completed as expected even when we ran the playbook with a very short timeout (i.e. 5 seconds):
/usr/bin/ansible-playbook -i ./testhost ./testcase.yml -T5

On multiple back-to-back trials where the timeout was much less than play time (5 seconds vs 30, which would have caused the error per our previous testing of timeout values), and where the only thing that changed between trials was the pipelining=true/false: having ssh pipelining=true would always render a success, and having ssh pipelining=false would always render this error.

Hopefully this helps some folks, if more info needed please reply.

Thanks,

All 123 comments

Let's try to unpack the quoting:

/bin/sh -c 'sudo -H -S -n -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-kgkgdyonvybkjovhnpcbhsocdkkukikb'"'"''

This is /bin/sh executing

sudo -H -S -n -u root /bin/sh -c 'echo BECOME-SUCCESS-kgkgdyonvybkjovhnpcbhsocdkkukikb'

which looks okay to me, but does not work for some reason -- it produces no output (and no error from sudo):

$ ssh -C -o ForwardAgent=yes -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o Port=2222 -o 'IdentityFile="/home/mg/src/deployments/provisioning/.vagrant/machines/precise/virtualbox/private_key"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=vagrant -o ConnectTimeout=10 -o ControlPath=/home/mg/.ansible/cp/ansible-ssh-%h-%p-%r -tt 127.0.0.1 sudo -H -S -n -u root sh -c 'echo BECOME-SUCCESS-kgkgdyonvybkjovhnpcbhsocdkkukikb'
Warning: Permanently added '[127.0.0.1]:2222' (ECDSA) to the list of known hosts.

Huh. If I drop the sh -c, and quotes, then I get the desired output:

$ ssh -C -o ForwardAgent=yes -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o Port=2222 -o 'IdentityFile="/home/mg/src/deployments/provisioning/.vagrant/machines/precise/virtualbox/private_key"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=vagrant -o ConnectTimeout=10 -o ControlPath=/home/mg/.ansible/cp/ansible-ssh-%h-%p-%r -tt 127.0.0.1 sudo -H -S -n -u root echo BECOME-SUCCESS-kgkgdyonvybkjovhnpcbhsocdkkukikb
Warning: Permanently added '[127.0.0.1]:2222' (ECDSA) to the list of known hosts.
BECOME-SUCCESS-kgkgdyonvybkjovhnpcbhsocdkkukikb
Connection to 127.0.0.1 closed.

I don't understand this.

Oh hey

$ ssh -C ... -tt 127.0.0.1 strace -f sh -c 'echo wat'
execve("/bin/sh", ["sh", "-c", "echo", "wat"], [/* 16 vars */]) = 0
...

And this is why it fails.

Adding one more level of quoting would make it work:

$ ssh -C ... -tt 127.0.0.1 "sh -c 'echo wat'"
wat

Never mind: ansible-playbook -vvv _lies to me_, and does not actually execute the commands it claims to be executing.

I tried to reproduce, and I see -vvv output containing

<localhost> SSH: EXEC ssh -C -q -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 -o ControlPath=/home/mg/.ansible/cp/ansible-ssh-%h-%p-%r -tt localhost /bin/sh -c 'sudo -H -S -n -u mg /bin/sh -c '"'"'echo BECOME-SUCCESS-jidcfzolsnlaytjnjowgjiesjkijtqrh; LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 /usr/bin/python /tmp/ansible-tmp-1448352781.72-65353427623045/command'"'"''

which also doesn't work if I execute it from the shell, but Ansible actually succeeds, and strace confirms that the "/bin/sh -c '...'" bit is passed as a single argument:

execve("/usr/bin/ssh", ["/usr/bin/ssh", "-C", "-q", "-o", "ControlMaster=auto", "-o", "ControlPersist=60s", "-o", "KbdInteractiveAuthentication=no", "-o", "PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey", "-o", "PasswordAuthentication=no", "-o", "ConnectTimeout=10", "-o", "ControlPath=/home/mg/.ansible/cp/ansible-ssh-%h-%p-%r", "-tt", "localhost", "/bin/sh -c 'sudo -H -S -n -u mg /bin/sh -c '\"'\"'echo BECOME-SUCCESS-hxrdsqwlxhfaqeaaukwemxnyglaizznx; LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 /usr/bin/python /tmp/ansible-tmp-1448353157.19-275587866328999/command'\"'\"''"], [/* 73 vars */])

Yeah, it "lies to you", but when I was redoing that code I chose to leave that part alone. It's not feeding the entire command as a single string to a shell, after all, but using execve (indirectly, via subprocess.Popen). The choices are to leave it as-is, or introduce another level of pipes.quote() just for display to make the command fully cut-and-paste-able. Given that the quoting is already both pretty horrific and broken (e.g. #12290, #13179), I opted not to wrap it further.
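For illustration only (this is not Ansible's actual code, and the args value below is made up): wrapping just the final argv element with pipes.quote() for display would make the logged command paste-able, roughly like this:

import pipes  # shlex.quote on Python 3

# argv as handed to subprocess.Popen: the whole remote command is a single element
args = ['ssh', '-tt', '127.0.0.1',
        "/bin/sh -c 'sudo -H -S -n -u root /bin/sh -c ...'"]

# quote only the last element for display, so the logged line could be copied into a shell
display_cmd = ' '.join(args[:-1] + [pipes.quote(args[-1])])
print(display_cmd)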

But, given that cmd is sent as a single argument, I am puzzled by your sudo failure (it works for me at the moment with devel).

I checked with pstree -aup in the VM while ansible was running.

I see Ansible doing successful git clones for foo.pov.lt, bar.pov.lt, and then I see it doing

sshd,1166 -D
  ├─sshd,4422
  │   └─sshd,4524,vagrant 
  │       └─bash,4525
  │           └─pstree,7332 -aup 1166 -l
  └─sshd,6708
      └─sshd,6810,vagrant 
          └─sh,6988 -c sudo -H -S -n -u root /bin/sh -c 'echo BECOME-SUCCESS-twujfdrxzucbaagtolwawxsskbxgnrmt; LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 /usr/bin/python /home/vagrant/.ansible/tmp/ansible-tmp-1448355742.59-61543405843328/git; rm -rf "/home/vagrant/.ansible/tmp/ansible-tmp-1448355742.59-61543405843328/" > /dev/null 2>&1'
              └─sudo,6989,root -H -S -n -u root /bin/sh -c echo BECOME-SUCCESS-twujfdrxzucbaagtolwawxsskbxgnrmt; LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 /usr/bin/python /home/vagrant/.ansible/tmp/ansible-tmp-1448355742.59-61543405843328/git; rm -rf "/home/vagrant/.ansible/tmp/ansible-tmp-1448355742.59-61543405843328/" > /dev/null 2>&1
                  └─sh,6990 -c echo BECOME-SUCCESS-twujfdrxzucbaagtolwawxsskbxgnrmt; LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 /usr/bin/python /home/vagrant/.ansible/tmp/ansible-tmp-1448355742.59-61543405843328/git; rm -rf "/home/vagrant/.ansible/tmp/ansible-tmp-1448355742.59-61543405843328/" > /dev/null 2>&1
                      └─python,6991 /home/vagrant/.ansible/tmp/ansible-tmp-1448355742.59-61543405843328/git
                          └─git,7062 clone --origin origin [email protected]:/git/ivija.pov.lt.git /var/www/ivija.pov.lt
                              ├─git,7098 index-pack --stdin --fix-thin --keep=fetch-pack 7062 on vagrant-ubuntu-precise-64
                              ├─ssh,7063 [email protected] git-upload-pack '/git/ivija.pov.lt.git'
                              └─{git},7097

And then comes the timeout.

Conclusion: the echo BECOME-SUCCESS ... message is buffered somewhere so if the module takes a longer time to run, ansible times out prematurely.

Oh, my. If I reorder the items in my task:

- git: [email protected]:/git/{{ item }}.git dest=/var/www/{{ item }}
  with_items:
    - baz.pov.lt
    - foo.pov.lt
    - bar.pov.lt
  tags: websites

then Ansible v2 succeeds!

Conclusion: the echo BECOME-SUCCESS ... message is buffered somewhere so if the module takes a longer time to run, ansible times out prematurely.

This is, um, wrong.

I looked at the source code:

                    raise AnsibleError('Timeout (%ds) waiting for privilege escalation prompt: %s' % (timeout, stdout))

stdout is the output received from the subprocess.

Somehow the state machine gets wedged and doesn't realize sudo was already successful.

I added some debugging prints and here's what's happening:

 --> state = 1
 --> stdout = 'BECOME-SUCCESS-iihpirenpoxwdzpoqpoktzmlirjtrozu\r\n'
 --> tmp_stdout = ''
 --> success_key = 'BECOME-SUCCESS-zcnvwvktilslzifcbygketvntxvhiqbt'

Debug code:

diff --git a/lib/ansible/plugins/connection/ssh.py b/lib/ansible/plugins/connection/ssh.py
index 8bbc031..776aaa9 100644
--- a/lib/ansible/plugins/connection/ssh.py
+++ b/lib/ansible/plugins/connection/ssh.py
@@ -414,6 +414,15 @@ class Connection(ConnectionBase):

             if not rfd:
                 if state <= states.index('awaiting_escalation'):
+                    import sys
+                    sys.stderr.write('\n\n'
+                                     ' --> state = %r\n'
+                                     ' --> stdout = %r\n'
+                                     ' --> tmp_stdout = %r\n'
+                                     ' --> success_key = %r\n'
+                                     '\n'
+                                     % (state, stdout, tmp_stdout, self._play_context.success_key))
+                    sys.stderr.flush()
                     # If the process has already exited, then it's not really a
                     # timeout; we'll let the normal error handling deal with it.
                     if p.poll() is not None:

git bisect blames 9a8e95bff3d01cd06f193ead91997e21dc137bdb

Steps to Reproduce

  • test.yml:
---
- hosts: localhost
  gather_facts: no
  tasks:
    - command: sleep {{item}}
      become: yes
      with_items:
        - 1
        - 15

Expected Output

$ ansible --version
ansible 1.9.4
  configured module search path = None
$ ansible-playbook -i localhost, test.yml 

PLAY [localhost] ************************************************************** 

TASK: [command sleep {{item}}] ************************************************ 
changed: [localhost] => (item=1)
changed: [localhost] => (item=15)

PLAY RECAP ******************************************************************** 
localhost                  : ok=1    changed=1    unreachable=0    failed=0   

Actual Output

$ ansible --version
ansible 2.0.0 (stable-2.0 90021104d5) last updated 2015/11/24 08:59:25 (GMT +300)
  lib/ansible/modules/core: (detached HEAD 273112c56d) last updated 2015/11/24 09:00:04 (GMT +300)
  lib/ansible/modules/extras: (detached HEAD e46e2e1d6f) last updated 2015/11/24 09:00:04 (GMT +300)
  config file = 
  configured module search path = Default w/o overrides
$ ansible-playbook -i localhost, test.yml 

PLAY ***************************************************************************

TASK [command] *****************************************************************
fatal: [localhost]: FAILED! => {"failed": true, "msg": "ERROR! Timeout (12s) waiting for privilege escalation prompt: BECOME-SUCCESS-nxrqutnmapmtdleunwcwmxyetitrrlyv\r\n"}

PLAY RECAP *********************************************************************
localhost                  : ok=0    changed=0    unreachable=0    failed=1   

+1 can confirm this issue

@stephanadler if you set _connected = False on line 63 of lib/ansible/plugins/connection/ssh.py, does that fix it?

@amenonsen Yes, this fixes the issue.

Hello

this happens also on 2.0.0.2:

$ ansible --version
ansible 2.0.0.2
config file = /Users/misko/.ansible.cfg
configured module search path = Default w/o overrides

TASK [copy ssh key(s)] *********************************************************
fatal: [myhostname.local]: FAILED! => {"failed": true, "msg": "ERROR! Timeout (12s) waiting for privilege escalation prompt: "}

Just happening with 2.0.0.2

is this fix planned for 2.0.1 or is there going to be an earlier patch to 2.0.0.x?

running 2.0.0.2 and experiencing the issue as well.

thx for the fix btw :cake:

This is affecting us as well and causing some major headaches.

Same for me here:

Ansible 2.0.0.2

+1 for ansible 2.0.0.2

Still happening to me with 2.0.0.2

@cho-is @zamotivator @cblakkan https://github.com/ansible/ansible/issues/13278#issuecomment-159254725 if you didn't already see it

Yes that fixes the problem, is there any schedule for when this fix will be released?

this did not fix it for me (already in 2.0.0.2)

@michalmedvecky is this what lines 63-65 look like in lib/ansible/plugins/connection/ssh.py?

_connected = False
def _connect(self):
    return self

@MrMMorris I have 2.0.0.2 but the line _connected = False is missing

Output for version:

andres@HALCON:~ → ansible --version
ansible 2.0.0.2
   config file = /etc/ansible/ansible.cfg
   configured module search path = Default w/o overrides

@sascha-andres yep, you are supposed to add _connected = False

so this should happen in 2.0.0.2 but not in 2.0.1 (reopened as it was reported as still an issue in 2.0.1)

cc @jimi-c @amenonsen

+1

Just saw this happen again with 2.0.1.0

+1 for 2.0.1.0

Could someone please post verbose debug output from a failing run? ANSIBLE_DEBUG=y ansible-playbook -vvvvv ….

It's hard to reproduce, it happens sporadically :(

I suppose the problem is with long-running ansible tasks.
For instance, I occasionally receive this timeout when uploading a big (>200 MB) archive to the server, or when loading a docker image from a tar.gz archive into docker.

Disabling SSH multiplexing fixed this issue for me.

[ssh_connection]
ssh_args = -o ForwardAgent=yes -o ControlMaster=no -o StrictHostKeyChecking=no

I did a bit more debugging, and it turns out my connection to the remote machine had a 6% packet loss rate (spotty wifi). Putting it on the same network as the Ansible host eliminated this issue. So it was a network timeout rather than a "waiting for sudo" timeout -- but I'm not sure if Ansible can tell the difference.

I don't think it's a network issue, because I was testing from the same network when I got that error. However, ssh multiplexing seems to cause me a lot of trouble. For instance, when a second ssh session tries to create a control socket with a name that already exists in the ControlPath directory, it fails with a timeout.
This usually happens when using with_items in tasks. Disabling ssh multiplexing seems to fix it. At least for me.

@vutoff I've disabled multiplexing and the bug still occurs.
ansible-2.0.1.0

Same privilege escalation issue with Ansible 2.0.1.0 on Solaris 11.3 x86 servers.
While 2.0.0.2 works fine, become: yes at playbook level fails on 2.0.1.0...

---

- hosts: all
  gather_facts: no
  become: yes
  become_user: root
  become_method: su

  tasks:
    - shell: /bin/true
~/git/s11-test ▸ ansible-playbook -i inventory.ini test.yml --ask-become-pass
SUDO password:

PLAY ***************************************************************************

TASK [command] *****************************************************************
fatal: [example.com]: FAILED! => {"failed": true, "msg": "Timeout (12s) waiting for privilege escalation prompt: \r\n\r"}

NO MORE HOSTS LEFT *************************************************************
  to retry, use: --limit @test.retry

PLAY RECAP *********************************************************************
example.com : ok=0    changed=0    unreachable=0    failed=1  

@passw0rd123 Could you please post verbose debug output from a failing run? ANSIBLE_DEBUG=y ansible-playbook -vvvvv ….?

@passw0rd123 Thank you! I'll go through it and see if I can figure out what's wrong. Meanwhile, if anyone else can reproduce the problem, please feel free to post another gist.

So I can see what the problem is (at least in your case):

 38332 1458312677.34175: _low_level_execute_command(): executing: /bin/sh -c 'su  root -c /bin/sh -c '"'"'echo BECOME-SUCCESS-rvajtyhgcqtodequifwsdrooxaxrgwzp; /bin/sh -c '"'"'"'"'"'"'"'"'LANG=de_DE.UTF-8 LC_ALL=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8 /usr/bin/python /export/home/admin/.ansible/tmp/ansible-tmp-1458312677.07-145981842056304/command; rm -rf "/export/home/admin/.ansible/tmp/ansible-tmp-1458312677.07-145981842056304/" > /dev/null 2>&1'"'"'"'"'"'"'"'"''"'"''
…
 38332 1458312677.41075: stdout chunk (state=0):
>>>Password: <<<

 38332 1458312677.41093: become_prompt: (source=stdout, state=awaiting_prompt): 'Password: '
 38332 1458312677.41115: Sending become_pass in response to prompt
 38332 1458312677.41177: stdout chunk (state=1):
>>>
<<<

 38332 1458312677.53112: stdout chunk (state=1):
admin@host:~$ <<<

I can't imagine why doing "ssh host cmd" would result in a prompt. @bcoca has some explanation, but I didn't understand it, and I haven't been able to reproduce it by, e.g., opening a persistent interactive ssh connection and then executing commands separately and so on. (I don't even know if command restrictions in .ssh/authorized_keys can cause this.)

But that's why it's timing out for you. It doesn't see the BECOME-SUCCESS-xxx key it's expecting after escalation, just a prompt and nothing else.

Oh, the quoting is wrong:

/bin/sh -c 'su  root -c /bin/sh -c '"'"'echo BECOME-SUCCESS…

So we've seen that some su's will ignore the -c /bin/sh and try to execute whatever's specified to the second -c. I had never seen it dropping you into a shell, but the above command is clearly not right anyway, and we should fix that. But I thought we had applied a patch for this already? @bcoca, any thoughts?

Oh, maybe it's ignoring the _second_ -c and just executing /bin/sh. That would indeed explain a prompt.

I would love to hear from any of the other people who have been having the same problem, so far I have only one complete debug log for this (thanks again to @passw0rd123).

@amenonsen You're welcome! If you need further information I can look into it on monday.
I can reproduce this su issue with 2.0.1.0 on a Solaris 11.2 (instead of 11.3) Vagrant box (on VirtualBox) after setting a password for root; perhaps this helps?

https://atlas.hashicorp.com/dariusjs/boxes/solaris_11_2

Reproduced with ansible 2.0.1.0 on FreeBSD 11.0-CURRENT against hosts running FreeBSD.

@xmj debug output, please? ANSIBLE_DEBUG=y ansible-playbook -vvvvv …

@amenonsen I saw this error yesterday with Ansible 2.0.0.2 and 2.0.1.0, both with become_method = su AND become_method = sudo on hosts running FreeBSD 10.2-RELEASE and 10.3-BETA3 ---- but managed to overwrite the logfiles (no, I didn't try all possible combinations).

Today I haven't been able to reproduce it again -- seems like something that occurs with sudo and a sudoers file without NOPASSWD.

I got the same with ansible 2.0.1.0 while trying to provision a clean Debian 8.3 on EC2. Obviously I configured ansible_user=admin, but I was not expecting to have to do anything else in order to provision the machine.

+1 for ansible 2.1.0 (devel 60c943997b) last updated 2016/03/18
setting ssh_args=(...)ControlMaster=no(...) in [ssh_connection] solved the issue for me as proposed by @vutoff

@amenonsen This is what I got; it is 100% reproducible.
https://gist.github.com/grenas/bc758aa38a11df8c6302e3c113406e2f

@amenonsen I reproduced the issue with Ansible 2.0.1.0 (pip install) on Ubuntu 14.04 while running against a freshly started Ubuntu 14.04 EC2 server.

Once the server started, I ran:
ANSIBLE_DEBUG=y ansible <EC2_ID> -s -m shell -a '/bin/true' -vvvvv

The first run of this command fails with the timeout error but if I run the exact same command again, it works fine.

Any idea why it works once in a while?

Here is the debug output of these two steps I mentioned above:

+1 also happens here for freshly started Ubuntu 14.04 EC2.

The same:
+1 also happens here for freshly started Ubuntu 14.04 EC2

+1 happens randomly here on Ubuntu 14.04 EC2 instances on Ansible 2.0.2.0 - can be freshly started, or after a dozen tasks.

+1 Ditto CentOS 7.x
Linux nat-10-0-39-170 3.10.0-327.13.1.el7.x86_64 #1 SMP Thu Mar 31 16:04:38 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

For people experiencing this problem, please try each of the below changes before posting:

  1. https://github.com/ansible/ansible/issues/13278#issuecomment-187979124
  2. https://github.com/ansible/ansible/issues/13278#issuecomment-194349084

@jimi-c @amenonsen is there not enough information here for a fix? Seems like a pretty widespread and impacting problem

Neither of the two proposed solutions worked for me :(

Updates: it seems to be failing on the tasks/plays with become: yes set. We are running with Ansible 2.0.1.0 on Ubuntu 14.04 LTS

Currently I am setting transport = paramiko in ansible.cfg as a workaround until this bug is fixed.

@MrMMorris I am experiencing this bug with 2.0.0.2 as well. I've tried both suggested workarounds (next up I will try paramiko as suggested by @JohanTan), but it is still a common headache. I've noticed that when others are running playbooks on the same host (in our case we have a jumphost where everyone runs ansible-playbook from) it gets _much_ worse and becomes constantly reproducible.

With transport=paramiko we do lose the ability to use --diff, which is kind of annoying. I hope this bug gets fixed soon.

This is _always_ reproducible for me on the _first_ ssh access to newly launched Ubuntu 14.04 EC2 instances. Only the first ssh attempt gets stuck; all subsequent attempts connect with no error.

11:05:58.294 TASK [setup] *******************************************************************
11:05:58.487 <ec2-xx-xx-xx-xx.ap-northeast-1.compute.amazonaws.com> ESTABLISH SSH CONNECTION FOR USER: ubuntu
11:05:58.489 <ec2-xx-xx-xx-xx.ap-northeast-1.compute.amazonaws.com> SSH: EXEC ssh -C -vvv -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o 'IdentityFile="/very/secret/path/.ssh/id_rsa_zzzz"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=ubuntu -o ConnectTimeout=10 -o ControlPath=/very/secret/path/.ansible/cp/%h-%p ec2-xx-xx-xx-xx.ap-northeast-1.compute.amazonaws.com '/bin/sh -c '"'"'sudo -H -S -n -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-asdfasdfasdf; /bin/sh -c '"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 /usr/bin/python'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"''"'"'"'"'"'"'"'"''"'"''

11:05:58.550 <ec2-yy-yy-yy-yy.ap-northeast-1.compute.amazonaws.com> ESTABLISH SSH CONNECTION FOR USER: ubuntu
11:05:58.554 <ec2-yy-yy-yy-yy.ap-northeast-1.compute.amazonaws.com> SSH: EXEC ssh -C -vvv -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o 'IdentityFile="/very/secret/path/.ssh/id_rsa_zzzz"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=ubuntu -o ConnectTimeout=10 -o ControlPath=/very/secret/path/.ansible/cp/%h-%p ec2-yy-yy-yy-yy.ap-northeast-1.compute.amazonaws.com '/bin/sh -c '"'"'sudo -H -S -n -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-lksjdhflasdf; /bin/sh -c '"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 /usr/bin/python'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"''"'"'"'"'"'"'"'"''"'"''

Update: this issue seems to persist in Ansible 2.1.0.0 according to my last test.

I get this error on ansible 2.1.0.0 too.

24113 1464705052.69173: _low_level_execute_command(): executing: /bin/sh -c 'su root -c '"'"'/bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-zpbisiksiuvwdpwkoahuhghbuwyukulw; /bin/sh -c "/bin/echo fractalcells: { url: http://pkg.fractalcells.com/packages/FreeBSD:10:amd64/, enabled: true, priority: 10 } > /etc/pkg/fractalcells.conf"'"'"'"'"'"'"'"'"''"'"' && sleep 0'
24113 1464705052.69947: Initial state: awaiting_prompt:
24113 1464705064.70193: done running TaskExecutor() for itgitlab/TASK: Add Fractalcells config
24113 1464705064.70409: sending task result
24113 1464705064.70541: done sending task result
24113 1464705064.70589: WORKER PROCESS EXITING
24112 1464705064.70802: worker 0 has data to read
24112 1464705064.71035: got a result from worker 0:
24112 1464705064.71129: sending result: [u'host_task_failed', u'']
24112 1464705064.71322: done sending result
24105 1464705064.71529: got result from result worker: [u'host_task_failed', u'']
24105 1464705064.71583: marking itgitlab as failed
24105 1464705064.71644: marking host itgitlab failed, current state: HOST STATE: block=2, task=1, rescue=0, always=0, role=None, run_state=ITERATING_TASKS, fail_state=FAILED_NONE, pending_setup=False, tasks child state? None, rescue child state? None, always child state? None, did start at task? False
24105 1464705064.71779: ^ failed state is now: HOST STATE: block=2, task=1, rescue=0, always=0, role=None, run_state=ITERATING_COMPLETE, fail_state=FAILED_TASKS, pending_setup=False, tasks child state? None, rescue child state? None, always child state? None, did start at task? False
fatal: [itgitlab]: FAILED! => {"failed": true, "msg": "Timeout (12s) waiting for privilege escalation prompt: "}

confirmed reproducible on 2.1.0.0 at random, regardless of whether it is a newly spun up AWS instance.

Is there any progress on this? I started seeing this randomly with modules like copy and other core modules; become: true seems to be what they all have in common.

I too am encountering this with EC2 instances on 2.1.0.0. So far it seems to affect only new instances. If that is not the case I will post back.

Nope, it's not just newly created instances. It happens on any instance. Typically it fails on the setup task.

Something I just noticed: the error only seems to occur if I use --tags.

+1 happens randomly using 2.1.0.0 on EC2 instances and I am not using --tags

Happens for me with ansible local on OS X, with the npm module and a list of items.

All of these "me too" responses are not very helpful. From this point on, please include:

  • the Ansible version
  • the number of forks being used
  • -vvvvv output
  • the load on your system at the time (which will be impacted by the # of forks)
  • any modified SSH settings which might involve the timeout setting

For me:

  • Both Ansible 2.0.1.0 and 2.1.0.0
  • 50, but single host
  • sorry not currently available, will try to capture
  • minimal
  • Only have in ansible.cfg:
[ssh_connection]
control_path = /tmp/ansible-ssh-%%h-%%p-%%r
pipelining = True

Hi all,
Before you declare your case to be an ansible problem, make sure that you can actually ssh into an instance at the same time ansible's task is failing. AWS instances can get temporarily unresponsive due to high volume of EBS disk operations, among other things, and that can affect ansible's ability to reach the instance.

There are several things you can do to prevent the playbook from failing:

  1. Increase the timeout value. The default is 10 seconds, and ansible adds 2 more, for a total of 12 seconds. In ansible.cfg just add a line with a more generous timeout, e.g. "timeout = 600".
  2. Analyze the tasks that happen prior to the one that's timing out. If they are disk intensive, chances are this is what's causing the problem. There are ways to minimize the problem if not eliminate it completely. If you cannot think of a way to improve the performance, or you are just looking for a quick fix, adding a wait prior to the failing task will get your playbook through without a failure.

Hint: When analyzing which tasks are slowing down, I find ansible's built-in profiler to be a great help. Add "callback_whitelist = profile_tasks" to ansible.cfg to switch it on.
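For example, both of those settings are plain ansible.cfg entries (illustrative values only):

[defaults]
# more generous connection timeout than the 10s (+2s) default
timeout = 600
# built-in per-task profiler mentioned above
callback_whitelist = profile_tasks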

I'll add more details, but I can run the same plays on 1.9 and never see this problem.

I am using ansible 2.0.2.0 and it gives me the same error when restarting a service as well. It happens when I try to restart nginx on my remote machine with the service module.

@oleyka I added timeout = 60 but I still got "Timeout (12s) waiting for privilege escalation prompt: " - this implies this isn't the right timeout you're talking about. As for things being disk-intensive - well for me it's always the first task running on a freshly booted EC2 instance that fails with this message.

@adamchainz You might want to insert a wait_for ssh task prior to your first task on the newly launched instance, if you do not have it already. Something along the lines of:

- name: wait for ssh to come up
  wait_for:
    port: 22
    host: "{{ item.private_ip }}"
    delay: 20
    timeout: 600
  with_items: "{{ new_instances.instances }}"

where new_instances is an output of your ec2 launch task.

I do have such a task 😢

Does anybody have any hypotheses of why this is happening? Any specific lines in the source that we could instrument to dig deeper?

Can anyone reproduce this while collecting strace (i.e. strace -ttf)? It would be interesting to see if any signals occur right before the error.

Going to try to reproduce, but it's not very frequently occurring on our systems.

@isegal

Does anybody have any hypotheses of why this is happening?

In the very similar issue https://github.com/ansible/ansible/issues/14426#issuecomment-183565130 @bcoca mentions "the rewritten 'prompt detection'", that could be something, as it seems people didn't have the problem with ansible 1.9.x

(sorry, long post ahead)

We also have this problem. This is the playbook that encounters it:

---
- hosts: backupservers
  vars:
    transfer_snapshot: storage/backup/mssql@transfer
    destination_dataset: mssql
    timeout_for_transfer: 21600
  become: true
  tasks:
  - name: List all USB-disks
    shell: "zpool import | awk '/pool:/ { print $2 }' | head -n 1"
    register: usb_pool

  - name: Ensure USB pool is imported
    command: "zpool import {{ usb_pool.stdout }}"

  - name: Cleanup old transfer snapshot
    command: "zfs destroy -r {{ transfer_snapshot }}"
    ignore_errors: yes

  - name: Take transfer snapshot
    command: "zfs snapshot -r {{ transfer_snapshot }}"

  - name: Remove USB datasets
    command: "zfs destroy -r {{ usb_pool.stdout }}"

  - name: Start sending MSSQL snapshots to USB disk
    shell: "zfs send -R {{ transfer_snapshot }} | mbuffer -q -m 128M | zfs recv -u -F {{ usb_pool.stdout }}/{{ destination_dataset }}"
    async: "{{ timeout_for_transfer }}"
    poll: 0
    register: zfs_sender

  - name: Check if zfs sender is done
    async_status: jid={{ zfs_sender.ansible_job_id }}
    register: job_result
    until: job_result.finished
    retries: "{{ timeout_for_transfer // 60 }}"
    delay: 60

  - name: List all backup datasets
    shell: "zfs list -H -t filesystem -r {{ usb_pool.stdout }} | awk '{ print $1 }'"
    register: backup_datasets

  - name: Ensure the backup datasets aren't mounted
    shell: "zfs set mountpoint=none '{{ item }}'"
    with_items:
      - "{{ backup_datasets.stdout_lines }}"

  - name: Export USB pool
    command: "zpool export {{ usb_pool.stdout }}"

  - name: Remove transfer snapshot
    command: "zfs destroy -r {{ transfer_snapshot }}"

And when it failed it looked like this:

$ ansible-playbook playbooks/backup_copy_to_usb_disk.yml
SUDO password:

PLAY [backupservers] ***********************************************************

TASK [setup] *******************************************************************
ok: [hodor.live.lkp.primelabs.se]

TASK [List all USB-disks] ******************************************************
changed: [hodor.live.lkp.primelabs.se]

TASK [Ensure USB pool is imported] *********************************************
changed: [hodor.live.lkp.primelabs.se]

TASK [Cleanup old transfer snapshot] *******************************************
fatal: [hodor.live.lkp.primelabs.se]: FAILED! => {"changed": true, "cmd": ["zfs", "destroy", "-r", "storage/backup/mssql@transfer"], "delta": "0:00:00.206180", "end": "2016-05-10 16:08:41.515365", "failed": true, "rc": 1, "start": "2016-05-10 16:08:41.309185", "stderr": "could not find any snapshots to destroy; check snapshot names.", "stdout": "", "stdout_lines": [], "warnings": []}
...ignoring

TASK [Take transfer snapshot] **************************************************
changed: [hodor.live.lkp.primelabs.se]

TASK [Remove USB datasets] *****************************************************
changed: [hodor.live.lkp.primelabs.se]

TASK [Start sending MSSQL snapshots to USB disk] *******************************
ok: [hodor.live.lkp.primelabs.se]

TASK [Check if zfs sender is done] *********************************************
FAILED - RETRYING: TASK: Check if zfs sender is done (359 retries left).
FAILED - RETRYING: TASK: Check if zfs sender is done (358 retries left).
fatal: [hodor.live.lkp.primelabs.se]: FAILED! => {"failed": true, "msg": "Timeout (12s) waiting for privilege escalation prompt: "}

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
hodor.live.lkp.primelabs.se : ok=7    changed=4    unreachable=0    failed=1

We don't run this playbook that often, maybe once every two weeks, and it has failed every time since the beginning of May.

I think it's worth sharing what a colleague of mine did when trying to reproduce the problem:

I've created a "dummy" playbook" to try and debug this, I can reproduce the error by simulating a broken network (with the Network Link Conditioner) in the middle of the playbook.

The "dummy" playbook:


---
  - hosts: elasticsearch-marvel
    become: yes
    tasks:
      - name: Simulating slow task
        shell: echo "yellow"
        register: result
        until: result.stdout.find("green") != -1
        retries: 600
        delay: 10

(screenshot of the Network Link Conditioner settings omitted)

Some different error messages depending on whether we're using become or not:

Ansible 2.0.2.0

With become

$ ansible-playbook playbooks/slow-playbook.yml
SUDO password:

PLAY [elasticsearch-marvel] ****************************************************

TASK [setup] *******************************************************************
ok: [marvel02.live.lkp.primelabs.se]
ok: [marvel01.live.lkp.primelabs.se]

TASK [Simulating slow task] ****************************************************
FAILED - RETRYING: TASK: Simulating slow task (599 retries left).
FAILED - RETRYING: TASK: Simulating slow task (599 retries left).
FAILED - RETRYING: TASK: Simulating slow task (598 retries left).
FAILED - RETRYING: TASK: Simulating slow task (598 retries left).
fatal: [marvel01.live.lkp.primelabs.se]: FAILED! => {"failed": true, "msg": "Timeout (12s) waiting for privilege escalation prompt: "}
fatal: [marvel02.live.lkp.primelabs.se]: FAILED! => {"failed": true, "msg": "Timeout (12s) waiting for privilege escalation prompt: "}

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
marvel01.live.lkp.primelabs.se : ok=1    changed=0    unreachable=0    failed=1
marvel02.live.lkp.primelabs.se : ok=1    changed=0    unreachable=0    failed=1

With sudo

$ ansible-playbook playbooks/slow-playbook.yml
SUDO password:
[DEPRECATION WARNING]: Instead of sudo/sudo_user, use become/become_user and make sure
become_method is 'sudo' (default).
This feature will be removed in a future release. Deprecation
 warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.

PLAY [elasticsearch-marvel] ****************************************************

TASK [setup] *******************************************************************
ok: [marvel01.live.lkp.primelabs.se]
ok: [marvel02.live.lkp.primelabs.se]

TASK [Simulating slow task] ****************************************************
FAILED - RETRYING: TASK: Simulating slow task (599 retries left).
FAILED - RETRYING: TASK: Simulating slow task (599 retries left).
fatal: [marvel01.live.lkp.primelabs.se]: FAILED! => {"failed": true, "msg": "Timeout (12s) waiting for privilege escalation prompt: "}
fatal: [marvel02.live.lkp.primelabs.se]: FAILED! => {"failed": true, "msg": "Timeout (12s) waiting for privilege escalation prompt: "}

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
marvel01.live.lkp.primelabs.se : ok=1    changed=0    unreachable=0    failed=1
marvel02.live.lkp.primelabs.se : ok=1    changed=0    unreachable=0    failed=1

Without become

$ ansible-playbook playbooks/slow-playbook.yml
SUDO password:

PLAY [elasticsearch-marvel] ****************************************************

TASK [setup] *******************************************************************
ok: [marvel02.live.lkp.primelabs.se]
ok: [marvel01.live.lkp.primelabs.se]

TASK [Simulating slow task] ****************************************************
FAILED - RETRYING: TASK: Simulating slow task (599 retries left).
FAILED - RETRYING: TASK: Simulating slow task (599 retries left).
fatal: [marvel02.live.lkp.primelabs.se]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh.", "unreachable": true}
fatal: [marvel01.live.lkp.primelabs.se]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh.", "unreachable": true}

PLAY RECAP *********************************************************************
marvel01.live.lkp.primelabs.se : ok=1    changed=0    unreachable=1    failed=0
marvel02.live.lkp.primelabs.se : ok=1    changed=0    unreachable=1    failed=0

This is the ansible.cfg used for the real run above, and the test playbook:

[defaults]
hostfile=hosts
log_path=logs/ansible.log
callback_plugins=callback_plugins
callback_whitelist=log_plays_per_host
ask_sudo_pass=True
retry_files_enabled=False
vault_password_file=vault_password.sh
forks=25

[privilege_escalation]
become_ask_pass=True

I'm going to run our problematic playbook with ANSIBLE_DEBUG=y, -vvvvv, the timeout bumped to 600 as suggested by @oleyka, and run with profile_tasks. Will report back.

With the settings above, the playbook ran without any problems. It took 2h 16m 32s to run it.

...

PLAY RECAP *********************************************************************
hodor.live.lkp.primelabs.se : ok=12   changed=9    unreachable=0    failed=0

Monday 04 July 2016  18:56:54 +0200 (0:00:00.581)       2:16:28.898 ***********
===============================================================================
Check if zfs sender is done ------------------------------------------ 8149.41s
/Users/dentarg/twingly/ansible/playbooks/backup_copy_to_usb_disk.yml:32 -------
Remove USB datasets ---------------------------------------------------- 24.60s
/Users/dentarg/twingly/ansible/playbooks/backup_copy_to_usb_disk.yml:23 -------
Ensure USB pool is imported --------------------------------------------- 3.64s
/Users/dentarg/twingly/ansible/playbooks/backup_copy_to_usb_disk.yml:13 -------
setup ------------------------------------------------------------------- 2.74s
None --------------------------------------------------------------------------
List all USB-disks ------------------------------------------------------ 2.18s
/Users/dentarg/twingly/ansible/playbooks/backup_copy_to_usb_disk.yml:9 --------
Ensure the backup datasets aren't mounted ------------------------------- 1.85s
/Users/dentarg/twingly/ansible/playbooks/backup_copy_to_usb_disk.yml:43 -------
Start sending MSSQL snapshots to USB disk ------------------------------- 1.30s
/Users/dentarg/twingly/ansible/playbooks/backup_copy_to_usb_disk.yml:26 -------
Take transfer snapshot -------------------------------------------------- 0.83s
/Users/dentarg/twingly/ansible/playbooks/backup_copy_to_usb_disk.yml:20 -------
Export USB pool --------------------------------------------------------- 0.75s
/Users/dentarg/twingly/ansible/playbooks/backup_copy_to_usb_disk.yml:48 -------
Remove transfer snapshot ------------------------------------------------ 0.58s
/Users/dentarg/twingly/ansible/playbooks/backup_copy_to_usb_disk.yml:51 -------
List all backup datasets ------------------------------------------------ 0.52s
/Users/dentarg/twingly/ansible/playbooks/backup_copy_to_usb_disk.yml:39 -------
Cleanup old transfer snapshot ------------------------------------------- 0.46s
/Users/dentarg/twingly/ansible/playbooks/backup_copy_to_usb_disk.yml:16 -------
 23955 1467651414.69994: RUNNING CLEANUP

Disregard my last comment... I was thinking of the sshd config rather than the client.

For the client side, the relevant setting might be ServerAliveInterval, which is zero by default. It might help to set this to a non-zero value (and adjust ServerAliveCountMax too) so that it takes around 60s before the connection is terminated.

Would anyone care to try this?
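For anyone willing to test it, one possible (untested) way to pass those options through Ansible is via ssh_args in ansible.cfg; the values below are just a guess at roughly 60s of keep-alive headroom:

[ssh_connection]
# keep multiplexing as before, plus keep-alives: 30s interval x 2 missed replies ~= 60s
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o ServerAliveInterval=30 -o ServerAliveCountMax=2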

I ran into the same issue with Ansible 2.1 on RHEL 6.7 executing a playbook on RHEL 7.2. The playbook was failing on the copy module.

I resolved the issue by including -T60 in the ansible-playbook command.

@oleyka I made a mistake when I implemented timeout = 60 in our repo and updated the wrong ansible.cfg (we have one to apply to Vagrant boxes only). I added it to the ansible.cfg that affected our EC2 instances and we haven't seen the error since, thanks.

@amenonsen With Ansible 2.1.1.0 and pipelining = False, the problems with privilege escalation (su) on Solaris 11.3 x86 seem to be gone. I have to look deeper and verify whether the issue with the setup I mentioned in my latest post still persists...

I'm not sure if any of these solutions are any good to me; I have the following task, which is run through ansible local on my Mac (10.11.5).

- name: Update brew daily
  become: yes
  cron: name="brew autoupdate" special_time="daily"
        user="{{ansible_user_id}}" job="/usr/local/bin/brew update"

And I get the following error every single time

TASK [dev : Update brew daily] *************************************************
fatal: [localhost]: FAILED! => {"failed": true, "msg": "timeout waiting for privilege escalation password prompt:\n\nWARNING: Improper use of the sudo command could lead to data loss\nor the deletion of important system files. Please double-check your\ntyping when using sudo. Type \"man sudo\" for more information.\n\nTo proceed, enter your password, or type Ctrl-C to abort.\n\n[sudo via ansible, key=kocvgfumdmtfjadtjmjelblnyhyuzdrn] password: "}

The task should take milliseconds, so I'm not sure what is timing out, and why.

I'm running ansible with ansible-playbook -i "localhost," -c local --ask-become-pass playbook.yml

@SteveEdson your issue is unrelated, as there is no timeout problem; please open a new issue. It seems that your sudo is expecting input.

@adamchainz is this not the same issue? My error, "timeout waiting for privilege escalation password prompt", is the same as this issue's title, with the exception of "password".

I think it's different because it has no '(12s)', so it's coming from a different timeout. Also, no one else here is reporting the 'waiting for input' behaviour.

Experiencing random timeouts too; debug info follows. The same Ansible commands and playbooks work perfectly against a Solaris 11 host on an earlier release - for example, target Version 0.5.11 (Oracle Solaris 11.3.1.5.0) is a-OK.

Ansible version: 2.1.1.0
Control host: CentOS 7.1
Target host: Solaris 11.3 x86, Version 0.5.11 (Oracle Solaris 11.3.10.5.0)

https://gist.github.com/asil75/1eabfa921790d5825f1d9c9c26fd27c8

ansible.cfg:

[defaults]
remote_tmp = /tmp/.ansible/tmp
timeout = 30
jinja2_extensions = jinja2.ext.do,jinja2.ext.i18n,jinja2.ext.loopcontrols
[privilege_escalation]
[paramiko_connection]
[ssh_connection]
[accelerate]
[selinux]
[colors]

hth.

Ansible 2.0.0.2 on my Ubuntu 16.04 64-bit still gets this error. In my case I can apply my playbook with no problems to my Vagrant image; I only get this error when I try to run it against my EC2 instance.

ansible -i my_inventory my_aws_server -m ping works :ok_hand: :
ansible-playbook -i my_inventory -l aws_server playbook.yml fail :hankey:
ansible-playbook -i .vagrant/provisioners/ansible/inventory/vagrant_ansible_inventory playbook.yml works :ok_hand:

If I use -vvvv, copy the EXEC ssh command/parameters and run it manually, it connects to my EC2 instance without problems.

Tried adding ssh_args = -o ForwardAgent=yes -o ControlMaster=no -o StrictHostKeyChecking=no to /etc/ansible/ansible.cfg without success;

Tried adding transport = paramiko to /etc/ansible/ansible.cfg without success;

MY SOLUTION

Added timeout = 30 to /etc/ansible/ansible.cfg. This is not a "task" timeout, as I do have tasks that take much longer than that (updating apt, installing python3, etc.).
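For reference, the relevant bit of ansible.cfg (a minimal sketch) is just:

[defaults]
timeout = 30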

The problem is not necessarily in Ansible; it may be a misconfigured /etc/hosts or another DNS issue.

Please check two things:

  1. Ensure you can log in to the host quickly over ssh. Usually it should take less than one second.
  2. Ensure you can run sudo quickly. Usually it takes much less than one second.

Big SSH delays are usually caused by incomplete DNS configuration; the same goes for sudo delays.
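A quick way to check both from the control machine (just a rough sketch; user and host are placeholders, and the sudo check assumes passwordless sudo):

time ssh user@host true
time ssh user@host sudo -n true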

A common cause of slow sudo is an incomplete /etc/hosts. It should look something like this:

127.0.0.1 localhost
127.0.1.1 example.com

(Openstack users: please take a look at 'manage_etc_hosts' option. It will solve this issue)

To give a heads up:
My environment:

ansible 2.1.1.0
  config file = /foo/ansible.cfg
  configured module search path = ['/usr/share/ansible']
Python 2.7.12+
OS: Kali rolling

I ran a playbook containing some roles against a local VM. Without any conscious change on my part, I suddenly got the timeout error mentioned above. I first tried the solution @vutoff mentioned in this comment, which seems to work for me. Currently I am following this comment from @vmenezes.

I will edit this comment if I gain new information.
Thanks to everyone helping here!

Thanks @vmenezes, adding timeout=30 to /etc/ansible/ansible.cfg did the trick for me. I'm using Ansible 2.1.1.0 in a Mac + CentOS Vagrant environment.

Hi all, I hope this helps some people:

We encountered this problem, i.e. "Timeout waiting for privilege escalation prompt". We spent time debugging it, and this is what we found. Hopefully, folks smarter and more involved with ansible can use this as a bug report.

  1. target host is resolvable by a /etc/hosts entry on ansible system
  2. no network problems or delays
  3. both nodes are rhel7
  4. ssh to target host works using keys, no prompts or delays
  5. once on target host, sudo to root works as expected, no prompts or delays
  6. ansible-playbook version 2.2.0

With an EMPTY ansible.cfg file, both in /etc and ~/.ansible.cfg, we made a test case playbook that fails 100% of the time.

testcase.yml:

---

- name: testcase
  hosts: testhost
  become: true
  tasks:

  - name: testcase
    shell:
      cmd: sleep 30

We would run this test case with the command:
/usr/bin/ansible-playbook -i ./testhost ./testcase.yml

The error we would receive at roughly 12s after starting the play was:
fatal: [testhost]: FAILED! => {"failed": true, "msg": "Timeout (12s) waiting for privilege escalation prompt: \u001b[?25h\u001b[0G\u001b[K\u001b[?25h\u001b[0G\u001b[KBECOME-SUCCESS-beunzsdqhofnfczeeceajvbxfmzldrxn\r\n"}

Note that while the play said it failed, the sleep 30 command had indeed been started and was clearly running. The running "sleep 30" command, and the Ansible connection to the target host, were terminated at the 12s mark, when Ansible decided it had failed to get the privilege escalation prompt.

Based on this thread, we speculated that this error would occur if the command being run (i.e. sleep 30) did not finish BEFORE the timeout value. To test, we re-ran the same play with a higher timeout, and indeed it succeeded. This would possibly explain why some posters indicated success with a 30 second timeout: if their play completes within 30 seconds, there is no error, while those with commands taking >30 seconds would still experience it.

We could successfully run this testcase play using the following command:
/usr/bin/ansible-playbook -i ./testhost ./testcase.yml -T35

Success output was:

TASK [testcase] ****************************************************************
changed: [testhost]

On multiple back-to-back trials where the only thing modified was the -T timeout value: a timeout < play time would always produce this error, and a timeout > play time would always produce a success.

...all good... except this isn't what we understood the timeout value to do. We thought it was the timeout for an SSH connection to connect and be ready to run (connect and sudo), not the maximum time a play could take. Someone above speculated it was a 'buffering' issue - that seems possible, in that Ansible does not appear to get the privilege escalation prompt before the play starts running. But in essence, it seems that if a play doesn't complete within the timeout period, this error will occur.

In any event, we took it further. We discovered that if we set pipelining=true in ansible.cfg, we did NOT need to adjust the timeout value at all, and the play would execute as expected. Example ansible.cfg:

[ssh_connection]
pipelining=true

With pipelining turned on and the identical testcase above, even when we ran the playbook with a very short timeout (i.e. 5 seconds), the play still completed as expected.
/usr/bin/ansible-playbook -i ./testhost ./testcase.yml -T5

On multiple back-to-back trials where the timeout was much less than the play time (5 seconds vs. 30, which would have caused the error per our previous testing of timeout values), and where the only thing changed between trials was pipelining=true/false: ssh pipelining=true would always produce a success, and ssh pipelining=false would always produce this error.

Hopefully this helps some folks; if more info is needed, please reply.

Thanks,

Hi all,
I am also facing this issue with Ansible version 2.1.1.0.
I tried all the workarounds mentioned in this post, but no luck...
Please suggest.

I think I've run into this problem as well. The task that hangs in my playbook looks like this:

- name: deploy static config files and scripts
  copy: src=rootdir/ dest=/

It should copy a few config files recursively, mostly into /etc and subfolders.
I also get the timeout error mentioned above.
How can I check whether this is the same problem? What can I do to work around it?
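One way to check, based on the findings earlier in this thread (a sketch; inventory/playbook names and the 120s value are placeholders): re-run the play with a connection timeout longer than the copy normally takes and see whether the error disappears.

ansible-playbook -i inventory playbook.yml -T 120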

Also happening here. timeout=30 in ansible.cfg fixes it, but with extreme slowness (1h36 to run vs. 15min before the bug appeared).
A single target host at DigitalOcean.

$ ansible --version
ansible 2.2.0.0
  config file = /opt/tmp/vagrant/homelab/ansible.cfg
  configured module search path = Default w/o overrides
$ egrep -v '(^#|^$)' ansible.cfg 
[defaults] 
log_path=ansible.log
roles_path = ./
transport = ssh
forks=5
callback_plugins = callback_plugins/
timeout = 30
[ssh_connection]
ssh_args = -o ForwardAgent=yes
pipelining=True
scp_if_ssh=True
$ ansible-playbook -i inventory --limit target playbook.yml -vvvvv
[...]
TASK [sketchy : git clone sketchy] *********************************************
task path: /home/user/Documents/script/homelab/roles/sketchy/tasks/main.yml:35
Using module file /usr/local/lib/python2.7/dist-packages/ansible/modules/core/source_control/git.py
<10.5.1.29> ESTABLISH SSH CONNECTION FOR USER: root
<10.5.1.29> SSH: ansible.cfg set ssh_args: (-o)(ForwardAgent=yes)
<10.5.1.29> SSH: ANSIBLE_PRIVATE_KEY_FILE/private_key_file/ansible_ssh_private_key_file set: (-o)(IdentityFile="/home/user/.ssh/keys/mysshkey")
<10.5.1.29> SSH: ansible_password/ansible_ssh_pass not set: (-o)(KbdInteractiveAuthentication=no)(-o)(PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey)(-o)(PasswordAuthentication=no)
<10.5.1.29> SSH: ANSIBLE_REMOTE_USER/remote_user/ansible_user/user/-u set: (-o)(User=root)
<10.5.1.29> SSH: ANSIBLE_TIMEOUT/timeout set: (-o)(ConnectTimeout=10)
<10.5.1.29> SSH: PlayContext set ssh_common_args: ()
<10.5.1.29> SSH: PlayContext set ssh_extra_args: ()
<10.5.1.29> SSH: EXEC ssh -vvv -o ForwardAgent=yes -o 'IdentityFile="/home/user/.ssh/keys/mysshkey"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=10 10.5.1.29 '/bin/sh -c '"'"'sudo -H -S -n -u _sketchy /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-rvizprhpalwltpdgmkyrsznzvhtcvmwg; /usr/bin/python'"'"'"'"'"'"'"'"' && sleep 0'"'"''
fatal: [target]: FAILED! => {
    "failed": true, 
    "msg": "Timeout (12s) waiting for privilege escalation prompt: "
}
...ignoring

TASK [sketchy : install pip virtualenv] ****************************************
task path: /home/user/Documents/script/homelab/roles/sketchy/tasks/main.yml:42
Using module file /usr/local/lib/python2.7/dist-packages/ansible/modules/core/packaging/language/pip.py
<10.5.1.29> ESTABLISH SSH CONNECTION FOR USER: root
<10.5.1.29> SSH: ansible.cfg set ssh_args: (-o)(ForwardAgent=yes)
<10.5.1.29> SSH: ANSIBLE_PRIVATE_KEY_FILE/private_key_file/ansible_ssh_private_key_file set: (-o)(IdentityFile="/home/user/.ssh/keys/mysshkey")
<10.5.1.29> SSH: ansible_password/ansible_ssh_pass not set: (-o)(KbdInteractiveAuthentication=no)(-o)(PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey)(-o)(PasswordAuthentication=no)
<10.5.1.29> SSH: ANSIBLE_REMOTE_USER/remote_user/ansible_user/user/-u set: (-o)(User=root)
<10.5.1.29> SSH: ANSIBLE_TIMEOUT/timeout set: (-o)(ConnectTimeout=10)
<10.5.1.29> SSH: PlayContext set ssh_common_args: ()
<10.5.1.29> SSH: PlayContext set ssh_extra_args: ()
<10.5.1.29> SSH: EXEC ssh -vvv -o ForwardAgent=yes -o 'IdentityFile="/home/user/.ssh/keys/mysshkey"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=10 10.5.1.29 '/bin/sh -c '"'"'/usr/bin/python && sleep 0'"'"''
ok: [target] => {
    "changed": false, 
    "cmd": "/usr/bin/pip2 install virtualenv", 
    "invocation": {
        "module_args": {
            "chdir": null, 
            "editable": true, 
            "executable": null, 
            "extra_args": null, 
            "name": [
                "virtualenv"
            ], 
            "requirements": null, 
            "state": "present", 
            "umask": null, 
            "use_mirrors": true, 
            "version": null, 
            "virtualenv": null, 
            "virtualenv_command": "virtualenv", 
            "virtualenv_python": null, 
            "virtualenv_site_packages": false
        }, 
        "module_name": "pip"
    }, 
    "name": [
        "virtualenv"
    ], 
    "requirements": null, 
    "state": "present", 
    "stderr": "You are using pip version 8.1.1, however version 9.0.1 is available.\nYou should consider upgrading via the 'pip install --upgrade pip' command.\n", 
    "stdout": "Requirement already satisfied (use --upgrade to upgrade): virtualenv in /usr/local/lib/python2.7/dist-packages\n", 
    "stdout_lines": [
        "Requirement already satisfied (use --upgrade to upgrade): virtualenv in /usr/local/lib/python2.7/dist-packages"
    ], 
    "version": null, 
    "virtualenv": null
}

TASK [sketchy : install sketchy pip dependencies inside virtualenv] ************
task path: /home/user/Documents/script/homelab/roles/sketchy/tasks/main.yml:44
Using module file /usr/local/lib/python2.7/dist-packages/ansible/modules/core/packaging/language/pip.py
<10.5.1.29> ESTABLISH SSH CONNECTION FOR USER: root
<10.5.1.29> SSH: ansible.cfg set ssh_args: (-o)(ForwardAgent=yes)
<10.5.1.29> SSH: ANSIBLE_PRIVATE_KEY_FILE/private_key_file/ansible_ssh_private_key_file set: (-o)(IdentityFile="/home/user/.ssh/keys/mysshkey")
<10.5.1.29> SSH: ansible_password/ansible_ssh_pass not set: (-o)(KbdInteractiveAuthentication=no)(-o)(PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey)(-o)(PasswordAuthentication=no)
<10.5.1.29> SSH: ANSIBLE_REMOTE_USER/remote_user/ansible_user/user/-u set: (-o)(User=root)
<10.5.1.29> SSH: ANSIBLE_TIMEOUT/timeout set: (-o)(ConnectTimeout=10)
<10.5.1.29> SSH: PlayContext set ssh_common_args: ()
<10.5.1.29> SSH: PlayContext set ssh_extra_args: ()
<10.5.1.29> SSH: EXEC ssh -vvv -o ForwardAgent=yes -o 'IdentityFile="/home/user/.ssh/keys/mysshkey"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=10 10.5.1.29 '/bin/sh -c '"'"'sudo -H -S -n -u _sketchy /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-gashrpsdazspmuutfkzlacolenowlxkz; /usr/bin/python'"'"'"'"'"'"'"'"' && sleep 0'"'"''
fatal: [target]: FAILED! => {
    "failed": true, 
    "msg": "Timeout (12s) waiting for privilege escalation prompt: "
}

target$ sar
[about 70% idle, 10-15% system, 10-15% user]

Also getting this from a macOS 10.11 orchestrator to a 10.12 guest (same LAN).
The timeout=30 workaround works; the duration difference was more acceptable, about ~2min vs. ~6min.

$ ansible --version
ansible 2.2.0.0
  config file = /Users/julien/script/homelab/ansible.cfg
  configured module search path = Default w/o overrides
[same ansible.cfg as above]
$ ansible-playbook -i inventory --limit air mac.yml -vvvv
[...]
TASK [harden-darwin : add application to allow incoming] ***********************
task path: /Users/myuser/roles/harden-darwin/tasks/firewall.yml:24
Using module file /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/ansible/modules/core/commands/command.py
<10.0.1.7> ESTABLISH SSH CONNECTION FOR USER: deploy
<10.0.1.7> SSH: EXEC ssh -vvv -o ForwardAgent=yes -o 'IdentityFile="/Users/myuser/.ssh/keys/key-201612"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=deploy -o ConnectTimeout=10 10.0.1.7 '/bin/sh -c '"'"'sudo -H -S -n -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-jxassxqmjklqyzqlthojhnqljcxexifg; /usr/bin/python'"'"'"'"'"'"'"'"' && sleep 0'"'"''
fatal: [air]: FAILED! => {
    "failed": true, 
    "msg": "Timeout (12s) waiting for privilege escalation prompt: "
}
UNREACHABLE! => {
    "changed": false, 
    "msg": "SSH Error: data could not be sent to the remote host. Make sure this host can be reached over ssh", 
    "unreachable": true
}

It works on the 2nd try.

$ ansible --version
ansible 2.2.0.0

@jamesongithub that's a different error than the one being discussed here.

Sorry, I was getting the privilege escalation error also. On OSX.

I received the same error on Mac OS X El Capitan and Yosemite, but not on macOS Sierra, after migrating from Ansible 1.9.1 to 2.2.0. After a whole day of troubleshooting I finally pinned the culprit down to the value of the sudo_flags variable, which in our config was -i. I couldn't understand why, but when invoking ssh <user>@<host> 'sudo -i -u builder /bin/sh -c '"'"'echo a; echo b; echo c; echo BECOME-SUCCESS-ykvssengiokabkumhzrlbnxinmsjxfpz; '"'"' && sleep 0' the first line was always omitted from the output. Ansible's first line is the BECOME-SUCCESS line, and if it is missing the command times out. In the end we replaced the option with -H, which worked fine in our case. The difference is that with -H the user's profile scripts are no longer executed, as they were with -i.
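For anyone who wants to try the same change, it looks roughly like this (a sketch, assuming sudo_flags lives in the [defaults] section of ansible.cfg; adjust to wherever your configuration actually sets it):

[defaults]
# previously: sudo_flags = -i
sudo_flags = -H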

If you change the target machine's hostname in the hostname file (/etc/hostname), you must also add the new name to the local DNS hosts file (/etc/hosts):

127.0.0.1 new_hostname

Otherwise, every time you issue a sudo su, the server will delay the escalation prompt because it is trying to resolve the new name.

You can also change the ansible.cfg to:
# SSH timeout
timeout = 60
gather_timeout = 60

@mgedmin Greetings! Thanks for taking the time to open this issue. In order for the community to handle your issue effectively, we need a bit more information.

Here are the items we could not find in your description:

  • issue type
  • component name

Please set the description of this issue with this template:
https://raw.githubusercontent.com/ansible/ansible/devel/.github/ISSUE_TEMPLATE.md

click here for bot help

This issue was gone for a long time on Solaris 11 x86 but has reappeared with Ansible 2.3.0.0 😞 If I remember correctly, it always failed if pipelining was enabled, but without it everything worked fine.

The connection via SSH works, but the response for the privilege escalation to root via su times out.

fatal: [host.example.com]: FAILED! => {
    "failed": true, 
    "msg": "Timeout (62s) waiting for privilege escalation prompt: "
}

Here's the full debug output of this section:

<host.example.com> ESTABLISH SSH CONNECTION FOR USER: admin
<host.example.com> SSH: ansible.cfg set ssh_args: (-C)(-o)(ControlMaster=auto)(-o)(ControlPersist=60s)
<host.example.com> SSH: ansible_password/ansible_ssh_pass not set: (-o)(KbdInteractiveAuthentication=no)(-o)(PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey)(-o)(PasswordAuthentication=no)
<host.example.com> SSH: ANSIBLE_REMOTE_USER/remote_user/ansible_user/user/-u set: (-o)(User=admin)
<host.example.com> SSH: ANSIBLE_TIMEOUT/timeout set: (-o)(ConnectTimeout=10)
<host.example.com> SSH: PlayContext set ssh_common_args: ()
<host.example.com> SSH: PlayContext set ssh_extra_args: ()
<host.example.com> SSH: found only ControlPersist; added ControlPath: (-o)(ControlPath=/Users/username/.ansible/cp/d6313ea46e)
<host.example.com> SSH: EXEC ssh -vvv -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=admin -o ConnectTimeout=10 -o ControlPath=/Users/username/.ansible/cp/d6313ea46e -tt host.example.com '/bin/sh -c '"'"'su  root -c '"'"'"'"'"'"'"'"'/bin/sh -c '"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-oqvkeuwptoqgvjiknvhcsreimefzsmvo; /usr/bin/python /home/admin/.ansible/tmp/ansible-tmp-1493043257.92-168849063784581/setup.py; rm -rf "/home/admin/.ansible/tmp/ansible-tmp-1493043257.92-168849063784581/" > /dev/null 2>&1'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"''"'"'"'"'"'"'"'"' && sleep 0'"'"''
  6133 1493043258.53019: Initial state: awaiting_prompt: <function detect_su_prompt at 0x10cced050>
  6133 1493043258.53967: stderr chunk (state=0):
>>>OpenSSH_7.4p1, LibreSSL 2.5.0
debug1: Reading configuration data /Users/username/.ssh/config
debug1: /Users/username/.ssh/config line 21: Applying options for host.example.com
<<<

  6133 1493043258.54051: stderr chunk (state=0):
>>>debug3: kex names ok: [diffie-hellman-group1-sha1]
debug1: /Users/username/.ssh/config line 291: Applying options for *
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: UpdateHostKeys=ask is incompatible with ControlPersist; disabling
debug1: auto-mux: Trying existing master
debug2: fd 3 setting O_NONBLOCK
debug2: mux_client_hello_exchange: master version 4
debug3: mux_client_forwards: request forwardings: 0 local, 0 remote
debug3: mux_client_request_session: entering
debug3: mux_client_request_alive: entering
debug3: mux_client_request_alive: done pid = 6137
debug3: mux_client_request_session: session request sent
<<<

  6133 1493043258.54203: stderr chunk (state=0):
>>>debug1: mux_client_request_session: master session id: 2
<<<

  6133 1493043258.56315: stdout chunk (state=0):
>>>Password: <<<

  6133 1493043270.56450: done running TaskExecutor() for host.example.com/TASK: Gathering Facts
  6133 1493043270.56487: sending task result
  6133 1493043270.56571: done sending task result
  6133 1493043270.56617: WORKER PROCESS EXITING
  6061 1493043270.56744: marking host.example.com as failed
  6061 1493043270.56785: marking host host.example.com failed, current state: HOST STATE: block=0, task=0, rescue=0, always=0, run_state=ITERATING_SETUP, fail_state=FAILED_NONE, pending_setup=True, tasks child state? (None), rescue child state? (None), always child state? (None), did rescue? False, did start at task? False
  6061 1493043270.56804: ^ failed state is now: HOST STATE: block=0, task=0, rescue=0, always=0, run_state=ITERATING_COMPLETE, fail_state=FAILED_SETUP, pending_setup=True, tasks child state? (None), rescue child state? (None), always child state? (None), did rescue? False, did start at task? False
  6061 1493043270.56838: getting the next task for host host.example.com
  6061 1493043270.56848: host host.example.com is done iterating, returning
fatal: [host.example.com]: FAILED! => {
    "failed": true, 
    "msg": "Timeout (12s) waiting for privilege escalation prompt: "
}
  6061 1493043270.56973: no more pending results, returning what we have
  6061 1493043270.56995: results queue empty
  6061 1493043270.57005: checking for any_errors_fatal
  6061 1493043270.57017: done checking for any_errors_fatal
  6061 1493043270.57027: checking for max_fail_percentage
  6061 1493043270.57042: done checking for max_fail_percentage
  6061 1493043270.57053: checking to see if all hosts have failed and the running result is not ok
  6061 1493043270.57064: done checking to see if all hosts have failed
  6061 1493043270.57074: getting the remaining hosts for this loop
  6061 1493043270.57115: done getting the remaining hosts for this loop
  6061 1493043270.57136: building list of next tasks for hosts
  6061 1493043270.57143: getting the next task for host host.example.com
  6061 1493043270.57150: host host.example.com is done iterating, returning
  6061 1493043270.57156: done building task lists
  6061 1493043270.57165: counting tasks in each state of execution
  6061 1493043270.57175: done counting tasks in each state of execution:
    num_setups: 0
    num_tasks: 0
    num_rescue: 0
    num_always: 0
  6061 1493043270.57185: all hosts are done, so returning None's for all hosts
  6061 1493043270.57191: done queuing things up, now waiting for results queue to drain
  6061 1493043270.57198: results queue empty
  6061 1493043270.57203: checking for any_errors_fatal
  6061 1493043270.57207: done checking for any_errors_fatal
  6061 1493043270.57212: checking for max_fail_percentage
  6061 1493043270.57217: done checking for max_fail_percentage
  6061 1493043270.57221: checking to see if all hosts have failed and the running result is not ok
  6061 1493043270.57225: done checking to see if all hosts have failed
  6061 1493043270.57232: getting the next task for host host.example.com
  6061 1493043270.57238: host host.example.com is done iterating, returning

Having the same troubles from macOS (Sierra) to Debian 8.7 VMs in Parallels Desktop, with the su password prompt.

The problem seems to be related to the Python side not receiving (or "missing") the "Password:" prompt, as I can copy-paste the -vvvv ssh command into a shell and it connects, shows the Password: prompt, I type the password, and it executes the su etc. just fine.

Fired up an sshd -dd on the remotes, and I can confirm it does read stuff, notably the debugging information in the msg field; it just isn't "seeing" the Password: prompt ;(
Perhaps it is something else in the settings of the su command and the pipe/terminal redirections?

"msg": "Timeout (32s) waiting for privilege escalation prompt: Environment:\r\n USER=debian\r\n LOGNAME=debian\r\n HOME=/home/debian\r\n PATH=/usr/local/bin:/usr/bin:/bin:/usr/games\r\n MAIL=/var/mail/debian\r\n SHELL=/bin/bash\r\n SSH_CLIENT=10.10.10.2 61637 22\r\n SSH_CONNECTION=10.10.10.2 61637 10.10.10.114 22\r\n SSH_TTY=/dev/pts/1\r\n TERM=xterm-256color\r\n LANG=en_US.UTF-8\r\n LANGUAGE=en_ZA:en\r\n SSH_AUTH_SOCK=/tmp/ssh-80HbP4JU39/agent.2224\r\n"

Having the same troubles from macOS (Sierra) to Debian 8.7 VMs in Parallels Desktop, with the su password prompt.

In v2.3, there is an issue with su which has been resolved in https://github.com/ansible/ansible/pull/23710

@sivel so how do I get the version it's fixed in on my macOS with pip?? :{=}
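Until a release containing the fix reaches PyPI, one option (just a sketch, not official guidance) is to install Ansible directly from the git branch that has the fix merged, e.g.:

pip install --upgrade git+https://github.com/ansible/ansible.git@devel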

@passw0rd123 Some thoughts while looking at that debug output...

EDIT: I was looking at newer code, I suspect this is the issue @sivel mentioned fixed in #23710

host.example.com> SSH: EXEC ssh -vvv -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=admin -o ConnectTimeout=10 -o ControlPath=/Users/username/.ansible/cp/d6313ea46e -tt host.example.com '/bin/sh -c '"'"'su  root -c '"'"'"'"'"'"'"'"'/bin/sh -c '"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-oqvkeuwptoqgvjiknvhcsreimefzsmvo; /usr/bin/python /home/admin/.ansible/tmp/ansible-tmp-1493043257.92-168849063784581/setup.py; rm -rf "/home/admin/.ansible/tmp/ansible-tmp-1493043257.92-168849063784581/" > /dev/null 2>&1'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"''"'"'"'"'"'"'"'"' && sleep 0'"'"''
  6133 1493043258.53019: Initial state: awaiting_prompt: <function detect_su_prompt at 0x10cced050>
  6133 1493043258.53967: stderr chunk (state=0):
>>>OpenSSH_7.4p1, LibreSSL 2.5.0
debug1: Reading configuration data /Users/username/.ssh/config
debug1: /Users/username/.ssh/config line 21: Applying options for host.example.com
<<<

  6133 1493043258.54051: stderr chunk (state=0):
>>>debug3: kex names ok: [diffie-hellman-group1-sha1]
debug1: /Users/username/.ssh/config line 291: Applying options for *
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: UpdateHostKeys=ask is incompatible with ControlPersist; disabling
debug1: auto-mux: Trying existing master
debug2: fd 3 setting O_NONBLOCK
debug2: mux_client_hello_exchange: master version 4
debug3: mux_client_forwards: request forwardings: 0 local, 0 remote
debug3: mux_client_request_session: entering
debug3: mux_client_request_alive: entering
debug3: mux_client_request_alive: done pid = 6137
debug3: mux_client_request_session: session request sent
<<<

  6133 1493043258.54203: stderr chunk (state=0):
>>>debug1: mux_client_request_session: master session id: 2
<<<

  6133 1493043258.56315: stdout chunk (state=0):
>>>Password: <<<

  6133 1493043270.56450: done running TaskExecutor() for host.example.com/TASK: Gathering Facts
  < misc debug cut here >
  6133 1493043270.56617: WORKER PROCESS EXITING
  < misc debug cut here >
fatal: [host.example.com]: FAILED! => {
    "failed": true, 
    "msg": "Timeout (12s) waiting for privilege escalation prompt: "
}

That looks like what I would expect if we never exit the initial event processing loop at
https://github.com/ansible/ansible/blob/devel/lib/ansible/plugins/connection/ssh.py#L629-L651

                for key, event in events:
                    if key.fileobj == p.stdout:
                        b_chunk = p.stdout.read()
                        if b_chunk == b'':
                            # stdout has been closed, stop watching it
                            selector.unregister(p.stdout)
                            # When ssh has ControlMaster (+ControlPath/Persist) enabled, the
                            # first connection goes into the background and we never see EOF
                            # on stderr. If we see EOF on stdout, lower the select timeout
                            # to reduce the time wasted selecting on stderr if we observe
                            # that the process has not yet existed after this EOF. Otherwise
                            # we may spend a long timeout period waiting for an EOF that is
                            # not going to arrive until the persisted connection closes.
                            timeout = 1
                        b_tmp_stdout += b_chunk
                        display.debug("stdout chunk (state=%s):\n>>>%s<<<\n" % (state, to_text(b_chunk)))
                    elif key.fileobj == p.stderr:
                        b_chunk = p.stderr.read()
                        if b_chunk == b'':
                            # stderr has been closed, stop watching it
                            selector.unregister(p.stderr)
                        b_tmp_stderr += b_chunk
                        display.debug("stderr chunk (state=%s):\n>>>%s<<<\n" % (state, to_text(b_chunk)))

If we got the stderr and stdout chunks shown in the log, then either:

1) we blocked on one of the read() calls. That shouldn't happen in this scenario, but could it be a selectors bug?
2) the for loop kept getting non-stderr/non-stdout events?

Maybe it is something Solaris specific?

Though, either of those should eventually unblock, examine_prompt() should get called, and that should set the become_success flag and trigger sending of the password. But that doesn't seem to happen...

For people who wind up here via Google searches: I raised #25437 for a case with the same error message, but caused by the target disks being full.

I previously had this intermittently (with client OS X and target CentOS 7.3).

I now have a 100% reproducible case using Ansible 2.2.3.0 with a trivial playbook. (UPDATE: This is fixed in 2.3.1.0 due to https://github.com/ansible/ansible/pull/23710)

This only occurs when I use -vvvv or -vvvvv - if I use -vvv, -vv or less, it doesn't happen (tested many times, only varying this factor). Client is OS X 10.11, target is Ubuntu 14.04. Playbook is just:

- hosts: all
  become: yes

I can provide full details if this is of interest - the extra verbosity triggering this was quite surprising.

Need help solving a pbrun syntax issue:

---
- name: Touch a file
  hosts: test
  become: true
  become_method: pbrun
  become_user: ops
  become_flags: 'ops'
  tasks:
    - name: run touch command
      command: "touch /tmp/1.txt"

SSH: EXEC sshpass -d12 ssh -vvv -C -o ControlMaster=auto -o ControlPersist=60s -o User=portal -o ConnectTimeout=10 -o ControlPath=/home/ops/.ansible/cp/ansible-ssh-%h-%p-%r -tt xxx.xx.xxx.xxx '/bin/sh -c '"'"'pbrun ops -u ops '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-oiqkzexpuphtwjesfjmrzmqfcrjdccre; /usr/bin/python /tmp/ansible-tmp-1498555805.91-140035303656300/setup.py'"'"'"'"'"'"'"'"' && sleep 0'"'"''
fatal: [168.72.164.236]: FAILED! => {
"failed": true,
"msg": "Timeout (12s) waiting for privilege escalation prompt: \r\n\r\n\r\n"
}

@ctidocker your issue looks quite different - suggest you open a new issue.

We believe the original issue reported was fixed in 2.3.1. We are closing this issue and if you still have SSH issues please open a specific github issue with as much detail as possible along with the reproducer, thanks!

Here is how I solved the issue. It came down to the parameters used in the YAML file;
the become_method parameter was the culprit.
The previous setting was:

become: yes
become_method: su

I changed it to:

become: yes
become_method: sudo

and it started working with no issues. Play around with this parameter; it must be something to do with the shell, as I am on SUSE 12.
ansible --version
ansible 2.7.0.dev0 (devel 27b85e732d) last updated 2018/06/20 16:15:16 (GMT +000)
python version = 2.7.13 (default, Jan 11 2017, 10:56:06) [GCC]

Also, for debugging issues this command was very helpful:
ANSIBLE_DEBUG=1 ansible-playbook test2.yml -vvv
The link below also helped: https://docs.ansible.com/ansible/latest/user_guide/become.html?highlight=assume%20root
Hope this fixes the issue.
P.S.: Increasing the timeout setting and the pipelining parameter didn't seem to work for me.

Regards
Reuben

@amenonsen Yes, this fixes the issue.

I tried to repair it this way, but it failed. Can you help me?
