Ansible: advise updating controlpath settings when ssh throws 'unix domain socket "too long"' error

Created on 9 Jul 2015  ·  66 Comments  ·  Source: ansible/ansible

ISSUE TYPE

Feature Idea

COMPONENT NAME

ssh control persist

ANSIBLE VERSION

2.0

SUMMARY

When trying to use the ec2 plugin, ssh fails with this error:

SSH Error: unix_listener: "/Users/luke/.ansible/cp/ansible-ssh-ec2-255-255-255-255.compute-1.amazonaws.com-22-ubuntu.CErvOvRE5U0urCgm" too long for Unix domain socket

Here's the full example:

$ ansible -vvvv -i ec2.py -u ubuntu us-east-1 -m ping
<ec2-255-255-255-255.compute-1.amazonaws.com> ESTABLISH CONNECTION FOR USER: ubuntu
<ec2-255-255-255-255.compute-1.amazonaws.com> REMOTE_MODULE ping
<ec2-255-255-255-255.compute-1.amazonaws.com> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/Users/luke/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 ec2-255-255-255-255.compute-1.amazonaws.com /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1436458336.4-21039895766180 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1436458336.4-21039895766180 && echo $HOME/.ansible/tmp/ansible-tmp-1436458336.4-21039895766180'
ec2-255-255-255-255.compute-1.amazonaws.com | FAILED => SSH Error: unix_listener: "/Users/luke/.ansible/cp/ansible-ssh-ec2-255-255-255-255.compute-1.amazonaws.com-22-ubuntu.CErvOvRE5U0urCgm" too long for Unix domain socket
    while connecting to 255.255.255.255:22
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.

I've changed some of the sensitive info in here like the IP etc.

affects_2.0 affects_2.3 feature

All 66 comments

Added this to my ansible config to shorten the path:

[ssh_connection]
control_path = %(directory)s/%%h-%%p-%%r

Might be useful to include that in the error output or do something else more graceful instead of failing.
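A rough sketch of what "include that in the error output" could look like. This is not Ansible's actual code; the function name explain_ssh_error and the HINT text are made up for illustration. It only matches the ssh error string quoted in this issue.

```python
# Hypothetical helper: append a hint when ssh complains about the socket path.
HINT = (
    "The ControlPath is too long for a Unix domain socket. "
    "Try a shorter control_path in the [ssh_connection] section of "
    "ansible.cfg, for example: control_path = %(directory)s/%%h-%%p-%%r"
)


def explain_ssh_error(stderr_text):
    """Return stderr_text, with HINT appended if the socket path was too long."""
    if "too long for Unix domain socket" in stderr_text:
        return stderr_text.rstrip() + "\n" + HINT
    return stderr_text
```

Any other ssh failure passes through unchanged, so the hint only shows up for this specific error.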

for me same error! I agree with LukeHoersten in this fix.

Thanks for pointing your solution out @LukeHoersten

No problem. Hopefully we can get a more solid fix in there. It's bad for newcomers especially.

The ansible config has another commented out suggestion
control_path = %(directory)s/%%h-%%r

But yes a help message would be useful.

I just hit this as well. I'm new and wasted huge amounts of time. Thanks for the answer! And I agree, needs to be fixed.

I also :+1: for this feature.

Faced that today. Thanks for the hints on ansible.cfg !!

Editing control_path does not work on Mac OSX El Capitan.

This works for me in El Capitan:

[ssh_connection]
control_path = %(directory)s/%%h-%%r

As @willotter pointed out, it's one of the commented out statements in https://raw.githubusercontent.com/ansible/ansible/devel/examples/ansible.cfg

Interested to know why it's an issue - since when are long pathnames a problem outside Windows?

This works for me after upgrading to El Capitan.

[ssh_connection]
control_path = %(directory)s/%%h-%%p-%%r

@deyvsh why it's an issue - since when are long pathnames a problem outside Windows?

Since El Capitan was released by Apple. Aside from a page in Chinese, this is the only page that seems to reference this new behavior in MacOS. I ran into the same issue when trying to use Tramp mode in emacs which allows transparent access to remote files via ssh. Same error about long file names for a unix domain socket, but not as easy to workaround as in Ansible.

@cswarth The ansible config is just passed to your ssh client. You might be able to set up a control_path in your ssh config file ~/.ssh/config like this:

Host *
  ControlPath /tmp/%r@%h:%p

I don't have Mac OS X so I can't test this but this should work unless emacs passes any specific parameters through to SSH.

@willotter I had to adapt this idea and add it to my ansible.cfg file to get it to work.

[ssh_connection]
control_path = /tmp/%%h-%%p-%%r

2017 update: looks like @willotter no longer exists :(

@LukeHoersten Thanks for this, fixed the issue for me!

The root cause for this is at

https://github.com/openssh/openssh-portable/blob/9ada37d36003a77902e90a3214981e417457cf13/misc.c#L1070

int
unix_listener(const char *path, int backlog, int unlink_first)
{
    struct sockaddr_un sunaddr;
    int saved_errno, sock;

    memset(&sunaddr, 0, sizeof(sunaddr));
    sunaddr.sun_family = AF_UNIX;
    if (strlcpy(sunaddr.sun_path, path, sizeof(sunaddr.sun_path)) >= sizeof(sunaddr.sun_path)) {
        error("%s: \"%s\" too long for Unix domain socket", __func__,
            path);
        errno = ENAMETOOLONG;
        return -1;
    }

To know the limit (sizeof(sunaddr.sun_path)), we need to look at https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man4/unix.4.html

           struct sockaddr_un {
                   u_char  sun_len;
                   u_char  sun_family;
                   char    sun_path[104];
           };

The path is limited to 104 characters including the 0 terminator.
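To check a path against that limit before running anything, here is a minimal sketch. The 104 comes from the struct above (macOS); Linux uses a slightly larger sun_path of 108 bytes, so pass that in if needed. The 17 extra characters are an assumption based on the error messages in this thread, where ssh appends a dot and 16 random characters to the configured ControlPath. The function name is made up.

```python
SUN_PATH_SIZE = 104   # sizeof(sun_path) on macOS, per the struct above
TEMP_SUFFIX_LEN = 17  # "." plus 16 random characters, as seen in the errors here


def socket_path_fits(control_path, sun_path_size=SUN_PATH_SIZE,
                     suffix_len=TEMP_SUFFIX_LEN):
    """Return True if the expanded ControlPath leaves room for the suffix and NUL."""
    # unix_listener() rejects the path when strlen(path) >= sizeof(sun_path),
    # so the path, the temporary suffix and the terminating NUL must all fit.
    needed = len(control_path.encode("utf-8")) + suffix_len + 1
    return needed <= sun_path_size


default_style = ("/Users/luke/.ansible/cp/ansible-ssh-"
                 "ec2-255-255-255-255.compute-1.amazonaws.com-22-ubuntu")
shortened = ("/Users/luke/.ansible/cp/"
             "ec2-255-255-255-255.compute-1.amazonaws.com-22-ubuntu")

print(socket_path_fits(default_style))  # the path from the original report
print(socket_path_fits(shortened))      # same host without the ansible-ssh- prefix
```

Note that the home directory counts too, so a long username can push an otherwise fine path over the limit.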

This is also discussed in https://en.wikibooks.org/wiki/OpenSSH/Cookbook/Multiplexing#Manually_Establishing_Multiplexed_Connections, which suggests using %C:

Starting with 6.7, the combination of %r@%h:%p and variations on it can be replaced with %C which by itself generates a hash from the concatenation of %l%h%p%r.

In the end, you want to use

[ssh_connection]
control_path = %(directory)s/%%C
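Why a hash helps: its length does not depend on the hostname. The sketch below only illustrates the idea of hashing the concatenation of %l%h%p%r. SHA-1 is assumed here purely for illustration (the quote above does not say which hash ssh uses), and the function name and example hostnames are made up.

```python
import hashlib


def hashed_socket_name(local_host, remote_host, port, remote_user):
    """Hash local host, remote host, port and user into a fixed-length name."""
    joined = f"{local_host}{remote_host}{port}{remote_user}"
    # Assumption: SHA-1 hex digest, chosen only to show the fixed length.
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


short = hashed_socket_name("laptop", "10.0.0.1", 22, "ubuntu")
long_ = hashed_socket_name(
    "laptop", "ec2-255-255-255-255.compute-1.amazonaws.com", 22, "ubuntu")
print(len(short), len(long_))  # both names have the same length
```

With a fixed-length file name, only the directory part of the path can still push you over the limit.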

Also, you want to stay the fuck out of /tmp or any other world-writeable, world-readable location, because security.

See also http://pastebin.com/ugXKMFsv

@isotopp good suggestions. I wonder why we don't just change the default to control_path = %(directory)s/%%C to avoid future issues.

@LukeHoersten I think ansible should change the default, too. In fact, I did

[:~] $ grep -i control ~/.ssh/config
ControlMaster auto
ControlPath ~/.ssh/_%C

Ping @bcoca - see analysis and proposed changes above.

+1

because it would not work on many many OSs/distros that run even slightly older versions of openssh

Proposed change in http://pastebin.com/ugXKMFsv changes docs and comments only. Will work with old versions of openssh, but make pointer to %C more obvious.

I have a long username on my machine (11 characters), this caused my directory to go over the character limit.

https://github.com/ansible/ansible/blob/devel/examples/ansible.cfg#L216-L225

I dropped the -%%r and it solved this problem for me.

I hit this error today because instead of my inventory file, I supplied my group_vars file and ansible happily parsed the encrypted file somehow and accepted something like 182937891273891723981723891723987189237189237981273981 as the hostname. SSH also didn't think that was weird before it noticed the long ControlPath. A warning for posterity - run everything with -vvvv and make sure you're pointing to the right host and all that.

Thank you for this. It fixed the error I had on OS X El Capitan.

+1
This just solved my problem on OS X El Capitan.

Worked for me as well on OS X EL Capitan. Just a note, if you have installed ansible through brew, then the file is /usr/local/etc/ansible/ansible.cfg

:+1: This happened to me just trying to do an ansible all -i inventory -m ping with a host that has a long hostname like ec2-XX-XXX-XX-XX.eu-west-1.compute.amazonaws.com

This worked for me on El Capitan:

I created an ansible.cfg file in my current directory with:

[ssh_connection]
control_path = %(directory)s/%%C

Now running ansible .. didn't give me any ssh errors.

Worked for me as well on OS X EL Capitan. Just a note, if you have installed ansible through brew, then the file is /usr/local/etc/ansible/ansible.cfg

I'm on El Capitan and installed ansible via brew, and it ignored the /usr/local/etc/ansible/ansible.cfg file I tried adding with those settings.

@tleyden That's quite odd, /usr/local/etc/ansible/ansible.cfg works fine for me.

Oh, I just realized the difference -- I installed ansible via pip install ansible, not via brew

Why is it freakin adding string like CErvOvRE5U0urCgm in the end? Things break for me because of that useless string.

Just adding some comments here to be clear about what actions can be taken:

  • Documentation. Looks like the suggested doc updates are in a gist linked from this ticket but not in a PR so it was never merged.
  • Better catching of errors -- If %C is used and the ssh doesn't support it then tell people to replace with %l-%h-%p. If path is too long, tell people to try %C or simply shorten the path.
  • try to detect whether the ssh that we're using supports %C and if so, make use of it, otherwise do not (maybe this is only relevant as a default, not when the user configures something in their config file?) (Have to be careful not to make connections take a lot longer, though).
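The detection idea in the last bullet could be sketched roughly like this. It is not Ansible code and the function names are hypothetical. It parses the banner printed by ssh -V (which goes to stderr) and relies on %C being available from OpenSSH 6.7 on, as quoted earlier in this thread. Running ssh -V once and caching the result would avoid slowing down every connection.

```python
import re


def supports_percent_c(version_banner):
    """Return True if the `ssh -V` banner reports OpenSSH 6.7 or newer."""
    match = re.search(r"OpenSSH_(\d+)\.(\d+)", version_banner)
    if match is None:
        return False  # unknown client: stay conservative
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (6, 7)


def pick_control_path(version_banner):
    """Choose an ansible.cfg style control_path value for this ssh client."""
    if supports_percent_c(version_banner):
        return "%(directory)s/%%C"
    # Fallback for older clients: the shortened form suggested in this thread.
    return "%(directory)s/%%h-%%p-%%r"
```

This only makes sense as a default; a control_path the user set explicitly should be left alone.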

I also added:
%(directory)s/%%h‐%%r
But my path is still too long? How can I fix this:

SSH Error: unix_listener: "/Users/myfullname/.ansible/cp/ec2-xx-xx-xx-xx.eu-central-1.compute.amazonaws.com-centos.AAZFTHkT5xXXXXXX" too long for Unix domain socket
    while connecting to 52.xx.xx.xx:22

I'm seeing this issue with ansible 2.1.0.0 on Ubuntu 16.04

$ ssh -V
OpenSSH_7.2p2 Ubuntu-4ubuntu1, OpenSSL 1.0.2g-fips  1 Mar 2016

Adding this to my ansible.cfg worked:

[ssh_connection]
control_path=%(directory)s/%%h-%%p-%%r

Alternatively, changing the long AWS domain name to an IP address also fixed it, even without the change to ssh_connection.control_path in ansible.cfg.

As others have said, this error was not apparent when running with -vvvv. I had to copy the command in the debug output and run that directly in a terminal to see the error "too long for Unix domain socket".

I'm also having the same issue.

This issue has been super annoying, having to switch back and forth between IPs and FQDNs depending on the machine running the Ansible playbook... any real solution planned from Ansible's side?

@swoodford, perhaps you can file an issue with your Linux distro to change the default settings. For example, the Fedora maintainer changed the default to use a shorter control socket. The issue seems to be that Ansible wants to retain compatibility with old distros by default. I'm not sure that makes sense, because users of newer distros should by now be far more numerous. That means newer distros, at least, should not mind changing the default during packaging, because the distro knows it is recent enough to work with the more reliable option.

I still think Ansible should change the default, though.

Very funny. We have had the same issue in cdist some time ago (and are looking at another bug related to this). The sun_path limit in Unix is a really, really old limit that bites us all in 2016.

Easiest solution: none.
2nd best solution: try to keep the socket name short. Still breaks if home dir is a long path
3rd best solution: store it somewhere in /tmp/short-random-path/c (just need one character)

Long term solution: Get rid of sun_path limit or raise to a sensible 2016 default (anyone from austin group / posix reading here?)

What does the %(directory) stand for?

@isotopp

Is this the right syntax (with underline prefix) for putting inside ~/.ssh/config file?

ControlMaster auto
ControlPath ~/.ssh/_%C

Is this an escape that would have the same meaning as the double %% from ansible.cfg file? I am trying to configure both of them the same way as I use ssh even outside ansible.
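On the %(directory)s and %% questions, a small sketch, assuming Ansible applies Python %-style formatting to the setting. That assumption matches the EXEC line in the original report, where the ControlPath handed to ssh contains single %h-%p-%r tokens. The function name is made up.

```python
def expand_directory(control_path_setting, directory):
    """Fill in %(directory)s and turn each %% into a single % for ssh."""
    return control_path_setting % {"directory": directory}


print(expand_directory("%(directory)s/%%h-%%p-%%r", "/Users/luke/.ansible/cp"))
# prints /Users/luke/.ansible/cp/%h-%p-%r
print(expand_directory("%(directory)s/%%C", "/Users/luke/.ansible/cp"))
# prints /Users/luke/.ansible/cp/%C
```

So %(directory)s stands for the control path directory (~/.ansible/cp in the errors above), and %% is only an escape for that formatting step. ~/.ssh/config is read by ssh directly, so a single % is correct there, and the underscore in _%C is just a literal file name prefix.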

Even after adding the control_path to the ansible.cfg in my project I was still getting this error, but I reverted to version 2.1.3, ran the same command that threw the error under 2.2.1, and the issue was resolved.

Still having this issue with version: ansible 2.2.0.0

really strange issue. ansible 2.2.0.0 on fedora 24 -> problem existed
git head from 2016/07/05 on OSX -> problem does not exist.

@bcoca i am always a fan of backward compatibility (yeah i sent that centos 6.5 fix). what about making it dynamic on openssh/distro version which controlpath to use?

it is already dynamic, see the logic behind 'smart' connection being the default

when connecting from host to host, maybe you do not have ssh keys in your backpack? :)

On a side note %C is not a great default right now as EL7 has openssh 6.6, and %C wasn't added until openssh 6.7 and hasn't been backported.

You can use the fully expanded form of %l%h%p%r on EL7 though, but only partially mitigates as it won't do a hash still of course.

It should be up to distro owners to change the default config to suit the shipped package. I think upstream should not be waiting 7 years before moving forward with important improvements like this.

As I'm still using Ansible version 2.2 & Ansible Tower 3.1.1, I also ran into this issue. As @dennisobrien pointed out earlier, changing the inventory from an AWS domain name to an AWS IP address resolved this issue. However, I tried using just these variables in configuration first, and it did not resolve the problem:

---
ssh_connection:
  control_path: "%(directory)s/%%h-%%p-%%r"

@b-long , use control_path %(directory)s/%%C

My server has this problem and I don't have permissions to change it. How can I solve it in the client's end?

@thefourtheye it's purely a client problem, not a server problem. You can find the option to set in your ansible.cfg file earlier in this thread.

@antoineco Oh, thank you. I am totally new to ansible and I don't even have it installed in my machine. Still having the file ansible.cfg in the home directory would work?

I have the same problem. I tried all the solutions, including adding a config file .ansible.cfg in ~/:

[defaults]
inventory=/etc/ansible/hosts
[ssh_connection]
control_path=%(directory)s/%%h-%%r
control_path_dir=~/.ansible/cp

I also added the host and IP to ssh known_hosts. But it still does not work; it is Ubuntu on EC2.
This is the error:

fatal: [default]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Warning: Permanently added 'ec2-xx-192-174-42.ap-northeast-1.compute.amazonaws.com,xx.192.174.42' (ECDSA) to the list of known hosts.\r\nunix_listener: \"/Users/name/.ansible/cp/ec2-xx-192-174-42.ap-northeast-1.compute.amazonaws.com-ubuntu.1fndG2vtHPliheeZ\" too long for Unix domain socket\r\n", "unreachable": true

You're not using the proposed solution which is control_path = %(directory)s/%%C.

@akostadinov Thank you, it works. Too much solution here.

Too much solution here.

If only it was harder... curse those solution providers!

I tried adding all the lines suggested here in the ~/ansible.cfg file on my local machine, but it hasn't helped. I am giving up.

What works for me now, is getting the IP address of the machine with nslookup and logging in with that.

@thefourtheye , I'm not sure how many "lines suggested" you see here. Use the post with 50+ likes. But besides proper option you need to use a configuration file that ansible knows about. In your case ~/.ansible.cfg. Try to pay attention to details, dot in front of user config file is a common unix convention.
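To see which config file would be picked up, a sketch of the usual lookup: the ANSIBLE_CONFIG environment variable first, then ansible.cfg in the current directory, then ~/.ansible.cfg, then /etc/ansible/ansible.cfg. The function names are made up, and packaged installs (brew, for example) may add other locations, as mentioned above.

```python
import os


def candidate_config_paths(environ=None):
    """List config locations in the order they are usually searched."""
    environ = os.environ if environ is None else environ
    candidates = []
    if environ.get("ANSIBLE_CONFIG"):
        candidates.append(environ["ANSIBLE_CONFIG"])
    candidates += ["ansible.cfg", "~/.ansible.cfg", "/etc/ansible/ansible.cfg"]
    return candidates


def find_ansible_cfg(candidates):
    """Return the first candidate that exists as a file, or None."""
    for candidate in candidates:
        path = os.path.expanduser(candidate)
        if os.path.isfile(path):
            return path
    return None
```

Calling find_ansible_cfg(candidate_config_paths()) from the directory where you run ansible shows the file that wins; a ~/ansible.cfg without the leading dot is not in the list.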

@akostadinov I am sorry, that was a typo. This is how it looks like

➜  ~ cat ~/.ansible.cfg
[ssh_connection]
control_path = %(directory)s/%%h-%%p-%%r

I just want to chime in with my .ansible.cfg:

[ssh_connection]
control_path = /tmp/control_%%l_%%h_%%p_%%r

for me, directory was something ridiculously long, the latter part was just the straw that broke the camel's back. Also I have this in my .ssh/config so I can reuse the same connection:

ControlMaster                    auto
ControlPath                      /tmp/control_%l_%h_%p_%r

Sorry but hardcoded tmp is not only not portable but also a serious security risk. For good reasons MacOS does not allow users to write to /tmp and provides isolated (private) tmp folders for each user.

Tmp would work only if you use OS provided tmp path, something like %(tmp)s ... after patching ansible.

Guys, please read the existing comments. It's ridiculous for everybody to come and ask the same thing and for somebody to add the same solution again. Use a proper config file and see https://github.com/ansible/ansible/issues/11536#issuecomment-153030743.

Somebody, please close the thread to avoid further spam.

@ssbarnea hardcoded anything is not portable... that's why it's not the default in ansible... not sure i agree about the security issue or macOS issue since /tmp is sticky and openssh uses a sensible mode (0600) for these files.

regarding the solution using %C that requires a recent openssh...

I don't really care about ancient ssh versions, especially on the Ansible controller. In order to evolve we need to leave a few things behind, and in this case it is not really a big deal because those affected could change their config in order to continue using it.

I think it is essential for the Ansible user experience (UX) to provide defaults that suit most users, minimizing the need for change. I doubt that more than 1-2% of users are using versions of OpenSSH that do not support %C.

I think we need to implement a few critical INI variables in Ansible ASAP, because every other week we encounter bugs caused by the lack of them: %(tmpdir)s, %(configdir)s, %(inventorydir)s.

If we had these, people would be able to create reliable relative paths.

Sadly, in my case the problem is even worse: we use Ansible as part of CI and, like many, we have multiple Jenkins nodes on the same machine running under the same user, so we encountered ssh session hijacking quite often. Anyway, my problem is more complex and outside the scope of this ticket.

I fixed this problem in a generic way for all versions of ssh 6 months ago. If anyone is seeing the problem with Ansible 2.3+, it is because you have set a custom control path in ansible.cfg instead of leaving it blank.

https://github.com/ansible/ansible/commit/ac78347f2bc4a489c7e254c6c1d950fb45f240ad

https://github.com/ansible/ansible/blob/devel/examples/ansible.cfg#L360-L367

# The path to use for the ControlPath sockets. This defaults to a hashed string of the hostname,
# port and username (empty string in the config). The hash mitigates a common problem users
# found with long hostnames and the conventional %(directory)s/ansible-ssh-%%h-%%p-%%r format.
# In those cases, a "too long for Unix domain socket" ssh error would occur.
#
# Example:
# control_path = %(directory)s/%%h-%%r
#control_path =
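To check whether a leftover custom control_path is what still triggers the error on 2.3+, a minimal sketch using Python's configparser. The function name is made up, and interpolation is switched off so that %(directory)s and %% are read as plain text.

```python
import configparser


def custom_control_path(cfg_text):
    """Return the control_path set in [ssh_connection], or '' if unset or blank."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    # fallback covers both a missing section and a missing option
    return parser.get("ssh_connection", "control_path", fallback="").strip()


old_workaround = "[ssh_connection]\ncontrol_path = %(directory)s/%%h-%%r\n"
print(custom_control_path(old_workaround))  # non-empty: overrides the hashed default
```

An empty result means the setting is blank or commented out, so the hashed default described above applies.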

Since this conversation keeps continuing without referencing the patch above, I am going to lock it. If you have further questions about the topic, please use the mailing list.
