Moby: kernel crash after "unregister_netdevice: waiting for lo to become free. Usage count = 3"

Created on 6 May 2014  ·  518 Comments  ·  Source: moby/moby

This happens when I log in to the container and can't quit with Ctrl-C.

My system is Ubuntu 12.04, kernel is 3.8.0-25-generic.

docker version:

root@wutq-docker:~# docker version
Client version: 0.10.0
Client API version: 1.10
Go version (client): go1.2.1
Git commit (client): dc9c28f
Server version: 0.10.0
Server API version: 1.10
Git commit (server): dc9c28f
Go version (server): go1.2.1
Last stable version: 0.10.0

I have used the script https://raw.githubusercontent.com/dotcloud/docker/master/contrib/check-config.sh to check, and everything looks all right.
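In case it helps anyone else, this is roughly how that script can be run (a sketch; it assumes curl is installed and that the script still autodetects the running kernel's config when called without arguments):

# Download the kernel-config checker from the docker repo and run it
# against the running kernel's configuration.
curl -fsSL https://raw.githubusercontent.com/dotcloud/docker/master/contrib/check-config.sh -o check-config.sh
chmod +x check-config.sh
./check-config.sh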

I watched the syslog and found these messages:

May  6 11:30:33 wutq-docker kernel: [62365.889369] unregister_netdevice: waiting for lo to become free. Usage count = 3
May  6 11:30:44 wutq-docker kernel: [62376.108277] unregister_netdevice: waiting for lo to become free. Usage count = 3
May  6 11:30:54 wutq-docker kernel: [62386.327156] unregister_netdevice: waiting for lo to become free. Usage count = 3
May  6 11:31:02 wutq-docker kernel: [62394.423920] INFO: task docker:1024 blocked for more than 120 seconds.
May  6 11:31:02 wutq-docker kernel: [62394.424175] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May  6 11:31:02 wutq-docker kernel: [62394.424505] docker          D 0000000000000001     0  1024      1 0x00000004
May  6 11:31:02 wutq-docker kernel: [62394.424511]  ffff880077793cb0 0000000000000082 ffffffffffffff04 ffffffff816df509
May  6 11:31:02 wutq-docker kernel: [62394.424517]  ffff880077793fd8 ffff880077793fd8 ffff880077793fd8 0000000000013f40
May  6 11:31:02 wutq-docker kernel: [62394.424521]  ffff88007c461740 ffff880076b1dd00 000080d081f06880 ffffffff81cbbda0
May  6 11:31:02 wutq-docker kernel: [62394.424526] Call Trace:                                                         
May  6 11:31:02 wutq-docker kernel: [62394.424668]  [<ffffffff816df509>] ? __slab_alloc+0x28a/0x2b2
May  6 11:31:02 wutq-docker kernel: [62394.424700]  [<ffffffff816f1849>] schedule+0x29/0x70
May  6 11:31:02 wutq-docker kernel: [62394.424705]  [<ffffffff816f1afe>] schedule_preempt_disabled+0xe/0x10
May  6 11:31:02 wutq-docker kernel: [62394.424710]  [<ffffffff816f0777>] __mutex_lock_slowpath+0xd7/0x150
May  6 11:31:02 wutq-docker kernel: [62394.424715]  [<ffffffff815dc809>] ? copy_net_ns+0x69/0x130
May  6 11:31:02 wutq-docker kernel: [62394.424719]  [<ffffffff815dc0b1>] ? net_alloc_generic+0x21/0x30
May  6 11:31:02 wutq-docker kernel: [62394.424724]  [<ffffffff816f038a>] mutex_lock+0x2a/0x50
May  6 11:31:02 wutq-docker kernel: [62394.424727]  [<ffffffff815dc82c>] copy_net_ns+0x8c/0x130
May  6 11:31:02 wutq-docker kernel: [62394.424733]  [<ffffffff81084851>] create_new_namespaces+0x101/0x1b0
May  6 11:31:02 wutq-docker kernel: [62394.424737]  [<ffffffff81084a33>] copy_namespaces+0xa3/0xe0
May  6 11:31:02 wutq-docker kernel: [62394.424742]  [<ffffffff81057a60>] ? dup_mm+0x140/0x240
May  6 11:31:02 wutq-docker kernel: [62394.424746]  [<ffffffff81058294>] copy_process.part.22+0x6f4/0xe60
May  6 11:31:02 wutq-docker kernel: [62394.424752]  [<ffffffff812da406>] ? security_file_alloc+0x16/0x20
May  6 11:31:02 wutq-docker kernel: [62394.424758]  [<ffffffff8119d118>] ? get_empty_filp+0x88/0x180
May  6 11:31:02 wutq-docker kernel: [62394.424762]  [<ffffffff81058a80>] copy_process+0x80/0x90
May  6 11:31:02 wutq-docker kernel: [62394.424766]  [<ffffffff81058b7c>] do_fork+0x9c/0x230
May  6 11:31:02 wutq-docker kernel: [62394.424769]  [<ffffffff816f277e>] ? _raw_spin_lock+0xe/0x20
May  6 11:31:02 wutq-docker kernel: [62394.424774]  [<ffffffff811b9185>] ? __fd_install+0x55/0x70
May  6 11:31:02 wutq-docker kernel: [62394.424777]  [<ffffffff81058d96>] sys_clone+0x16/0x20
May  6 11:31:02 wutq-docker kernel: [62394.424782]  [<ffffffff816fb939>] stub_clone+0x69/0x90
May  6 11:31:02 wutq-docker kernel: [62394.424786]  [<ffffffff816fb5dd>] ? system_call_fastpath+0x1a/0x1f
May  6 11:31:04 wutq-docker kernel: [62396.466223] unregister_netdevice: waiting for lo to become free. Usage count = 3
May  6 11:31:14 wutq-docker kernel: [62406.689132] unregister_netdevice: waiting for lo to become free. Usage count = 3
May  6 11:31:25 wutq-docker kernel: [62416.908036] unregister_netdevice: waiting for lo to become free. Usage count = 3
May  6 11:31:35 wutq-docker kernel: [62427.126927] unregister_netdevice: waiting for lo to become free. Usage count = 3
May  6 11:31:45 wutq-docker kernel: [62437.345860] unregister_netdevice: waiting for lo to become free. Usage count = 3

After this happened, I opened another terminal and killed the process, and then restarted Docker, but it hung.

I rebooted the host, and it still displayed those messages for some minutes during shutdown:
[screenshot of the console during shutdown, showing the same unregister_netdevice messages]

Labels: area/kernel, area/networking

Most helpful comment

(repeating this https://github.com/moby/moby/issues/5618#issuecomment-351942943 here again, because GitHub is hiding old comments)

If you are arriving here

The issue being discussed here is a kernel bug and has not yet been fully fixed. Some patches went in the kernel that fix _some_ occurrences of this issue, but others are not yet resolved.

There are a number of options that may help for _some_ situations, but not for all (again; it's most likely a combination of issues that trigger the same error)

The "unregister_netdevice: waiting for lo to become free" error itself is not the bug

It's the kernel crash _after_ that that's the bug (see below)

Do not leave "I have this too" comments

"I have this too" does not help resolving the bug. only leave a comment if you have information that may help resolve the issue (in which case; providing a patch to the kernel upstream may be the best step).

If you want to let us know you have this issue too, use the "thumbs up" button on the top description:
[screenshot of the "thumbs up" reaction on the issue description]

If you want to stay informed on updates use the _subscribe button_.

[screenshot of the subscribe button]

Every comment here sends an e-mail / notification to over 3000 people. I don't want to lock the conversation on this issue, because it's not resolved yet, but I may be forced to if you ignore this.

I will be removing comments that don't add useful information in order to (slightly) shorten the thread

If you want to help resolving this issue

  • Read the whole thread, including those comments that are hidden; it's long, and GitHub hides comments (so you'll have to click to make those visible again). There's a lot of information present in this thread already that could possibly help you.

[screenshot of GitHub's hidden-comments "load more" control]

To be clear, the message itself is benign; it's the kernel crash after the messages reported by the OP which is not.

The comment in the code where this message comes from explains what's happening. Basically, every user (such as the IP stack) of a network device (such as the end of a veth pair inside a container) increments a reference count in the network device structure when it is using the device. When the device is removed (e.g. when the container is removed) each user is notified so that it can do some cleanup (e.g. closing open sockets) before decrementing the reference count. Because this cleanup can take some time, especially under heavy load (lots of interfaces, a lot of connections, etc.), the kernel may print the message here once in a while.

If a user of a network device never decrements the reference count, some other part of the kernel will determine that the task waiting for the cleanup is stuck and it will crash. It is only this crash which indicates a kernel bug (some user, via some code path, did not decrement the reference count). There have been several such bugs and they have been fixed in modern kernels (and possibly backported to older ones). I have written quite a few stress tests (and continue writing them) to trigger such crashes, but have not been able to reproduce them on modern kernels (I do, however, see the above message).
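As a rough way to tell the benign message apart from the actual hang, you can watch the kernel log for both patterns (a sketch; it assumes a util-linux dmesg that supports -w/-T):

# Follow the kernel log. The "waiting for ... to become free" lines on
# their own are harmless; a subsequent hung-task report ("blocked for
# more than 120 seconds") is the crash this issue is actually about.
dmesg -wT | grep -E 'unregister_netdevice: waiting for|blocked for more than'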

_Please only report on this issue if your kernel actually crashes_, and then we would be very interested in the following (a snippet for collecting these details follows the list):

  • kernel version (output of uname -r)
  • Linux distribution/version
  • Are you on the latest kernel version of your Linux vendor?
  • Network setup (bridge, overlay, IPv4, IPv6, etc)
  • Description of the workload (what type of containers, what type of network load, etc)
  • And ideally a simple reproduction
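A small sketch for collecting those details in one go (standard commands, adjust for your distribution):

# Gather the requested information into a single report file.
{
  echo '== kernel ==';       uname -r
  echo '== distribution =='; cat /etc/os-release
  echo '== docker ==';       docker version; docker info
  echo '== network ==';      ip -o link; ip -o addr
} > unregister-netdevice-report.txt 2>&1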

Thanks!

All 518 comments

I'm seeing a very similar issue for eth0. Ubuntu 12.04 also.

I have to power cycle the machine. From /var/log/kern.log:

May 22 19:26:08 box kernel: [596765.670275] device veth5070 entered promiscuous mode
May 22 19:26:08 box kernel: [596765.680630] IPv6: ADDRCONF(NETDEV_UP): veth5070: link is not ready
May 22 19:26:08 box kernel: [596765.700561] IPv6: ADDRCONF(NETDEV_CHANGE): veth5070: link becomes ready
May 22 19:26:08 box kernel: [596765.700628] docker0: port 7(veth5070) entered forwarding state
May 22 19:26:08 box kernel: [596765.700638] docker0: port 7(veth5070) entered forwarding state
May 22 19:26:19 box kernel: [596777.386084] [FW DBLOCK] IN=docker0 OUT= PHYSIN=veth5070 MAC=56:84:7a:fe:97:99:9e:df:a7:3f:23:42:08:00 SRC=172.17.0.8 DST=172.17.42.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=170 DF PROTO=TCP SPT=51615 DPT=13162 WINDOW=14600 RES=0x00 SYN URGP=0
May 22 19:26:21 box kernel: [596779.371993] [FW DBLOCK] IN=docker0 OUT= PHYSIN=veth5070 MAC=56:84:7a:fe:97:99:9e:df:a7:3f:23:42:08:00 SRC=172.17.0.8 DST=172.17.42.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=549 DF PROTO=TCP SPT=46878 DPT=12518 WINDOW=14600 RES=0x00 SYN URGP=0
May 22 19:26:23 box kernel: [596780.704031] docker0: port 7(veth5070) entered forwarding state
May 22 19:27:13 box kernel: [596831.359999] docker0: port 7(veth5070) entered disabled state
May 22 19:27:13 box kernel: [596831.361329] device veth5070 left promiscuous mode
May 22 19:27:13 box kernel: [596831.361333] docker0: port 7(veth5070) entered disabled state
May 22 19:27:24 box kernel: [596841.516039] unregister_netdevice: waiting for eth0 to become free. Usage count = 1
May 22 19:27:34 box kernel: [596851.756060] unregister_netdevice: waiting for eth0 to become free. Usage count = 1
May 22 19:27:44 box kernel: [596861.772101] unregister_netdevice: waiting for eth0 to become free. Usage count = 1

Hey, this just started happening for me as well.

Docker version:

Client version: 0.11.1
Client API version: 1.11
Go version (client): go1.2.1
Git commit (client): fb99f99
Server version: 0.11.1
Server API version: 1.11
Git commit (server): fb99f99
Go version (server): go1.2.1
Last stable version: 0.11.1

Kernel log: http://pastebin.com/TubCy1tG

System details:
Running Ubuntu 14.04 LTS with patched kernel (3.14.3-rt4). Yet to see it happen with the default linux-3.13.0-27-generic kernel. What's funny, though, is that when this happens, all my terminal windows freeze, letting me type a few characters at most before that. The same fate befalls any new ones I open, too - and I end up needing to power cycle my poor laptop just like the good doctor above. For the record, I'm running fish shell in urxvt or xterm in xmonad. Haven't checked if it affects plain bash.

This might be relevant:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1065434#yui_3_10_3_1_1401948176063_2050

Copying a fairly large amount of data over the network inside a container
and then exiting the container can trigger a missing decrement in the per
cpu reference count on a network device.

Sure enough, one of the times this happened for me was right after apt-getting a package with a ton of dependencies.

Upgrading from Ubuntu 12.04.3 to 14.04 fixed this for me without any other changes.

I experience this on RHEL7, 3.10.0-123.4.2.el7.x86_64

I've noticed the same thing happening with my VirtualBox virtual network interfaces when I'm running 3.14-rt4. It's supposed to be fixed in vanilla 3.13 or something.

@egasimus Same here - I pulled in hundreds of MB of data before killing the container, then got this error.

I upgraded to Debian kernel 3.14 and the problem appears to have gone away. Looks like the problem existed in some kernels < 3.5, was fixed in 3.5, regressed in 3.6, and was patched somewhere between 3.12 and 3.14. https://bugzilla.redhat.com/show_bug.cgi?id=880394

@spiffytech Do you have any idea where I can report this regarding the realtime kernel flavour? I think they're only releasing a RT patch for every other version, and would really hate to see 3.16-rt come out with this still broken. :/

EDIT: Filed it at kernel.org.

I'm getting this on Ubuntu 14.10 running a 3.18.1. Kernel log shows

Dec 21 22:49:31 inotmac kernel: [15225.866600] unregister_netdevice: waiting for lo to become free. Usage count = 2
Dec 21 22:49:40 inotmac kernel: [15235.179263] INFO: task docker:19599 blocked for more than 120 seconds.
Dec 21 22:49:40 inotmac kernel: [15235.179268]       Tainted: G           OE  3.18.1-031801-generic #201412170637
Dec 21 22:49:40 inotmac kernel: [15235.179269] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 21 22:49:40 inotmac kernel: [15235.179271] docker          D 0000000000000001     0 19599      1 0x00000000
Dec 21 22:49:40 inotmac kernel: [15235.179275]  ffff8802082abcc0 0000000000000086 ffff880235c3b700 00000000ffffffff
Dec 21 22:49:40 inotmac kernel: [15235.179277]  ffff8802082abfd8 0000000000013640 ffff8800288f2300 0000000000013640
Dec 21 22:49:40 inotmac kernel: [15235.179280]  ffff880232cf0000 ffff8801a467c600 ffffffff81f9d4b8 ffffffff81cd9c60
Dec 21 22:49:40 inotmac kernel: [15235.179282] Call Trace:
Dec 21 22:49:40 inotmac kernel: [15235.179289]  [<ffffffff817af549>] schedule+0x29/0x70
Dec 21 22:49:40 inotmac kernel: [15235.179292]  [<ffffffff817af88e>] schedule_preempt_disabled+0xe/0x10
Dec 21 22:49:40 inotmac kernel: [15235.179296]  [<ffffffff817b1545>] __mutex_lock_slowpath+0x95/0x100
Dec 21 22:49:40 inotmac kernel: [15235.179299]  [<ffffffff8168d5c9>] ? copy_net_ns+0x69/0x150
Dec 21 22:49:40 inotmac kernel: [15235.179302]  [<ffffffff817b15d3>] mutex_lock+0x23/0x37
Dec 21 22:49:40 inotmac kernel: [15235.179305]  [<ffffffff8168d5f8>] copy_net_ns+0x98/0x150
Dec 21 22:49:40 inotmac kernel: [15235.179308]  [<ffffffff810941f1>] create_new_namespaces+0x101/0x1b0
Dec 21 22:49:40 inotmac kernel: [15235.179311]  [<ffffffff8109432b>] copy_namespaces+0x8b/0xa0
Dec 21 22:49:40 inotmac kernel: [15235.179315]  [<ffffffff81073458>] copy_process.part.28+0x828/0xed0
Dec 21 22:49:40 inotmac kernel: [15235.179318]  [<ffffffff811f157f>] ? get_empty_filp+0xcf/0x1c0
Dec 21 22:49:40 inotmac kernel: [15235.179320]  [<ffffffff81073b80>] copy_process+0x80/0x90
Dec 21 22:49:40 inotmac kernel: [15235.179323]  [<ffffffff81073ca2>] do_fork+0x62/0x280
Dec 21 22:49:40 inotmac kernel: [15235.179326]  [<ffffffff8120cfc0>] ? get_unused_fd_flags+0x30/0x40
Dec 21 22:49:40 inotmac kernel: [15235.179329]  [<ffffffff8120d028>] ? __fd_install+0x58/0x70
Dec 21 22:49:40 inotmac kernel: [15235.179331]  [<ffffffff81073f46>] SyS_clone+0x16/0x20
Dec 21 22:49:40 inotmac kernel: [15235.179334]  [<ffffffff817b3ab9>] stub_clone+0x69/0x90
Dec 21 22:49:40 inotmac kernel: [15235.179336]  [<ffffffff817b376d>] ? system_call_fastpath+0x16/0x1b
Dec 21 22:49:41 inotmac kernel: [15235.950976] unregister_netdevice: waiting for lo to become free. Usage count = 2
Dec 21 22:49:51 inotmac kernel: [15246.059346] unregister_netdevice: waiting for lo to become free. Usage count = 2

I'll send docker version/info once the system isn't frozen anymore :)

We're seeing this issue as well. Ubuntu 14.04, 3.13.0-37-generic

On Ubuntu 14.04 server, my team has found that downgrading from 3.13.0-40-generic to 3.13.0-32-generic "resolves" the issue. Given @sbward's observation, that would put the regression after 3.13.0-32-generic and before (or including) 3.13.0-37-generic.

I'll add that, in our case, we sometimes see a _negative_ usage count.

FWIW we hit this bug running lxc on the trusty kernel (3.13.0-40-generic #69-Ubuntu). The message appears in dmesg, followed by this stack trace:

[27211131.602869] INFO: task lxc-start:26342 blocked for more than 120 seconds.
[27211131.602874]       Not tainted 3.13.0-40-generic #69-Ubuntu
[27211131.602877] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27211131.602881] lxc-start       D 0000000000000001     0 26342      1 0x00000080
[27211131.602883]  ffff88000d001d40 0000000000000282 ffff88001aa21800 ffff88000d001fd8
[27211131.602886]  0000000000014480 0000000000014480 ffff88001aa21800 ffffffff81cdb760
[27211131.602888]  ffffffff81cdb764 ffff88001aa21800 00000000ffffffff ffffffff81cdb768
[27211131.602891] Call Trace:
[27211131.602894]  [<ffffffff81723b69>] schedule_preempt_disabled+0x29/0x70
[27211131.602897]  [<ffffffff817259d5>] __mutex_lock_slowpath+0x135/0x1b0
[27211131.602900]  [<ffffffff811a2679>] ? __kmalloc+0x1e9/0x230
[27211131.602903]  [<ffffffff81725a6f>] mutex_lock+0x1f/0x2f
[27211131.602905]  [<ffffffff8161c2c1>] copy_net_ns+0x71/0x130
[27211131.602908]  [<ffffffff8108f889>] create_new_namespaces+0xf9/0x180
[27211131.602910]  [<ffffffff8108f983>] copy_namespaces+0x73/0xa0
[27211131.602912]  [<ffffffff81065b16>] copy_process.part.26+0x9a6/0x16b0
[27211131.602915]  [<ffffffff810669f5>] do_fork+0xd5/0x340
[27211131.602917]  [<ffffffff810c8e8d>] ? call_rcu_sched+0x1d/0x20
[27211131.602919]  [<ffffffff81066ce6>] SyS_clone+0x16/0x20
[27211131.602921]  [<ffffffff81730089>] stub_clone+0x69/0x90
[27211131.602923]  [<ffffffff8172fd2d>] ? system_call_fastpath+0x1a/0x1f

Ran into this on Ubuntu 14.04 and Debian jessie w/ kernel 3.16.x.

Docker command:

docker run -t -i -v /data/sitespeed.io:/sitespeed.io/results company/dockerfiles:sitespeed.io-latest --name "Superbrowse"

This seems like a pretty bad issue...

@jbalonso even with 3.13.0-32-generic I get the error after only a few successful runs :sob:

@MrMMorris could you share a reproducer script using public available images?

Everyone who's seeing this error on their system is running a package of the Linux kernel on their distribution that's far too old and lacks the fixes for this particular problem.

If you run into this problem, make sure you run apt-get update && apt-get dist-upgrade -y and reboot your system. If you're on Digital Ocean, you also need to select the kernel version which was just installed during the update because they don't use the latest kernel automatically (see https://digitalocean.uservoice.com/forums/136585-digitalocean/suggestions/2814988-give-option-to-use-the-droplet-s-own-bootloader).

CentOS/RHEL/Fedora/Scientific Linux users need to keep their systems updated using yum update and reboot after installing the updates.

When reporting this problem, please make sure your system is fully patched and up to date with the latest stable updates (no manually installed experimental/testing/alpha/beta/rc packages) provided by your distribution's vendor.

@unclejack

I ran apt-get update && apt-get dist-upgrade -y

ubuntu 14.04 3.13.0-46-generic

Still get the error after only one docker run

I can create an AMI for reproducing if needed

@MrMMorris Thank you for confirming it's still a problem with the latest kernel package on Ubuntu 14.04.

Anything else I can do to help, let me know! :smile:

@MrMMorris if you can provide a reproducer there is a bug opened for Ubuntu and it will be much appreciated: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1403152

@rsampaio if I have time today, I will definitely get that for you!

This problem also appears on 3.16(.7) on both Debian 7 and Debian 8: https://github.com/docker/docker/issues/9605#issuecomment-85025729. Rebooting the server is the only way to fix this for now.

Seeing this issue on RHEL 6.6 with kernel 2.6.32-504.8.1.el6.x86_64 when starting some docker containers (not all containers)
_kernel:unregister_netdevice: waiting for lo to become free. Usage count = -1_

Again, rebooting the server seems to be the only solution at this time

Also seeing this on CoreOS (647.0.0) with kernel 3.19.3.

Rebooting is also the only solution I have found.

Tested Debian jessie with sid's kernel (4.0.2) - the problem remains.

Anyone seeing this issue running non-ubuntu containers?

Yes. Debian ones.

This is a kernel issue, not an image related issue. Switching an image for another won't improve or make this problem worse.

Experiencing issue on Debian Jessie on a BeagleBone Black running 4.1.2-bone12 kernel

Experiencing it after switching from 4.1.2 to 4.2-rc2 (using a git build of 1.8.0).
Deleting /var/lib/docker/* doesn't solve the problem.
Switching back to 4.1.2 solves the problem.

Also, VirtualBox has the same issue and there's a patch for v5.0.0 (backported to v4) which supposedly does something in the kernel driver part... worth looking at to understand the problem.

This is the fix in the VirtualBox: https://www.virtualbox.org/attachment/ticket/12264/diff_unregister_netdev
They don't actually modify the kernel, just their kernel module.

Also having this issue with 4.2-rc2:

unregister_netdevice: waiting for vethf1738d3 to become free. Usage count = 1

Just compiled 4.2-RC3, seems to work again

@nazar-pc Thanks for the info. Just hit it with 4.1.3, was pretty upset
@techniq same here, pretty bad kernel bug. I wonder if we should report it to be backported to 4.1 tree.

Linux docker13 3.19.0-22-generic #22-Ubuntu SMP Tue Jun 16 17:15:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Kernel from Ubuntu 15.04, same issue

I saw it with 4.2-rc3 as well. It's not just one bug causing device leakage :) I can reproduce it on any kernel >= 4.1 under high load.

I just had this problem too. Ubuntu 3.13.0-57-generic, provisioned via tutum. Unfortunately it fills up the kern.log and syslog and crashes the machine. It happens on the database machine (dockerized postgres), so it brings down the whole system...

Joining the chorus of "me too"'s, I am seeing this problem on a cloudstack VM running RancherOS (a minimal OS) 0.3.3 while pulling docker images from a local private docker repo. It's happening every ten seconds, not sure if that means anything or not.

Also having this issue with 4.2-rc7

Any news on this, which kernel should we use ? It keeps happening even with a fully up-to-date kernel (3.19.0-26 on Ubuntu 14.04)

We got this problem too. It happened after we configured userland-proxy=false. We're using some monitoring scripts that spawn a new docker container to execute nagios plugin commands every minute. What I'm seeing in the process tree is that it gets stuck on the docker rm command, and I see a lot of errors in the kern.log file:

Sep 24 03:53:13 prod-service-05 kernel: [ 1920.544106] unregister_netdevice: waiting for lo to become free. Usage count = 2
Sep 24 03:53:13 prod-service-05 kernel: [ 1921.008076] unregister_netdevice: waiting for vethb6bf4db to become free. Usage count = 1
Sep 24 03:53:23 prod-service-05 kernel: [ 1930.676078] unregister_netdevice: waiting for lo to become free. Usage count = 2
Sep 24 03:53:23 prod-service-05 kernel: [ 1931.140074] unregister_netdevice: waiting for vethb6bf4db to become free. Usage count = 1
Sep 24 03:53:33 prod-service-05 kernel: [ 1940.820078] unregister_netdevice: waiting for lo to become free. Usage count = 2

This is our system information

ubuntu@prod-service-02:~$ docker version
Client:
 Version:      1.8.2
 API version:  1.20
 Go version:   go1.4.2
 Git commit:   0a8c2e3
 Built:        Thu Sep 10 19:19:00 UTC 2015
 OS/Arch:      linux/amd64

Server:
 Version:      1.8.2
 API version:  1.20
 Go version:   go1.4.2
 Git commit:   0a8c2e3
 Built:        Thu Sep 10 19:19:00 UTC 2015
 OS/Arch:      linux/amd64
ubuntu@prod-service-02:~$ docker info
Containers: 2
Images: 52
Storage Driver: overlay
 Backing Filesystem: extfs
Execution Driver: native-0.2
Logging Driver: gelf
Kernel Version: 4.0.9-040009-generic
Operating System: Ubuntu 14.04.3 LTS
CPUs: 4
Total Memory: 7.304 GiB
Name: prod-service-02
ID: NOIK:LVBV:HFB4:GZ2Y:Q74F:Q4WW:ZE22:MDE7:7EBW:XS42:ZK4G:XNTB
WARNING: No swap limit support
Labels:
 provider=generic

Update: Although https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1403152 says it was already fixed on 2015-08-17, I tried kernel v3.19.8-ckt6-vivid built on 02-Sep-2015 and even v4.2.1-unstable built on 21-Sep-2015 and still have the problem.

I've just hit the problem again using 3.19.0-28-generic, so the latest Ubuntu kernel is not safe

Yup, seems like --userland-proxy=false isn't best option now with older kernels :(

No. I tried --userland-proxy=false with the 3.19, 4.0, and 4.2 kernel versions and the problem still happens.

I am using the userland proxy without iptables (--iptables=false) and seeing this at least once per day. Sadly the only workaround was a watchdog that hard-resets the server using the SysRq technique.

My systems run some containers that are heavy stdout/stderr writers; as others reported, that may trigger the bug.

$ docker info
Containers: 15
Images: 148
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 178
Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.19.0-26-generic
Operating System: Ubuntu 14.04.3 LTS
CPUs: 12
Total Memory: 62.89 GiB
Name: **
ID: 2ALJ:YTUH:QCNX:FPEO:YBG4:ZTL4:2EYK:AV7D:FN7C:IVNU:UWBL:YYZ5

$ docker version
Client version: 1.7.0
Client API version: 1.19
Go version (client): go1.4.2
Git commit (client): 0baf609
OS/Arch (client): linux/amd64
Server version: 1.7.0
Server API version: 1.19
Go version (server): go1.4.2
Git commit (server): 0baf609
OS/Arch (server): linux/amd64

Unfortunately, I'm in the same case, today a production server failed 3 times on this error, and the only way to handle that is to use some magic SysRq commands..

bump

I'm still seeing this on the latest debian jessie using kernel 4.2.0

Same problem here. All of a sudden, three of my aws servers went down and the logs were yelling "unregister_netdevice: waiting for lo to become free. Usage count = 1"

Ubuntu: 14.04
Kernel version: 3.13.0-63-generic
Docker: 1.7.1

Syslog
[screenshot of the syslog output]

Is there a safe-to-use kernel version ?

Issue happens also with kernel 4.2 of Ubuntu 15.10

Happened on CoreOS:

Images: 1174
Storage Driver: overlay
Backing Filesystem: extfs
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.1.7-coreos
Operating System: CoreOS 766.4.0

@killme2008 The kernel bug I mentioned last time.

You should probably give it a try with this patch applied on top of your kernel: http://www.spinics.net/lists/netdev/msg351337.html
("packet: race condition in packet_bind")

or wait for the backport in -stable tree; it will come sooner or later.

:+1: Great news!

Hey everyone, good news !

Since my last comment here (at the time of writing, 17 days ago) I haven't gotten these errors again. My servers (about 30 of them) were running Ubuntu 14.04 with some outdated packages.

After a full system upgrade including docker-engine (from 1.7.1 to 1.8.3) + a kernel upgrade to the latest version available in Ubuntu's repo, my servers are running without any occurrences.

:8ball:

Happened on 3 of our AWS instances today also:

Client:
 Version:      1.8.2
 API version:  1.20
 Go version:   go1.4.2
 Git commit:   0a8c2e3
 Built:        Thu Sep 10 19:19:00 UTC 2015
 OS/Arch:      linux/amd64

Server:
 Version:      1.8.2
 API version:  1.20
 Go version:   go1.4.2
 Git commit:   0a8c2e3
 Built:        Thu Sep 10 19:19:00 UTC 2015
 OS/Arch:      linux/amd64
Containers: 45
Images: 423
Storage Driver: devicemapper
 Pool Name: docker-202:1-527948-pool
 Pool Blocksize: 65.54 kB
 Backing Filesystem: extfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 22.79 GB
 Data Space Total: 107.4 GB
 Data Space Available: 84.58 GB
 Metadata Space Used: 35.58 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.112 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.77 (2012-10-15)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.13.0-49-generic
Operating System: Ubuntu 14.04.2 LTS
CPUs: 8
Total Memory: 60 GiB
Name: ip-10-0-1-36
ID: HEZG:TBTM:V4LN:IU7U:P55N:HNVH:XXOP:RMUX:JNWH:DSJP:3OA4:MGO5
WARNING: No swap limit support

I'm having the same problem with Ubuntu 14.04, all packages up-to-date and latest linux-generic-lts-vivid kernel:

$ docker version
Client:
 Version:      1.9.0
 API version:  1.21
 Go version:   go1.4.2
 Git commit:   76d6bc9
 Built:        Tue Nov  3 17:43:42 UTC 2015
 OS/Arch:      linux/amd64

Server:
 Version:      1.9.0
 API version:  1.21
 Go version:   go1.4.2
 Git commit:   76d6bc9
 Built:        Tue Nov  3 17:43:42 UTC 2015
 OS/Arch:      linux/amd64
$ docker info
Containers: 14
Images: 123
Server Version: 1.9.0
Storage Driver: aufs
 Root Dir: /mnt/docker-images/aufs
 Backing Filesystem: extfs
 Dirs: 151
 Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.19.0-32-generic
Operating System: Ubuntu 14.04.3 LTS
CPUs: 8
Total Memory: 29.45 GiB
Name: ip-172-31-35-202
ID: 3B7E:5DJL:S4IB:KUCL:6UKN:64OF:WCLO:JKGK:4OI2:I2R6:63EY:WATN
WARNING: No swap limit support

I had it with latest linux-image-generic (3.13.0-67-generic) as well.

Having the same issues here on rancherOS.

Still happening on Fedora 22 (updated)....
I can get rid of the messages if I restart docker

systemctl restart docker
... the message appears again for about 3-4 times and then stops

The same error hit me on CoreOS:

version of coreos:

core@core-1-94 ~ $ cat /etc/os-release
NAME=CoreOS
ID=coreos
VERSION=766.5.0
VERSION_ID=766.5.0
BUILD_ID=
PRETTY_NAME="CoreOS 766.5.0"
ANSI_COLOR="1;32"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"

docker version:

core@core-1-94 ~ $ docker version
Client version: 1.7.1
Client API version: 1.19
Go version (client): go1.4.2
Git commit (client): df2f73d-dirty
OS/Arch (client): linux/amd64
Server version: 1.7.1
Server API version: 1.19
Go version (server): go1.4.2
Git commit (server): df2f73d-dirty
OS/Arch (server): linux/amd64
core@core-1-94 ~ $ uname -a
Linux core-1-94 4.1.7-coreos-r1 #2 SMP Thu Nov 5 02:10:23 UTC 2015 x86_64 Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz GenuineIntel GNU/Linux

system log:

Dec 07 16:26:54 core-1-94 kernel: unregister_netdevice: waiting for veth775ea53 to become free. Usage count = 1
Dec 07 16:26:54 core-1-94 kernel: unregister_netdevice: waiting for lo to become free. Usage count = 2
Dec 07 16:26:55 core-1-94 sdnotify-proxy[1203]: I1207 08:26:55.930559 00001 vxlan.go:340] Ignoring not a miss: 4e:5c:47:2f:9a:85, 10.244.97.10
Dec 07 16:26:59 core-1-94 dockerd[1269]: time="2015-12-07T16:26:59.448438648+08:00" level=info msg="GET /version"
Dec 07 16:27:01 core-1-94 sdnotify-proxy[1203]: I1207 08:27:01.050588 00001 vxlan.go:340] Ignoring not a miss: 5a:b1:f7:e9:7d:d0, 10.244.34.8
Dec 07 16:27:02 core-1-94 dockerd[1269]: time="2015-12-07T16:27:02.398020120+08:00" level=info msg="GET /version"
Dec 07 16:27:02 core-1-94 dockerd[1269]: time="2015-12-07T16:27:02.398316249+08:00" level=info msg="GET /version"
Dec 07 16:27:04 core-1-94 dockerd[1269]: time="2015-12-07T16:27:04.449317389+08:00" level=info msg="GET /version"
Dec 07 16:27:04 core-1-94 kernel: unregister_netdevice: waiting for veth775ea53 to become free. Usage count = 1
Dec 07 16:27:04 core-1-94 kernel: unregister_netdevice: waiting for lo to become free. Usage count = 2
Dec 07 16:27:06 core-1-94 sdnotify-proxy[1203]: I1207 08:27:06.106573 00001 vxlan.go:340] Ignoring not a miss: a6:38:ac:79:93:f5, 10.244.47.24
Dec 07 16:27:09 core-1-94 dockerd[1269]: time="2015-12-07T16:27:09.449944048+08:00" level=info msg="GET /version"
Dec 07 16:27:11 core-1-94 sdnotify-proxy[1203]: I1207 08:27:11.162578 00001 vxlan.go:340] Ignoring not a miss: 0e:f0:6f:f4:69:57, 10.244.71.24
Dec 07 16:27:12 core-1-94 dockerd[1269]: time="2015-12-07T16:27:12.502991197+08:00" level=info msg="GET /version"
Dec 07 16:27:12 core-1-94 dockerd[1269]: time="2015-12-07T16:27:12.503411160+08:00" level=info msg="GET /version"
Dec 07 16:27:14 core-1-94 dockerd[1269]: time="2015-12-07T16:27:14.450646841+08:00" level=info msg="GET /version"
Dec 07 16:27:14 core-1-94 kernel: unregister_netdevice: waiting for veth775ea53 to become free. Usage count = 1
Dec 07 16:27:14 core-1-94 kernel: unregister_netdevice: waiting for lo to become free. Usage count = 2
Dec 07 16:27:16 core-1-94 sdnotify-proxy[1203]: I1207 08:27:16.282556 00001 vxlan.go:340] Ignoring not a miss: a6:62:77:31:ef:68, 10.244.13.6
Dec 07 16:27:19 core-1-94 dockerd[1269]: time="2015-12-07T16:27:19.451486277+08:00" level=info msg="GET /version"
Dec 07 16:27:21 core-1-94 sdnotify-proxy[1203]: I1207 08:27:21.402559 00001 vxlan.go:340] Ignoring not a miss: 92:c4:66:52:cd:bb, 10.244.24.7
Dec 07 16:27:22 core-1-94 dockerd[1269]: time="2015-12-07T16:27:22.575446889+08:00" level=info msg="GET /version"
Dec 07 16:27:22 core-1-94 dockerd[1269]: time="2015-12-07T16:27:22.575838302+08:00" level=info msg="GET /version"
Dec 07 16:27:24 core-1-94 dockerd[1269]: time="2015-12-07T16:27:24.452320364+08:00" level=info msg="GET /version"
Dec 07 16:27:24 core-1-94 kernel: unregister_netdevice: waiting for veth775ea53 to become free. Usage count = 1
Dec 07 16:27:24 core-1-94 kernel: unregister_netdevice: waiting for lo to become free. Usage count = 2
Dec 07 16:27:26 core-1-94 sdnotify-proxy[1203]: I1207 08:27:26.394569 00001 vxlan.go:340] Ignoring not a miss: 6a:f7:bf:ec:03:50, 10.244.87.8
Dec 07 16:27:29 core-1-94 dockerd[1269]: time="2015-12-07T16:27:29.453171649+08:00" level=info msg="GET /version"
Dec 07 16:27:29 core-1-94 systemd[1]: Starting Generate /run/coreos/motd...
Dec 07 16:27:29 core-1-94 systemd[1]: Started Generate /run/coreos/motd.
Dec 07 16:27:32 core-1-94 dockerd[1269]: time="2015-12-07T16:27:32.671592437+08:00" level=info msg="GET /version"
Dec 07 16:27:32 core-1-94 dockerd[1269]: time="2015-12-07T16:27:32.671841436+08:00" level=info msg="GET /version"
Dec 07 16:27:33 core-1-94 sdnotify-proxy[1203]: I1207 08:27:33.562534 00001 vxlan.go:340] Ignoring not a miss: 22:b4:62:d6:25:b9, 10.244.68.8
Dec 07 16:27:34 core-1-94 dockerd[1269]: time="2015-12-07T16:27:34.453953162+08:00" level=info msg="GET /version"
Dec 07 16:27:34 core-1-94 kernel: unregister_netdevice: waiting for veth775ea53 to become free. Usage count = 1
Dec 07 16:27:35 core-1-94 kernel: unregister_netdevice: waiting for lo to become free. Usage count = 2

happy birthday, bloody issue =)
6 May 2014

same thing here. Just rebooting. Latest docker version. Ubuntu 14.04.

@samvignoli this has been identified as a kernel issue, so unfortunately not something that can be fixed in docker

@thaJeztah Have you got a link to the bug tracker for the kernel issue?
Or perhaps a pointer to which kernel's are affected?

Keen to get this resolved in our environment.

@Rucknar sorry, I don't (perhaps there's one in this discussion, I haven't read back all comments)

Linux atlas2 3.19.0-33-generic #38~14.04.1-Ubuntu SMP Fri Nov 6 18:17:28 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

@Rucknar if you scroll a bit to the top, you will see the link to the patch: http://www.spinics.net/lists/netdev/msg351337.html. It is now in Linux master; I guess it will go into Linux 4.4, and maybe someone has already backported it to previous versions, but I'm not sure.

Thanks all, will look at what's required in upgrading.

FWIW I backported the last patch mentioned here to the Ubuntu 3.19 kernel and I also tested it on the 4.2 kernel, both unsuccessfully. The problem is still present even on the 4.4-rc3 net-next branch at this point.

@rsampaio How did you test that? I cannot reliably trigger this fault using docker, actually, on any kernel. It just happens sometimes.

@fxposter we also can't reproduce the problem outside production, so I had to boot a few instances with the patched kernel in production; it happens so frequently that I can find out whether a kernel is affected within 24h of production load.

Sometimes we can fix it with a very unusual workaround: we move the container directories away from /var/lib/docker/aufs/mnt

With that... MAYBE we can 'service docker restart' and move the directories back.

Otherwise... only rebooting.

@rsampaio are you talking about Heroku production now? How do you avoid this problem, since all your business is built around containers, etc.?

@rsampaio do you use --userland-proxy=false or just a high amount of created containers? I can reproduce it fairly easily with --userland-proxy=false, and with some load even without it :)

@LK4D4 I believe it is just a high amount of created/destroyed containers, especially containers doing a lot of outbound traffic. We also use LXC instead of docker, but the bug is exactly the same as the one described here. I can try to reproduce using your method if it is easy to describe and/or does not involve production load; the idea is to get a crash dump and _maybe_ find more hints about what exactly triggers this bug.

@rsampaio I can reproduce with prolonged usage of https://github.com/crosbymichael/docker-stress

Has there been any updates / proposals for this getting fixed?

@joshrendek it's a kernel bug. Looks like even newly released kernel 4.4 does not fix it, so there is at least one more race condition somewhere :)

kernel bug

=)

@samvignoli could you keep your comments constructive? Feel free to open a PR if you have ways to fix this issue.

Was this bug already reported upstream (kernel mailinglist)?

Sure has been. The first comment references this bug as well: https://bugzilla.kernel.org/show_bug.cgi?id=81211

Open since 2014. No comments from anyone who works on it, though, other than to say it's most likely an application using it incorrectly.

Thanks for the link, Justin! I'll troll Linus =)

kind regards. =* :heart:

@samvignoli please don't do this, it does not help anyone.
Can somebody reproduce this in a small VM image?
Maybe I can get my hands dirty with gdb and lots of kprintf.

bug still open.

OS: CentOS 7.2
kernel: 4.4.2 elrepo kernel-ml
docker: 1.10.2
fs: overlayfs with xfs

log:

Message from syslogd@host118 at Feb 29 14:52:47 ...
 kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1
[root@host118 ~]# uname -a
Linux host118 4.4.2-1.el7.elrepo.x86_64 #1 SMP Thu Feb 18 10:20:19 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
[root@host118 ~]# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
[root@host118 ~]# lsb_release -a
LSB Version:    :core-4.1-amd64:core-4.1-noarch
Distributor ID: CentOS
Description:    CentOS Linux release 7.2.1511 (Core)
Release:    7.2.1511
Codename:   Core
[root@host118 ~]# docker info
Containers: 5
 Running: 2
 Paused: 0
 Stopped: 3
Images: 154
Server Version: 1.10.2
Storage Driver: overlay
 Backing Filesystem: xfs
Execution Driver: native-0.2
Logging Driver: json-file
Plugins:
 Volume: local
 Network: bridge null host
Kernel Version: 4.4.2-1.el7.elrepo.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.858 GiB
Name: host118
ID: 2NW7:Y54E:AHTO:AVDR:S2XZ:BGMC:ZO4I:BCAG:6RKW:KITO:KRM2:DQIZ
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

This log shows up when running the sameersbn/docker-gitlab docker image:

wget https://raw.githubusercontent.com/sameersbn/docker-gitlab/master/docker-compose.yml
docker-compose up

I may just be getting lucky - but after applying these sysctl settings, the occurrence of this has gone way down (a sketch for persisting them follows the list).

net.ipv4.netfilter.ip_conntrack_tcp_timeout_established = 600
net.ipv4.tcp_tw_reuse = 1
net.netfilter.nf_conntrack_generic_timeout = 120
net.netfilter.nf_conntrack_max = 1555600000
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_established = 300
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 60
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 120
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300
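If anyone wants to try the same, a sketch for making (a subset of) these settings persistent; the file name is arbitrary, and the net.netfilter.* keys only exist once the nf_conntrack module is loaded:

# Write the settings to a sysctl drop-in and apply them immediately.
sudo tee /etc/sysctl.d/90-conntrack-tuning.conf > /dev/null <<'EOF'
net.ipv4.tcp_tw_reuse = 1
net.netfilter.nf_conntrack_tcp_timeout_established = 300
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60
EOF
sudo sysctl --system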

@joshrendek what's the motivation behind these settings?

@kmike this was to fix some other conntrack issues (ip tables getting full) that we were experiencing - it seems to have done something with regards to my original issue though as a side effect

Could you show the before/after so we can see what actually changed? Are you willing to binary-search these settings and see if there's a smaller set?

I'm using CoreOS Stable (899.13.0) in a Compute Engine VM. This error occurs every time I start the server with the following setting at 0 (the default). I have tested several times back and forth, and with IPv6 disabled I can start all the containers on the node without any error:

$ cat /etc/sysctl.d/10-disable-ipv6.conf 
net.ipv6.conf.all.disable_ipv6 = 1

I use the gcloud container to download from GCR, so maybe the problem is IPv6 + downloading MBs of images + closing the containers quickly.

Docker version for reference:

Client:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.3
 Git commit:   9894698
 Built:        
 OS/Arch:      linux/amd64

Server:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.3
 Git commit:   9894698
 Built:        
 OS/Arch:      linux/amd64

I have also tested the previous sysctl flags from this issue, but some already had that value and the rest didn't seem to change anything related to this error:

net.ipv4.netfilter.ip_conntrack_tcp_timeout_established = 600
  -----> not found in CoreOS
net.ipv4.tcp_tw_reuse = 1
  -----> default: 0
net.netfilter.nf_conntrack_generic_timeout = 120
  -----> default: 600
net.netfilter.nf_conntrack_max = 1555600000
  -----> default: 65536
net.netfilter.nf_conntrack_tcp_timeout_close = 10
  -> already: 10
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60
  -> already: 60
net.netfilter.nf_conntrack_tcp_timeout_established = 300
  -----> default: 432000
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 120
  -> already: 120
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
  -> already: 30
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
  -> already: 300
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 60
  -> already: 60
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 120
  -> already: 120
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
  -> already: 120
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300
  -> already: 300

I'm still seeing the issue when I set net.ipv6.conf.all.disable_ipv6=1.

The docker stress tool can produce the issue very easily.
https://github.com/crosbymichael/docker-stress

This is the binary I built for the tool above.
https://storage.googleapis.com/donny/main
https://storage.googleapis.com/donny/stress.json

Once we see the log "unregister_netdevice: waiting for veth6c3b8b0 to become free. Usage count", Docker hangs. I think this is a kernel issue triggered by Docker. This will happen only when the Docker userland proxy is off (--userland-proxy=false).
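For reference, --userland-proxy is a daemon-level option; a sketch of the two usual ways to set it (the daemon.json key only exists on newer Docker releases, so treat that part as an assumption for the versions discussed here):

# Option 1: pass the flag when starting the daemon
# (on the 1.8-1.11 releases discussed here the command is "docker daemon",
#  on 1.12+ it is "dockerd")
dockerd --userland-proxy=false

# Option 2: on 1.12+, put  { "userland-proxy": false }  into
# /etc/docker/daemon.json, then restart the daemon:
sudo systemctl restart docker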

I've had this happen with and without userland proxy enabled, so I wouldn't say only when it is off.

It could be that it makes the situation worse; I know we once tried to make --userland-proxy=false the default, but reverted that because there were side-effects https://github.com/docker/docker/issues/14856

I've seen the error once since yesterday too; clearly disabling IPv6 is not a fix, but without the flag I can't even start all the containers on the server without trashing Docker.

Running into this on CoreOS 1010.1.0 with kubernetes 1.2.2 and docker 1.10.3

Kubernetes added a flag to kubelet (on mobile, so can't look it up) for hairpin mode. Change it to "promiscuous bridge" or whatever the valid value is. We have not seen this error since making that change.
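The flag being referred to is presumably kubelet's --hairpin-mode; a sketch with the name and value assumed from the Kubernetes docs of that era (check your kubelet's --help before relying on it):

# Put the bridge into promiscuous mode instead of enabling hairpin
# mode on every veth (the other common value is "hairpin-veth").
# Added to the existing kubelet flags:
kubelet --hairpin-mode=promiscuous-bridge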

@bprashanh

Please confirm or refute?

Getting this on AWS running Linux 4.4.5-15.26.amzn1.x86_64 with Docker version 1.9.1, build a34a1d5/1.9.1.

Ruby 2.3.0 on an Alpine image is running inside the container, causing this:

kernel:[58551.548114] unregister_netdevice: waiting for lo to become free. Usage count = 1

Any fix for this?

saw this for the first time on Linux 3.19.0-18-generic #18~14.04.1-Ubuntu SMP Wed May 20 09:38:33 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

A couple reboots fixed it.

@MrMMorris Fixed as in you're certain the problem has gone away for good, or in that you're not experiencing it again just yet? Could be a race condition...

It's pretty clear that this is a race in the kernel, losing a refcount somewhere. This is a REALLY hard to track bug, but as far as we can tell it still exists.


Yup. I tried CoreOS 1032.0.0 with kernel 4.5, and the issue still exists.

I encountered this again on CoreOS 1010.1.0 with kernel 4.5.0 yesterday, it had been after several containers were started and killed in rapid succession.

I've got this error.

Docker Version: 1.9.1
Kernel Version: 4.4.8-20.46.amzn1.x86_64
Operating System: Amazon Linux AMI 2016.03

@sirlatrom not fixed. Seeing this again 😭 Required multiple reboots to resolve.

Currently running 3.19.0-18-generic. Will try upgrading to latest

same here! :cry: :cry: :cry: :cry: :cry: :cry: :cry: :cry:

@samvignoli your comments are not constructive. Please stop posting.

sorry, forgot the thumbs up function.

Reproduced in Fedora Server 23 - 4.2.5-300.fc23.x86_64. Cannot restart the Docker service - only reboot the node.

Same issue on Fedora 24, kernel 4.5.2-302.fc24.x86_64. It didn't cause any hangs, but it spams the log file.

@hapylestat Can you try systemctl restart docker? This caused it all to hang for me.

Thanks

This is happening to my (CoreOS, EC2) machines quite frequently. In case it's at all helpful, here are all the logs related to the stuck veth device in one instance of this bug.

$ journalctl | grep veth96110d9
May 14 16:40:27 ip-10-100-37-14.eu-west-1.compute.internal systemd-udevd[4189]: Could not generate persistent MAC address for veth96110d9: No such file or directory
May 14 16:40:27 ip-10-100-37-14.eu-west-1.compute.internal kernel: IPv6: ADDRCONF(NETDEV_UP): veth96110d9: link is not ready
May 14 16:40:27 ip-10-100-37-14.eu-west-1.compute.internal systemd-networkd[665]: veth96110d9: Configured
May 14 16:40:27 ip-10-100-37-14.eu-west-1.compute.internal kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth96110d9: link becomes ready
May 14 16:40:27 ip-10-100-37-14.eu-west-1.compute.internal systemd-networkd[665]: veth96110d9: Gained carrier
May 14 16:40:27 ip-10-100-37-14.eu-west-1.compute.internal systemd-networkd[665]: veth96110d9: Lost carrier
May 14 16:40:27 ip-10-100-37-14.eu-west-1.compute.internal systemd-networkd[665]: veth96110d9: Removing non-existent address: fe80::98f4:98ff:fea2:d83b/64 (valid for ever)
May 14 16:40:32 ip-10-100-37-14.eu-west-1.compute.internal kernel: eth0: renamed from veth96110d9
May 14 16:53:45 ip-10-100-37-14.eu-west-1.compute.internal kernel: veth96110d9: renamed from eth0
May 14 16:53:45 ip-10-100-37-14.eu-west-1.compute.internal systemd-networkd[665]: veth96110d9: Configured
May 14 16:53:45 ip-10-100-37-14.eu-west-1.compute.internal systemd-networkd[665]: veth96110d9: Gained carrier
May 14 16:53:45 ip-10-100-37-14.eu-west-1.compute.internal kernel: IPv6: veth96110d9: IPv6 duplicate address fe80::42:aff:fee0:571a detected!
May 14 16:53:45 ip-10-100-37-14.eu-west-1.compute.internal systemd-networkd[665]: veth96110d9: Lost carrier
May 14 16:53:45 ip-10-100-37-14.eu-west-1.compute.internal systemd-networkd[665]: veth96110d9: Removing non-existent address: fe80::42:aff:fee0:571a/64 (valid for ever)
May 14 16:53:55 ip-10-100-37-14.eu-west-1.compute.internal kernel: unregister_netdevice: waiting for veth96110d9 to become free. Usage count = 1
May 14 16:54:05 ip-10-100-37-14.eu-west-1.compute.internal kernel: unregister_netdevice: waiting for veth96110d9 to become free. Usage count = 1
May 14 16:54:15 ip-10-100-37-14.eu-west-1.compute.internal kernel: unregister_netdevice: waiting for veth96110d9 to become free. Usage count = 1
May 14 16:54:25 ip-10-100-37-14.eu-west-1.compute.internal kernel: unregister_netdevice: waiting for veth96110d9 to become free. Usage count = 1
May 14 16:54:35 ip-10-100-37-14.eu-west-1.compute.internal kernel: unregister_netdevice: waiting for veth96110d9 to become free. Usage count = 1

This seems to happen when I remove many containers at once (in my case, when I delete k8s pods en-masse).

For those saying a reboot fixed it - did you reboot or stop/start the machines? On physical machines I had to use a remote power reset to get the machine to come back up.

@joshrendek, I had to use iLO's cold boot (i.e. a physical power cycle).

@joshrendek I have a script now which runs watching for this and does reboot -f when it happens 😢.
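For anyone else forced into the same corner, a minimal sketch of such a watchdog (a blunt instrument: it assumes the message only shows up once the host is already wedged, which is exactly the gamble being taken here):

#!/bin/bash
# Crude watchdog: if the kernel keeps printing the unregister_netdevice
# message, force a reboot. Run it from cron or a systemd timer.
if dmesg | tail -n 200 | grep -q 'unregister_netdevice: waiting for'; then
    logger 'unregister_netdevice detected, forcing reboot'
    reboot -f    # or: echo b > /proc/sysrq-trigger
fi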

Might have found the issue (or just got lucky). I have moved the Docker graph dir from an XFS partitioned disk over to an EXT4 partitioned disk and I cannot reproduce the issue (as well as solving a load of other XFS bugs I was getting). I remember @vbatts saying that XFS isn't supported yet.
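A sketch of how the graph directory can be relocated (paths here are made up; -g/--graph was the daemon flag for the graph root on the Docker versions discussed here):

# Stop the daemon, copy the graph dir onto an ext4-backed mount,
# then point the daemon at the new location.
sudo systemctl stop docker
sudo rsync -aHAX /var/lib/docker/ /mnt/ext4-disk/docker/
sudo docker daemon -g /mnt/ext4-disk/docker   # "dockerd -g ..." on 1.12+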

I have tried to provoke it by running build, run, stop, delete in an infinite loop on various images, creating about 10 containers each cycle, for the last few hours.

@joedborg what graphdriver are you using? Devicemapper? Overlay?

@thaJeztah Good point, I should have mentioned that. I'm using Overlay driver with (now) EXT4 backing FS.

I used to use devicemapper (because I'm using Fedora Server), but I had tons of pain (as I believe many do), especially with leaks where the mapper would not return space to the pool once a container had been deleted.

If it helps, I'm on Docker 1.11.1 and Kernel 4.2.5-300.fc23.x86_64.

@joedborg interesting, because the RHEL docs mentioned that only EXT4 is supported on RHEL/CentOS 7.1, and only XFS on RHEL/CentOS 7.2. I'd have expected XFS to work on newer versions then

@thaJeztah ah, that's odd. I'm trying to think of other things it might be. I've re-read from the top and it seems some people are running the same config. The only other thing that's different is that the XFS disk is a spindle and the EXT4 one is an SSD. I will keep soak testing in the meantime. I've also moved prod over to use the same setup, so either way we'll have an answer before long. However, it was doing it on almost every stop before, so it's certainly better.

@joedborg well, it's useful information indeed

Same error here, from kernel 4.2 to 4.5, same Docker version.

BTW, I'm running several VirtualBox machines on the same box at the same time.

$ docker version
Client:
 Version:      1.8.3
 API version:  1.20
 Go version:   go1.4.2
 Git commit:   f4bf5c7
 Built:        Mon Oct 12 05:27:08 UTC 2015
 OS/Arch:      linux/amd64

Server:
 Version:      1.8.3
 API version:  1.20
 Go version:   go1.4.2
 Git commit:   f4bf5c7
 Built:        Mon Oct 12 05:27:08 UTC 2015
 OS/Arch:      linux/amd64
$ docker info
Containers: 3
Images: 461
Storage Driver: devicemapper
 Pool Name: docker-253:7-1310721-pool
 Pool Blocksize: 65.54 kB
 Backing Filesystem: extfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 18.08 GB
 Data Space Total: 107.4 GB
 Data Space Available: 18.37 GB
 Metadata Space Used: 26.8 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.121 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.90 (2014-09-01)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.5.0-0.bpo.1-amd64
Operating System: Debian GNU/Linux 8 (jessie)
CPUs: 4
Total Memory: 15.56 GiB
Name: tungsten
ID: HJX5:TKIH:TF4G:JCQA:MHQB:YYUD:DHBL:53M7:ZRY2:OCIE:FHY7:NLP6

I am experiencing this issue using the overlay graph driver, with the directory on an ext4 FS. So I don't think xfs is the problem 😢

@obeattie Yeah, it seems people are getting it on devicemapper too. Touch wood, I have not had the issue again since switching. As mentioned, I did also swap the physical disk. This is going to be an interesting one!

This problem does not correlate with the filesystem in any way. I have seen this problem with zfs, overlayfs, devicemapper, btrfs and aufs. Also with or without swap. It is not even limited to docker; I hit the same bug with lxc too. The only workaround I currently see is not to stop containers concurrently.

If it helps, I am getting the same error message on the latest EC2 instance backed by the AWS AMI. docker version shows:

Client:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.2
 Git commit:   a34a1d5/1.9.1
 Built:
 OS/Arch:      linux/amd64

Server:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.2
 Git commit:   a34a1d5/1.9.1
 Built:
 OS/Arch:      linux/amd64

Just hopping on board. I'm seeing the same behavior on the latest Amazon ec2 instance. After some period of time, the container just tips over and becomes unresponsive.

$ docker info
Containers: 2
Images: 31
Server Version: 1.9.1
Storage Driver: devicemapper
Pool Name: docker-202:1-263705-pool
Pool Blocksize: 65.54 kB
Base Device Size: 107.4 GB
Backing Filesystem:
Data file: /dev/loop0
Metadata file: /dev/loop1
Data Space Used: 1.199 GB
Data Space Total: 107.4 GB
Data Space Available: 5.754 GB
Metadata Space Used: 2.335 MB
Metadata Space Total: 2.147 GB
Metadata Space Available: 2.145 GB
Udev Sync Supported: true
Deferred Removal Enabled: false
Deferred Deletion Enabled: false
Deferred Deleted Device Count: 0
Data loop file: /var/lib/docker/devicemapper/devicemapper/data
Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
Library Version: 1.02.93-RHEL7 (2015-01-28)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.4.10-22.54.amzn1.x86_64
Operating System: Amazon Linux AMI 2016.03
CPUs: 1
Total Memory: 995.4 MiB
Name: [redacted]
ID: OB7A:Q6RX:ZRMK:4R5H:ZUQY:BBNK:BJNN:OWKS:FNU4:7NI2:AKRT:5SEP

$ docker version
Client:
Version: 1.9.1
API version: 1.21
Go version: go1.4.2
Git commit: a34a1d5/1.9.1
Built:
OS/Arch: linux/amd64

Server:
Version: 1.9.1
API version: 1.21
Go version: go1.4.2
Git commit: a34a1d5/1.9.1
Built:
OS/Arch: linux/amd64

Same as the above comments, also running on EC2 happens to be via elastic beanstalk using 64bit Amazon Linux 2016.03 v2.1.0 running Docker 1.9.1

Somewhat anecdotal at this time, but I recently tried upgrading from a 4.2.0 to a 4.5.5 kernel on around 18 servers as a test, and this issue became considerably worse (from multiple days down to no more than 4 hours between issues).

This was on Debian 8

Exact same setup as @jonpaul and @g0ddard

Looking to see how we might be able to mitigate this bug.
First thing (which may or may not work out, it's risky) is to keep the API available in cases where this occurs: #23178

Hello. I've also been bitten by this bug...

Jun 08 17:30:40 node-0-vm kernel: unregister_netdevice: waiting for veth846b1dc to become free. Usage count = 1

I'm using Kubernetes 1.2.4 on CoreOS Beta, Flannel, and running on Azure. Is there some way to help debug this issue? The kernel bug thread seems dead. Some people report that disabling IPv6 in the kernel, using --userland-proxy=true, or using aufs instead of overlay storage helps, while others say it doesn't... It's a bit confusing.

Like @justin8 I also noticed this after upgrading my Fedora 23 system to kernel 4.5.5; the issue remains with kernel 4.5.6.

We encountered this bug when the container was hitting its memory limit. Unsure if it's related or not.

same issue here

# docker version
Client:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.3
 Git commit:   a34a1d5
 Built:        Fri Nov 20 17:56:04 UTC 2015
 OS/Arch:      linux/amd64

Server:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.3
 Git commit:   a34a1d5
 Built:        Fri Nov 20 17:56:04 UTC 2015
 OS/Arch:      linux/amd64
 # docker info
Containers: 213
Images: 1232
Server Version: 1.9.1
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 1667
 Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.19.0-5-exton
Operating System: Debian GNU/Linux 7 (wheezy)
CPUs: 4
Total Memory: 21.58 GiB
Name: [redacted]
Message from syslogd@[redacted] at Jun 24 10:07:54 ...
 kernel:[1716405.486669] unregister_netdevice: waiting for lo to become free. Usage count = 2

Message from syslogd@[redacted] at Jun 24 10:07:56 ...
 kernel:[1716407.146691] unregister_netdevice: waiting for veth06216c2 to become free. Usage count = 1

centos7.2
docker 1.10.3
the same problem

I have a "one liner" that will eventually reproduce this issue for me on an EC2 (m4.large) running CoreOS 1068.3.0 with the 4.6.3 kernel (so very recent). For me, it takes about 300 iterations but YMMV.

Linux ip-172-31-58-11.ec2.internal 4.6.3-coreos #2 SMP Sat Jun 25 00:59:14 UTC 2016 x86_64 Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz GenuineIntel GNU/Linux
CoreOS beta (1068.3.0)
Docker version 1.10.3, build 3cd164c

A few hundred iterations of the loop here will eventually hang dockerd and the kernel will be emitting error messages like

kernel: unregister_netdevice: waiting for veth8c7d525 to become free. Usage count = 1

The reproducer loop is

i=0; while echo $i && docker run --rm -p 8080 busybox /bin/true && docker ps; do sleep 0.05; ((i+=1)); done

EDITS

  • I've only been able to reproduce this when userland-proxy=false (a sketch of how to set that follows this list)
  • I've NOT been able to reproduce using VirtualBox (only ec2 so far) so maybe it's related to hypervisor too
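
For anyone wanting to try the same configuration, a minimal sketch of disabling the userland proxy (the daemon.json path is the stock Docker 1.12+ location and is an assumption; older daemons only take the command-line flag):

echo '{ "userland-proxy": false }' | sudo tee /etc/docker/daemon.json   # Docker 1.12+; merge by hand if the file already exists
sudo systemctl restart docker

# or pass the flag directly when starting the daemon by hand:
sudo dockerd --userland-proxy=false          # Docker 1.12+
sudo docker daemon --userland-proxy=false    # Docker 1.9 - 1.11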

@btalbot's script, above, doesn't reproduce the issue for me on Fedora 23 after several thousand iterations.

$ docker --version
Docker version 1.10.3, build f476348/1.10.3
$ docker info
Containers: 3
 Running: 0
 Paused: 0
 Stopped: 3
Images: 42
Server Version: 1.10.3
Storage Driver: devicemapper
 Pool Name: docker_vg-docker--pool
 Pool Blocksize: 524.3 kB
 Base Device Size: 107.4 GB
 Backing Filesystem: xfs
 Data file: 
 Metadata file: 
 Data Space Used: 17.69 GB
 Data Space Total: 73.67 GB
 Data Space Available: 55.99 GB
 Metadata Space Used: 5.329 MB
 Metadata Space Total: 130 MB
 Metadata Space Available: 124.7 MB
 Udev Sync Supported: true
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 0
 Library Version: 1.02.109 (2015-09-22)
Execution Driver: native-0.2
Logging Driver: journald
Plugins: 
 Volume: local
 Network: bridge null host
Kernel Version: 4.5.7-200.fc23.x86_64
Operating System: Fedora 23 (Workstation Edition)
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 0
CPUs: 4
Total Memory: 15.56 GiB
Name: <hostname>
ID: TOKW:AWJF:3VZU:55QA:V3KD:ZCA6:4XWW:JBY2:2Q5C:3S65:3ZXV:XRXG
Registries: docker.io (secure)

This problem happens quite frequently on my Kubernetes cluster, however I can't reproduce it reliably with the stressers or @btalbot's one liner. I've tried running it on two Azure VMs with CoreOS 1068.3.0.

First VM was a Standard_D1_v2 (3.5GB Ram, 1 core) - the script did > 3000 iterations.
Second VM was a Standard_DS15_v2 (140GB Ram, 20 cores) - the script did > 7600 iterations.

I've updated my previous comment (https://github.com/docker/docker/issues/5618#issuecomment-229545933) to include that I can only reproduce this when userland-proxy=false.

It reproduces for me on EC2 t2.micro (single core) VMs as well as m4.large (multi core), both using HVM. I haven't seen it happen using VirtualBox on my laptop yet, though, regardless of the userland-proxy setting.

We have encountered this bug while using Flannel with hairpin-veth enabled on a Kubernetes cluster (using the iptables proxy). This bug was happening only when we ran and stopped too many containers. We switched to using the cbr0 bridge network and promiscuous-bridge hairpin mode and have never seen it since.
Actually, it is easy to reproduce this bug if you are using hairpin-veth: just start this job with 100 containers with Kubernetes.

On 01/07/2016 08:01, manoj0077 wrote:

@btalbot https://github.com/btalbot so with 1.12 we can restart
dockerd without affecting running containers. So would a dockerd restart
help here in this case ?

AFAICT, even with 1.12, docker container processes are still children
of the docker daemon.

@sercand how did you set promiscuous-bridge hairpin mode? I can't see any documentation from docker about that, or perhaps they are using a different name

Is there some official word from Docker 🐳 on when this might be looked at? This is the second most commented open issue; it is very severe (necessitating a host restart); it is reproducible; and I don't see any real progress toward pinning down the root cause or fixing it 😞.

This seems most likely to be a kernel issue, but the ticket on Bugzilla has been stagnant for months. Would it be helpful to post our test cases there?

@justin8 I think those are Kubelet flags: --configure-cbr0 and --hairpin-mode

@sercand I also use Flannel. Is there any disadvantage in using --hairpin-mode=promiscuous-bridge?

@obeattie I agree. :(

FTR I managed to replicate the problem using @sercand's stresser job on a test Kubernetes cluster that I set up, it also uses flannel and hairpin-veth.

@sercand Could you please detail the steps to begin using promiscuous-bridge? I added the flag --configure-cbr0=true to the node's kubelet but it complains:
ConfigureCBR0 requested, but PodCIDR not set. Will not configure CBR0 right now. I thought this PodCIDR was supposed to come from the master? Thanks.

EDIT: It seems I needed to add --allocate-node-cidrs=true --cluster-cidr=10.2.0.0/16 to the controller manager config, but since I don't have a cloud provider (Azure) the routes probably won't work.

@justin8 I have followed this doc.
@edevil From the documentation, hairpin-mode is for "This allows endpoints of a Service to loadbalance back to themselves if they should try to access their own Service". By the way, my cluster runs on Azure and it was not an easy task to achieve.

@sercand According to the doc, if we use --allocate-node-cidrs=true on the controller manager we're supposed to use a cloud provider in order for it to set up the routes. Since there is no Kubernetes cloud provider for Azure, didn't you have problems? Do you set up the routes manually? Thanks.

@edevil I use Terraform to create the routes. You can find it at this repo. I quickly created this configuration and tested it only once. I hope it is enough to convey the basic logic behind it.

@morvans @btalbot did you get a chance to try with 1.12 ...?

I can confirm that after moving away from hairpin-veth and using the cbr0 bridge I cannot reproduce the problem anymore.

Just in case: is anyone having this issue on bare metal? We've seen this when testing a Rancher cluster in our VMware lab, but never on a real bare-metal deployment.

Yes, this issue happens on bare metal for any kernel >= 4.3. Have seen this on a lot of different machines and hardware configurations. Only solution for us was to use kernel 4.2.

It definitely still happens on 4.2, but it is an order of magnitude more frequent on anything newer. I've been testing each major release to see if it's better, and nothing yet.

Happens on CoreOS alpha 1097.0.0 also.

Kernel: 4.6.3
Docker: 1.11.2

I get same issue.

Docker: 1.11.2
Kernel: 4.4.8-boot2docker.

Host: Docker-machine with VMWare Fusion driver on OS X.

Any suggested workarounds?

It would be really helpful if those of you who can reproduce the issue reliably in an environment where a crashdump is possible (i.e. not EC2) could in fact share that crashdump file. More information about how to enable kdump on Ubuntu trusty can be found here, and these are the crash options you need to enable once kdump is ready to generate a crashdump:

echo 1 > /proc/sys/kernel/hung_task_panic          # panic when hung task is detected
echo 1 > /proc/sys/kernel/panic_on_io_nmi          # panic on NMIs from I/O
echo 1 > /proc/sys/kernel/panic_on_oops            # panic on oops or kernel bug detection
echo 1 > /proc/sys/kernel/panic_on_unrecovered_nmi # panic on NMIs from memory or unknown
echo 1 > /proc/sys/kernel/softlockup_panic         # panic when soft lockups are detected
echo 1 > /proc/sys/vm/panic_on_oom                 # panic when out-of-memory happens

The crashdump can really help kernel developers find out more about what is causing the reference leak, but keep in mind that a crashdump also includes a memory dump of your host and may contain sensitive information.
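
As a rough sketch of the kdump setup itself on Ubuntu trusty (package and command names per the stock Ubuntu crash dump tooling; treat the exact output as an assumption):

sudo apt-get install linux-crashdump   # pulls in kdump-tools and the crash kernel machinery
sudo reboot                            # needed so the crashkernel= memory reservation takes effect
kdump-config show                      # should report something like "ready to kdump"
grep crashkernel /proc/cmdline         # confirm memory was actually reserved for the crash kernel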

...sensitive information.

:o

I am running into the same issue.

Jul 13 10:48:34 kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
Linux 4.6.3-1.el7.elrepo.x86_64
Docker: 1.11.2

Same issue:

Ubuntu 14.04.4 LTS (GNU/Linux 3.19.0-25-generic x86_64)
Docker version: 1.10.3

Just happened directly on the terminal screen:

Message from syslogd@svn at Jul 26 21:47:38 ...
 kernel:[492821.492101] unregister_netdevice: waiting for lo to become free. Usage count = 2

Message from syslogd@svn at Jul 26 21:47:48 ...
 kernel:[492831.736107] unregister_netdevice: waiting for lo to become free. Usage count = 2

Message from syslogd@svn at Jul 26 21:47:58 ...
 kernel:[492841.984110] unregister_netdevice: waiting for lo to become free. Usage count = 2

system is

Linux svn.da.com.ar 4.4.14-24.50.amzn1.x86_64 #1 SMP Fri Jun 24 19:56:04 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Same problem

Os: Amazon Linux AMI release 2016.03
Docker: 1.9.1

Here also:

Linux 4.4.14-24.50.amzn1.x86_64 x86_64
Docker version 1.11.2, build b9f10c9/1.11.2

I'm seeing the same issue on EC2:

Docker version 1.11.2, build b9f10c9/1.11.2
NAME="Amazon Linux AMI"
VERSION="2016.03"
ID="amzn"
ID_LIKE="rhel fedora"
VERSION_ID="2016.03"
PRETTY_NAME="Amazon Linux AMI 2016.03"
CPE_NAME="cpe:/o:amazon:linux:2016.03:ga"
HOME_URL="http://aws.amazon.com/amazon-linux-ami/"
 kernel:[154350.108043] unregister_netdevice: waiting for lo to become free. Usage count = 1

(on all my pty + beeper when this happens)
"simply" Debian Jessie + backports:

Linux 4.6.0-0.bpo.1-amd64 #1 SMP Debian 4.6.1-1~bpo8+1 (2016-06-14) x86_64 GNU/Linux
Docker version 1.12.0, build 8eab29e

Hello,

When I try to replicate the issue in a controlled environment by creating and destroying new images, I cannot reproduce it.

The issue has been raised on one of the servers running docker 1.9.1:

 docker info | egrep "Version|Driver"
Server Version: 1.9.1
Storage Driver: devicemapper
 Library Version: 1.02.93 (2015-01-30)
Execution Driver: native-0.2
Logging Driver: gelf
Kernel Version: 4.5.0-coreos-r1

I have launched 17753 containers concurrently so far, raising traffic to the internet, without leaking any of the veth* interfaces. Can someone paste instructions to consistently reproduce the issue?

@pegerto Should be pretty easy to trigger if you have --userland-proxy=false and spin up a bunch of containers concurrently. I do this using https://github.com/crosbymichael/docker-stress
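
For reference, a hedged sketch of that reproduction (the go get build step is an assumption; the -c concurrency flag matches the docker-stress usage quoted later in this thread):

go get github.com/crosbymichael/docker-stress   # assumes a Go toolchain; binary lands in $GOPATH/bin

# with the daemon running with --userland-proxy=false, churn containers concurrently:
docker-stress -c 100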

Thanks @cpuguy83

Configuring the daemon with --userland-proxy=false, I can easily reproduce the issue, thank you. We can also see this issue affecting daemons that don't run with this configuration.

I see a kernel dump at the netfilter hook introduced by the netns segregation in >= 4.3. Any thoughts on why the issue seems worse when the route occurs at 127/8?

Thanks

Seeing this issue as well. CoreOS 1068.8.0, Docker 1.10.3, kernel 4.6.3. I pulled some of the system logs if anybody is interested.

Just got multiple ...
unregister_netdevice: waiting for lo to become free. Usage count = 1
... on 2 VMs and on my bare-metal laptop, all running Ubuntu 16.04 and the latest kernels (4.4.0-3[456]).

The result is everything hangs and requires a hard reboot.
I hadn't experienced this before last week, and I think one of the VMs was on 1.11.3 while the others were all on 1.12.0.

@RRAlex This is not specific to any docker version.
If you are using --userland-proxy=false in the daemon options... OR (from what I understand) you are using Kubernetes, you will likely hit this issue.

The reason is that the --userland-proxy=false option enables hairpin NAT on the bridge interface... this is something that Kubernetes also sets when it sets up the networking for its containers.

Seeing this on a BYO node using Docker Cloud (and Docker Cloud agent).

Saw this today, once (out of about 25 tries) on current Amazon ECS AMIs, running vanilla debian:jessie with a command that apt-get updates, installs pbzip2, then runs it (simple multithreaded CPU test).

@edevil
Most of you here describe encountering this situation while using Docker to start/stop containers, but I hit exactly the same situation without Docker, on Debian:

  • Debian "Jessie" (=Debian version 8.5), on baremetal (no VM, no cloud, but plain hardware)
  • kernel 3.16.0-4-amd64
  • have 4 LXC containers started
  • shut one LXC container with "lxc-stop -n $containerName"
  • when this command completes, the kernel or the interfaces are probably not entirely 'cleaned up' yet, because if I launch a new "lxc-stop" immediately after the previous one, I hit the kernel problem (a minimal sketch of this sequence is shown below)

No way to recover except a hard reset of the machine.

So please, in your investigations to pinpoint / solve this issue, do not focus on Docker alone. It is obviously a generic issue with fast stops/starts of containers, be it through Docker or through plain "lxc" commands.
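
A minimal sketch of that sequence, assuming two running LXC containers whose names are in $containerA and $containerB (placeholders):

lxc-stop -n "$containerA"
lxc-stop -n "$containerB"   # issued immediately, before the first teardown has fully completed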

I'd think that this is a problem in the Linux kernel.

I hit this problem when I had 3 chroots (in fact pbuilder) running under very heavy load.
My hardware is Loongson 3A (a mips64el machine with a 3.16 kernel).

When I was trying to ssh into it, I hit this problem.

So this problem may not only be about docker or lxc; it even shows up with chroot.

Docker version 1.11.2.

kernel:[3406028.998789] unregister_netdevice: waiting for lo to become free. Usage count = 1

cat /etc/os-release 
NAME=openSUSE
VERSION="Tumbleweed"
VERSION_ID="20160417"
PRETTY_NAME="openSUSE Tumbleweed (20160417) (x86_64)"
ID=opensuse
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:opensuse:opensuse:20160417"
BUG_REPORT_URL="https://bugs.opensuse.org"
HOME_URL="https://www.opensuse.org/"
ID_LIKE="suse"
uname -a
Linux centre 4.5.0-3-default #1 SMP PREEMPT Mon Mar 28 07:27:57 UTC 2016 (8cf0ce6) x86_64 x86_64 x86_64 GNU/Linux

Bare metal.

We had the issue lately on bare metal (dedicated on ovh) with kernel 4.6.x and docker 1.11.2.
After reading comments here and trying multiple workarounds, we downgraded our kernel to the latest version of the 3.14 branch (3.14.74) and upgraded docker to 1.12.0 to avoid https://github.com/docker/libnetwork/issues/1189 and everything seems to be alright for now.

I hope this can help.

All, I think you don't need to post messages about Docker or chroot anymore; it's all about the Linux kernel.
So please, can someone who is able to debug the kernel stand up and look at the parts where it disables virtual network interfaces for containers? Maybe there is some race condition happening when a previous stop of a container has not yet entirely disabled/cleaned up its virtual interface before a new stop of a container is requested.

@rdelangh I don't think that issue is necessarily related to the kernel.

On Fedora 24, I can't reproduce the issue with Docker 1.10.3 from the Fedora repos, only with Docker 1.12.1 from the Docker own repos.

Both tests were conducted with kernel 4.6.7-300.fc24.x86_64.

Seeing this issue as well on CoreOS 1068.10.0, Docker 1.10.3, kernel 4.6.3.

kernel: unregister_netdevice: waiting for veth09b49a3 to become free. Usage count = 1

Using Kubernetes 1.3.4 on CoreOS 1068.9.0 stable on EC2, docker 1.10.3 I see this problem.

unregister_netdevice: waiting for veth5ce9806 to become free. Usage count = 1
unregister_netdevice: waiting for veth5ce9806 to become free. Usage count = 1
unregister_netdevice: waiting for veth5ce9806 to become free. Usage count = 1
...
uname -a
Linux <redacted> 4.6.3-coreos #2 SMP Fri Aug 5 04:51:16 UTC 2016 x86_64 Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz GenuineIntel GNU/Linux

Seeing this issue as well on Ubuntu 16.04, Docker 1.12.1, kernel 4.4.0-34-generic
waiting for lo to become free. Usage count = 1

$ time docker ps
CONTAINER ID        IMAGE                                                                       COMMAND                  CREATED             STATUS                             PORTS                                                                          NAMES

...

real    4m40.943s
user    0m0.012s
sys     0m0.004s

For those using Kubernetes <= 1.3.4 you can exploit this issue: https://github.com/kubernetes/kubernetes/issues/30899 to reproduce this problem. I ran a small cluster with 1 Controller (m4.large) and 2 Workers (m4.large) on CoreOS 1068.10.

From there you can create 2 ReplicationControllers, I called them hello and hello1 based on this: http://pastebin.com/mAtPTrXH . Make sure to change the names and labels to be different.

Then, create 2 deployments matching the same names/labels as the above based on this: http://pastebin.com/SAnwLnCw .

_As soon as you create the deployments, you'll get a crazy amount of spam containers_.

If you leave it on for a while (several minutes), you'll see a lot of stuff trying to terminate/create. You can delete the deployments and let things stabilize. You should see a good handful of Terminating and ContainerCreating pods. If you ssh into the nodes, check dmesg and docker ps to see if the above symptoms are apparent.

In my instance it took me about 5 minutes of letting this freak out before seeing the issue. I plan on making the changes that @sercand and @edevil were toying with and see if this works for me in this case.

@edevil After looking at your linked commit, am I correct that you disabled/removed Flannel in your environment altogether in favor of the cbr0 bridge created by Kubernetes to get past this issue?

I'm seeing on my end that you would not be able to use them in tandem, because flannel wants to use docker0 while your internal networking would be working on cbr0, correct?

@alph486 that's correct, I stopped using flannel. I use the bridge and setup the routes for the pod network.

@alph486 flannel doesn't want to use docker0. It's just the default bridge for docker, which you can override with the --bridge=cbr0 docker option.
On CoreOS you would have to override the docker systemd unit.

The Kubelet flag --experimental-flannel-overlay can read flannel configuration, and configure the docker bridge cbr0 with the flannel CIDR.

It will also enable promiscuous mode instead of veth-hairpin which seems to be the issue.
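
Regarding the systemd unit override mentioned above, a hedged sketch of what it could look like on a stock dockerd install; the binary path and remaining flags are assumptions and must mirror whatever ExecStart your distro's docker.service already uses (older daemons use "docker daemon" instead of "dockerd"):

sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/40-bridge.conf <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -H fd:// --bridge=cbr0
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker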

Thanks @dadux for the input. If K8s will pick up the cbr0 interface that has already been bootstrapped by the overridden unit, we could be in business with that solution; I'll try it.

According to docs, promiscuous-bridge appears to be the default value for --hairpin-mode in kubelet v1.3.4+. I'm still seeing the issue with this, so I'm not entirely sure that's the whole solution.

I've not been able to reproduce the issue again after using the kubenet network plugin (which is set to replace --configure-cbr0). I'm kind of avoiding the flannel-overlay option due to the uncertainty of its future (it seems to be tied to --configure-cbr0).

If your docker daemon uses the docker0 bridge, setting --hairpin-mode=promiscuous-bridge will have no effect, as the kubelet will try to configure the non-existent bridge cbr0.

For CoreOS, my workaround to mirror the Kubernetes behaviour while still using flannel:

  • Add a systemd drop-in for docker.service to set the docker0 bridge interface to promiscuous mode. (Surely there's a more elegant way to do this?):
- name: docker.service
  command: start
  drop-ins:
   - name: 30-Set-Promiscuous-Mode.conf
     content: |
       [Service]
       ExecStartPost=/usr/bin/sleep 5
       ExecStartPost=/usr/bin/ip link set docker0 promisc on
  • Tell the kubelet not to set hairpin in the docker bridge:
    kubelet --hairpin-mode=none

You can check if hairpin is enabled for your interfaces with
brctl showstp docker0
or
for f in /sys/devices/virtual/net/*/brport/hairpin_mode; do cat $f; done
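
As a hedged aside, one way to flip hairpin off by hand on a single port (the veth name is a placeholder; assumes iproute2's bridge tool, or use the sysfs knob directly):

bridge link set dev veth123abc hairpin off
# or, equivalently, through sysfs:
echo 0 > /sys/class/net/veth123abc/brport/hairpin_mode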

I think my colleague has fixed this recently: http://www.spinics.net/lists/netdev/msg393441.html. We encountered this problem in our environment, and after we found the issue and applied this fix we never hit the problem any more. Anyone who has encountered this problem, could you try this patch and see if it happens again? From our analysis, it is related to ipv6, so you can also try disabling ipv6 in docker with --ipv6=false when starting the docker daemon.

@coolljt0725 Maybe I'm wrong, but ipv6 is disabled by default in docker and I've just reproduced the problem via docker-stress with "--ipv6=false" option (which is the default anyway). Haven't tried your patch yet.

@dadux Thank you for your help. On Kubernetes 1.3.4, CoreOS 1068 Stable, Docker 1.10.3, and Flannel as the networking layer, I have fixed the problem by making the following changes in my CoreOS units:

    - name: docker.service
      drop-ins:
        - name: 30-Set-Promiscuous-Mode.conf
          content: |
            [Service]
            ExecStartPost=/usr/bin/sleep 5
            ExecStartPost=/usr/bin/ip link set docker0 promisc on

Added the following to kubelet.service:

        --hairpin-mode=none

What effect do these changes to Docker/Kubernetes have with regard to how the OS handles interfaces for containers?
I must stress that it is an issue with wrong OS behaviour, not Docker or Kubernetes, because we (and some other people in this thread) are not running Docker or Kubernetes at all, yet still encounter exactly the same situation when stopping LXC containers quickly one after the other.

@rdelangh You are correct. However, this issue was created in the Docker project to track the behavior as it pertains to Docker. There are other issues mentioned in this thread tracking it as an OS problem, a K8s problem, and CoreOS problem. If you have found the issue in LXC or something else, highly recommend you start a thread there and link here to raise awareness around the issue.

When those using Docker google for this error they will likely land here. So, it makes sense that we post workarounds to this issue here so that until the underlying problems are fixed, people can move forward.

What effect do these changes to Docker/Kubernetes have with regard to how the OS handles interfaces for containers?

  1. The docker change in my post allows the Kubernetes stack to interrogate docker and make sure the platform is healthy when the issue occurs.
  2. The hairpin-mode change essentially tells K8s to use the docker0 bridge as is, and therefore it will not try to use "kernel land" networking and "hairpin veth", which is where the problem begins in the Docker execution path.

It's a workaround for this issue using K8s and Docker.

coolljt0725's colleague's patch has been queued for stable, so hopefully it'll be backported into distros soon enough. (David Miller's post: http://www.spinics.net/lists/netdev/msg393688.html )

Not sure where that commit is though and if we should send it to Ubuntu, RH, etc. to help them track & backport it?

Going to show up here at some point I guess:
http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/tree/net/ipv6/addrconf.c

EDIT: seems to be present here: https://github.com/torvalds/linux/blob/master/net/ipv6/addrconf.c

Thank you to coolljt0725 and co (and everybody in this thread). Since many people will be unable to update to a kernel with the ipv6 patch for some time (currently, everyone), I've managed to squash this bug after trying many of the suggestions from this thread. I want to make a full post to follow up on the things that did and did not work, so that nobody else has to go through the trouble I've seen.

TL;DR: disable ipv6 in the Linux boot params and reboot. On CoreOS this means /usr/share/oem/grub.cfg has the contents set linux_append="ipv6.disable=1", followed by a reboot. A more general-purpose suggestion that should work on centos/ubuntu/debian/$linuxes may be found here.
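
For grub-based distros other than CoreOS, a minimal sketch of the same idea (paths are the usual Debian/Ubuntu and CentOS/RHEL ones; adjust for your setup):

# 1) edit /etc/default/grub so the kernel command line carries the flag, e.g.
#      GRUB_CMDLINE_LINUX="ipv6.disable=1"
# 2) regenerate the grub config and reboot
sudo update-grub                                  # Debian/Ubuntu
# sudo grub2-mkconfig -o /boot/grub2/grub.cfg     # CentOS/RHEL equivalent
sudo reboot
# 3) afterwards, confirm the flag took effect
grep ipv6.disable /proc/cmdline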

  • Tried ipvlan(l2,l3)/macvlan(bridge): neither of these works on AWS, or at least I don't possess (nor could I find) the knowledge to finagle either of them to work on AWS. By "work" I mean that a container attached to a network with ipvlan or macvlan was unable to ping the gateway / connect to the internet (yes, I tested the basic idea working in a non-AWS environment). This did in fact seem to resolve the issue at hand, but for our use case we need to be able to connect to the internet -- for use cases that don't, this may be a viable option and looks pretty sexy vs. the bridge.
  • tried the following flags passed to dockerd individually, and with certain combinations (since none of them seemed to work, I wasn't too scientific about trying any and all combinations):
--ipv6=false
--iptables=false
--ip-forward=false
--icc=false
--ip-masq=false
--userland-proxy=false

Interestingly, --ipv6=false doesn't really seem to do anything -- this was quite perplexing; containers still received inet6 addresses with this flag.

--userland-proxy=false sets hairpin mode and wasn't really expected to work. In conjunction with setting docker0 to promisc mode I had some hope, but this did not resolve the issue either. There is a mention of a fix to --userland-proxy=false here, which may land upstream soon and is worth another shot; it would be nice to turn this off regardless of the bug noted in this issue for performance reasons, but unfortunately it has yet another bug at this time.

  • Tried disabling ipv6 via sysctl settings as documented here and restarting systemd-networkd after applying the sysctl settings, as well as attempting to disable ipv6 from dhcpcd as documented here; this was not enough to disable ipv6, as it's still turned on even if no interfaces are using it.
  • as was suggested here we gave this solution a try, only removing one container at a time, and we were still met with this bug.

too long; did read: disable ipv6 in your grub settings. reboot. profit.

Faced this issue on CentOS 7.2 (3.10.0-327.28.3.el7.x86_64) and Docker 1.12.1 (w/o k8s). The problem arises when network traffic increases.
Booting the kernel with ipv6 disabled (as per the previous advice) didn't help.
But turning the docker0 interface into promisc mode has fixed this. I used the systemd drop-in by @dadux (thank you!) - it seems to be working well now.

@rdallman Deactivating ipv6 via grub does not prevent unregister_netdevice for me on either Ubuntu 16.04 (kernel 4.4.0-36-generic) or 14.04 (kernel 3.13.0-95-generic), regardless of the --userland-proxy setting (either true or false).

Ooooh, that's cool that the patch was queued for stable.
Ping @aboch for the problem that --ipv6=false does nothing.

@trifle Sorry :( Thanks for posting the info. We have yet to run into issues after a few days of testing, but will report back if we hit any. We're running CoreOS 1122.2 (kernel 4.7.0). Setting docker0 to promisc mode seems to fix this for some people (no luck for us).

@RRAlex Do you know if anyone has reached out to the Ubuntu kernel team regarding a backport? We have a large production Docker deployment on an Ubuntu cluster that's affected by the bug.

Ubuntu kernel team mailing list:
https://lists.ubuntu.com/archives/kernel-team/2016-September/thread.html

Patch for the stable kernel:
https://github.com/torvalds/linux/commit/751eb6b6042a596b0080967c1a529a9fe98dac1d

Ubuntu kernel commit log:
http://kernel.ubuntu.com/git/ubuntu/ubuntu-xenial.git/log/?h=master-next
(Patch is not there yet)

@leonsp I tried contacting them on what seems to be the related issue:
https://bugs.launchpad.net/ubuntu/+source/linux-lts-xenial/+bug/1403152

If you look at the last (#79) reply, someone built a kernel for Xenial with that patch:
https://launchpad.net/~ddstreet/+archive/ubuntu/lp1403152

Not sure when it is going into the main Ubuntu kernel tree, though, nor what this person's relation to Ubuntu is and whether that'll help...

I also can't find the mentioned commits from that thread in the Ubuntu kernel commit log.

@RRAlex The mentioned commits are on ddstreet's branch ~ddstreet/+git/linux:lp1403152-xenial, here is the log: https://code.launchpad.net/~ddstreet/+git/linux/+ref/lp1403152-xenial
So, anyone with this issue on Ubuntu 16.04 can give it a try. https://launchpad.net/~ddstreet/+archive/ubuntu/lp1403152

Possibly @sforshee knows (for the Ubuntu kernel)

I've finally managed to test the "ipv6.disable=1" solution. In addition to that, I've upgraded to the 4.7.2 kernel on my Debian 8.
After the kernel upgrade and enabling "ipv6.disable=1" in the kernel parameters, I managed to catch the "waiting for lo" issue on a real workload even without the "--userland-proxy=false" flag for the docker daemon. The good news is that after specifying "--userland-proxy=false" and trying to reproduce the issue with "docker-stress", I can no longer do so. But I am pretty sure it will arise again regardless of the "--userland-proxy" value.
So from what I see, ipv6 is definitely involved in this issue, because docker-stress is now no longer able to catch it. The bad news is that the issue is actually still there (i.e. it's fixed only partially).

Will compile latest 4.8rc7 later to test more.

@twang2218 @coolljt0725

Hmmm.. so I just tried the Ubuntu xenial 4.4.0-36 kernel with the patch backported from ddstreet's ppa:

$ uname -a
Linux paul-laptop 4.4.0-36-generic #55hf1403152v20160916b1-Ubuntu SMP Fri Sep 16 19:13:50 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Unfortunately, this does not seem to solve the problem for me. Note that I'm also running with "ipv6.disable=1". Are we looking at multiple unrelated causes with the same outcome? Many of the comments in this thread seem to suggest so.

I don't know too much about these, but I know we've had bugs like this before. As I understand it, reference counts to any network device end up getting transferred to lo when a network namespace is being cleaned up, so "waiting for lo to become free" means there's a reference count leak for some net device but not necessarily for lo directly. That makes these a bear to track down, because by the time you know there was a leak you don't know what device it was associated with.

I haven't read back through all the comments, but if someone can give me a reliable reproducer on Ubuntu I'll take a look at it and see if I can figure anything out.

@sforshee it's not always easy to reproduce, but there was a patch created (that at least fixes some of the cases reported here); http://www.spinics.net/lists/netdev/msg393441.html. That was accepted upstream https://github.com/torvalds/linux/commit/751eb6b6042a596b0080967c1a529a9fe98dac1d

@thaJeztah ah, I see the question you were directing me at now.

So the patch is in the upstream 4.4 stable queue, for 16.04 it's likely to be included in not the next kernel SRU (which is already in progress) but the one after that, about 5-6 weeks from now. If it is needed in 14.04 too please let me know so that it can be backported.

@sforshee Basically, earlier (before that patch) this could be reproduced by enabling ipv6 in the kernel (usually enabled by default), adding "--userland-proxy=false" to the docker daemon flags, and then running docker-stress -c 100, for example (docker-stress is from here: https://github.com/crosbymichael/docker-stress)

@fxposter thanks. If there's a fix for that one though all I really need to worry about is getting that fix into the Ubuntu kernel. I can also help look into other leaks that aren't fixed by that patch.

I'm having this issue too. I'm running docker inside a rancherOS box from AWS. Actually, it happens randomly after setting up a rancher cluster (3 hosts) and running a small application in it.

Same here... Fedora 24, happens randomly; it can be fine for a week, then I get one every 10 hours:
kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

Experiencing on CentOS 7 running kernel 3.10.0-327.36.1.el7 and docker 1.12.1

Downgrading to kernel 3.10.0-327.18.2.el7 while remaining on docker 1.12.1, seems to have stabilized the system.

I'm also seeing this:
Docker version 1.11.2
Ubuntu 16.04.1 4.4.0-38-generic

ipv6 disabled (grub)

Just had this problem without --userland-proxy=false (sic!) on a server with kernel 4.8.0-rc7, which includes the ipv6 patch (sic!!). So maybe it fixes some of the problems, but definitely not all of them.

Does anyone know how this can be debugged at all?

We discovered that this only occurs on our setup when we (almost) run out of free memory.

@fxposter It would be useful to find a minimal reproduction case, which is kinda hard :/ Then we could use ftrace to at least find the code paths.

Happening on CoreOS 1081.5.0 (4.6.3-coreos)

Linux blade08 4.6.3-coreos #2 SMP Sat Jul 16 22:51:51 UTC 2016 x86_64 Intel(R) Xeon(R) CPU X5650 @ 2.67GHz GenuineIntel GNU/Linux

@LK4D4 Unfortunately it's no longer possible to reproduce it via docker-stress (at least I could not). I will try to mimic our previous setup with webkits (which triggered this problem quite a bit more often than I would like).

@fxposter That patch only fixes some of the problems (in our environment, we never encounter it anymore with that patch), not all of them. I'll let my colleague keep looking into this issue. If you have any way to reproduce this, please let me know, thanks :)

I posted a request for Redhat to apply this patch to Fedora 24.

https://bugzilla.redhat.com/show_bug.cgi?id=1379767#c1

4.4.0-42 is still broken for sure...
I mentioned it here for Ubuntu, but maybe someone has a better idea:
https://bugs.launchpad.net/ubuntu/+source/linux-lts-xenial/+bug/1403152

I'm also seeing this, Docker version 1.11.2, build b9f10c9/1.11.2, 64bit Amazon Linux 2016.03 v2.1.6.

Still happening: Docker 1.12.2, Armbian Linux kernel 4.8.4, ipv6.disable=1 in bootargs.

How can this bug be fixed? I hit it every day.

@woshihaoren Don't use --userland-proxy=false

To clarify - we faced it with userland-proxy disabled too

Getting this on Amazon Linux AMI 2016.9:

$ uname -a

Linux 4.4.23-31.54.amzn1.x86_64 #1 SMP

Docker version:

Client:
Version: 1.11.2
API version: 1.23
Go version: go1.5.3
Git commit: b9f10c9/1.11.2
Built:
OS/Arch: linux/amd64

Server:
Version: 1.11.2
API version: 1.23
Go version: go1.5.3
Git commit: b9f10c9/1.11.2
Built:
OS/Arch: linux/amd64

centos7 kernel 4.4.30 again~~~~

CoreOS 1185.3.0, 4.7.3-coreos-r2, Docker 1.11.2
Reproducible by just running 10-20 debian:jessie containers with "apt-get update" as the command.

CoreOS stable is currently still hit. The fix for the 4.7 series is in 4.7.5: https://cdn.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.7.5

commit 4e1b3aa898ea93ec10e48c06f0e511de37c35b2d
Author: Wei Yongjun <[email protected]>
Date:   Mon Sep 5 16:06:31 2016 +0800

    ipv6: addrconf: fix dev refcont leak when DAD failed

TL;DR - There are no solutions in this post, but I do list what I've chased so far and my current working theories. I'm hoping other folks who are also chasing this might find some info here helpful as we run this thing down.

@koendc Thanks for posting the patch that was introduced into 4.7.5. I backported the 4e1b3aa898ea93ec10e48c06f0e511de37c35b2d (upstream 751eb6b6042a596b0080967c1a529a9fe98dac1d) patch to my 4.5.5 setup [1] and was still able to easily reproduce the unregister_netdevice problem. It is possible that other changes in the 4.7.x kernel work together with the provided patch to resolve this issue, but I have not yet confirmed that, so we shouldn't lose all hope yet. I'm testing with 4.5.5 because I have a reproducible test case to cause the problem, discussed in [2].

Other things I've confirmed based on testing:

  • 4.2 is much more stable than any later kernel
  • 4.5.x is trivially reproducible. On the newer kernels I've extensively tested (4.8.2 and 4.8.6), the problem still exists, though the time to first occurrence ranged anywhere from 60s to 48 hours
  • The problem seems to correlate with both network traffic and the ratio of containers to parent resource (virt CPU) capacity. As others have alluded to, this could be a red herring if this is indeed a race condition

Next steps:

  • Instrument a kernel with the appropriate printk statements to try and find a case where allocated resources aren't freed
  • Test the 4.7.5 or later kernel with/without the aforementioned patch to see if the problem occurs
  • Just before one of the crashes, I saw a very interesting set of IPv6: eth0: IPv6 duplicate address <blah> detected errors. Might be another red herring, but I want to try exercising ipv6 disabling to see if there is a correlation

[1] My full setup is a GCE virt running a slightly customized debian kernel based on 4.5.5. Docker version 1.8.3, build f4bf5c7 is running on top of that
[2] Test case information: I have 20 parallel processes, each start a Node.js hello world server inside of a docker container. Instead of returning hello world, the Node.js server returns 1 MB of random text. The parent process is in a tight loop that starts the container, curls to retrieve the 1MB of data, and stops the container. Using this setup, I can consistently reproduce the problem in 4-90s. Using this same setup on a physical host or inside of virtualbox does not reproduce the problem, despite varying items that alter mean time to reproduction on the GCE box. Variables I've been playing with: number of concurrent test processes, size of payload transferred, and quantity of curl calls. The first two variables are definitely correlated, though I think it's likely just adjusting the variables in order to find a reasonable saturation point for the virt.
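
A hedged sketch of that reproducer, for anyone who wants to approximate it (node-1mb-server is a hypothetical image that serves roughly 1 MB of random text on port 8000; the loop structure follows the description above):

for i in $(seq 1 20); do
  (
    while true; do
      cid=$(docker run -d -p 8000 node-1mb-server)                    # start the server container
      port=$(docker port "$cid" 8000 | cut -d: -f2)                   # find the randomly published host port
      curl -s "http://localhost:${port}/" > /dev/null                 # pull ~1 MB through the bridge
      docker stop "$cid" > /dev/null && docker rm "$cid" > /dev/null  # tear the container down again
    done
  ) &
done
wait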

I too am having this error.

I see it repeated 3 times after deploying a container.

Description

kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

Steps to reproduce the issue:

  1. ssh into docker host
  2. run a container
docker run -d --network=anetwork --name aname -p 9999:80 aimagename

Describe the results you received:

Just get the error repeated 3 times.

Describe the results you expected:
No error

Additional information you deem important (e.g. issue happens only occasionally):
Just started happening after this weekend.

Output of docker version:

 docker --version
Docker version 1.12.3, build 6b644ec

Output of docker info:

docker info
Containers: 10
 Running: 9
 Paused: 0
 Stopped: 1
Images: 16
Server Version: 1.12.3
Storage Driver: overlay2
 Backing Filesystem: extfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: overlay null host bridge
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 4.8.4-200.fc24.x86_64
Operating System: Fedora 24 (Server Edition)
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 15.67 GiB
Name: docker-overlayfs
ID: AHY3:COIU:QQDG:KZ7S:AUBY:SJO7:AHNB:3JLM:A7RN:57CQ:G56Y:YEVU
Docker Root Dir: /docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:
 127.0.0.0/8

Additional environment details (AWS, VirtualBox, physical, etc.):

Virtual machine:
Fedora 24
OverlayFS2 on ext3

Separate drive allocated for docker use 24 gigs.
16 gigs of ram.

Docker PS

docker ps -a
CONTAINER ID        IMAGE                COMMAND                  CREATED             STATUS                      PORTS                                            NAMES
5664a10de50b        7f01d324a3cb         "/bin/sh -c 'apk --no"   11 minutes ago      Exited (1) 10 minutes ago                                                    pensive_brattain
3727b3e57e2f        paa-api              "/bin/sh -c /run.sh"     10 days ago         Up 10 days                  0.0.0.0:8080->80/tcp                             paa-api
43cfe7eae9cf        paa-ui               "nginx -g 'daemon off"   10 days ago         Up 10 days                  0.0.0.0:80->80/tcp, 443/tcp                      paa-ui
345eaab3b289        sentry               "/entrypoint.sh run w"   11 days ago         Up 11 days                  0.0.0.0:8282->9000/tcp                           my-sentry
32e555609cd2        sentry               "/entrypoint.sh run w"   11 days ago         Up 11 days                  9000/tcp                                         sentry-worker-1
a411d09d7f98        sentry               "/entrypoint.sh run c"   11 days ago         Up 11 days                  9000/tcp                                         sentry-cron
7ea48b27eb85        postgres             "/docker-entrypoint.s"   11 days ago         Up 11 days                  5432/tcp                                         sentry-postgres
116ad8850bb1        redis                "docker-entrypoint.sh"   11 days ago         Up 11 days                  6379/tcp                                         sentry-redis
35ee0c906a03        uifd/ui-for-docker   "/ui-for-docker"         11 days ago         Up 11 days                  0.0.0.0:9000->9000/tcp                           docker-ui
111ad12b877f        elasticsearch        "/docker-entrypoint.s"   11 days ago         Up 11 days                  0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp   paa-elastic

Docker images

 docker images -a
REPOSITORY           TAG                 IMAGE ID            CREATED             SIZE
<none>               <none>              7f01d324a3cb        12 minutes ago      88.51 MB
<none>               <none>              1a6a12354032        12 minutes ago      88.51 MB
debian               jessie              73e72bf822ca        6 days ago          123 MB
paa-api              latest              6da68e510175        10 days ago         116.9 MB
<none>               <none>              4c56476ba36d        10 days ago         116.9 MB
<none>               <none>              3ea3bff63c7b        10 days ago         116.8 MB
<none>               <none>              05d6d5078f8a        10 days ago         88.51 MB
<none>               <none>              30f0e6001f1e        10 days ago         88.51 MB
paa-ui               latest              af8ff5acc85a        10 days ago         188.1 MB
elasticsearch        latest              5a62a28797b3        12 days ago         350.1 MB
sentry               latest              9ebeda6520cd        13 days ago         493.7 MB
redis                latest              74b99a81add5        13 days ago         182.9 MB
python               alpine              8dd7712cca84        13 days ago         88.51 MB
postgres             latest              0267f82ab721        13 days ago         264.8 MB
nginx                latest              e43d811ce2f4        3 weeks ago         181.5 MB
uifd/ui-for-docker   latest              965940f98fa5        9 weeks ago         8.096 MB

Docker Volume Ls

DRIVER              VOLUME NAME
local               3bc848cdd4325c7422284f6898a7d10edf8f0554d6ba8244c75e876ced567261
local               6575dad920ec453ca61bd5052cae1b7e80197475b14955115ba69e8c1752cf18
local               bf73a21a2f42ea47ce472e55ab474041d4aeaa7bdb564049858d31b538bad47b
local               c1bf0761e8d819075e8e2427c29fec657c9ce26bc9c849548e10d64eec69e76d
local               e056bce5ae34f4066d05870365dcf22e84cbde8d5bd49217e3476439d909fe44

DF -H

df -h
Filesystem               Size  Used Avail Use% Mounted on
devtmpfs                 7.9G     0  7.9G   0% /dev
tmpfs                    7.9G     0  7.9G   0% /dev/shm
tmpfs                    7.9G  1.3M  7.9G   1% /run
tmpfs                    7.9G     0  7.9G   0% /sys/fs/cgroup
/dev/mapper/fedora-root   11G  1.6G  8.7G  16% /
tmpfs                    7.9G  8.0K  7.9G   1% /tmp
/dev/sda1                477M  130M  319M  29% /boot
/dev/sdb1                 24G  1.6G   21G   7% /docker
overlay                   24G  1.6G   21G   7% /docker/overlay2/5591cfec27842815f5278112edb3197e9d7d5ab508a97c3070fb1a149d28f9f0/merged
shm                       64M     0   64M   0% /docker/containers/35ee0c906a03422e1b015c967548582eb5ca3195b3ffdd040bb80df9bb77cd32/shm
overlay                   24G  1.6G   21G   7% /docker/overlay2/73e795866566e845f09042d9f7e491e8c3ac59ebd7f5bc0ee4715d0f08a12b7b/merged
shm                       64M  4.0K   64M   1% /docker/containers/7ea48b27eb854e769886f3b662c2031cf74f3c6f77320a570d2bfa237aef9d2b/shm
overlay                   24G  1.6G   21G   7% /docker/overlay2/fad7f3b483bc48b83c3a729368124aaaf5fdd7751fe0a383171b8966959ac966/merged
shm                       64M     0   64M   0% /docker/containers/116ad8850bb1c74d1a33b6416e1b99775ef40aa13fc098790b7e4ea07e3e6075/shm
overlay                   24G  1.6G   21G   7% /docker/overlay2/456c40bc86852c9f9c9ac737741b57d30f2167882f15b32ac25f42048648d945/merged
shm                       64M     0   64M   0% /docker/containers/a411d09d7f98e1456a454a399fb68472f5129df6c3bd0b73f59236e6f1e55e74/shm
overlay                   24G  1.6G   21G   7% /docker/overlay2/3ee2b1b978b048f4d80302eec129e7163a025c7bb8e832a29567b64f5d15baa0/merged
shm                       64M     0   64M   0% /docker/containers/32e555609cd2c77a1a8efc45298d55224f15988197ef47411a90904cf3e13910/shm
overlay                   24G  1.6G   21G   7% /docker/overlay2/3e1cdabc2ae422a84b1d4106af1dde0cd670392bbe8a9d8f338909a926026b73/merged
shm                       64M     0   64M   0% /docker/containers/345eaab3b289794154af864e1d14b774cb8b8beac8864761ac84051416c7761b/shm
overlay                   24G  1.6G   21G   7% /docker/overlay2/6bfc33084abe688af9c1a704a0daba496bee7746052103ef975c76d2c74d6455/merged
shm                       64M     0   64M   0% /docker/containers/111ad12b877f4d4d8b3ab4b44b06f645acf89b983580e93d441305dcc7926671/shm
overlay                   24G  1.6G   21G   7% /docker/overlay2/0b454336447a39d06966adedf4dc4abed6405212107a2f8f326072ae5fb58b3d/merged
shm                       64M     0   64M   0% /docker/containers/43cfe7eae9cf310d64c6fe0f133152067d88f8d9242e48289148daebd9cb713d/shm
overlay                   24G  1.6G   21G   7% /docker/overlay2/0d8bba910f1f5e928a8c1e5d02cc55b6fe7bd7cd5c4d23d4abc6f361ff5043ac/merged
shm                       64M     0   64M   0% /docker/containers/3727b3e57e2f5c3b7879f

DF -i

 df -i
Filesystem               Inodes IUsed   IFree IUse% Mounted on
devtmpfs                2051100   411 2050689    1% /dev
tmpfs                   2054171     1 2054170    1% /dev/shm
tmpfs                   2054171   735 2053436    1% /run
tmpfs                   2054171    16 2054155    1% /sys/fs/cgroup
/dev/mapper/fedora-root 5402624 53183 5349441    1% /
tmpfs                   2054171     8 2054163    1% /tmp
/dev/sda1                128016   350  127666    1% /boot
/dev/sdb1               1572864 72477 1500387    5% /docker
overlay                 1572864 72477 1500387    5% /docker/overlay2/5591cfec27842815f5278112edb3197e9d7d5ab508a97c3070fb1a149d28f9f0/merged
shm                     2054171     1 2054170    1% /docker/containers/35ee0c906a03422e1b015c967548582eb5ca3195b3ffdd040bb80df9bb77cd32/shm
overlay                 1572864 72477 1500387    5% /docker/overlay2/73e795866566e845f09042d9f7e491e8c3ac59ebd7f5bc0ee4715d0f08a12b7b/merged
shm                     2054171     2 2054169    1% /docker/containers/7ea48b27eb854e769886f3b662c2031cf74f3c6f77320a570d2bfa237aef9d2b/shm
overlay                 1572864 72477 1500387    5% /docker/overlay2/fad7f3b483bc48b83c3a729368124aaaf5fdd7751fe0a383171b8966959ac966/merged
shm                     2054171     1 2054170    1% /docker/containers/116ad8850bb1c74d1a33b6416e1b99775ef40aa13fc098790b7e4ea07e3e6075/shm
overlay                 1572864 72477 1500387    5% /docker/overlay2/456c40bc86852c9f9c9ac737741b57d30f2167882f15b32ac25f42048648d945/merged
shm                     2054171     1 2054170    1% /docker/containers/a411d09d7f98e1456a454a399fb68472f5129df6c3bd0b73f59236e6f1e55e74/shm
overlay                 1572864 72477 1500387    5% /docker/overlay2/3ee2b1b978b048f4d80302eec129e7163a025c7bb8e832a29567b64f5d15baa0/merged
shm                     2054171     1 2054170    1% /docker/containers/32e555609cd2c77a1a8efc45298d55224f15988197ef47411a90904cf3e13910/shm
overlay                 1572864 72477 1500387    5% /docker/overlay2/3e1cdabc2ae422a84b1d4106af1dde0cd670392bbe8a9d8f338909a926026b73/merged
shm                     2054171     1 2054170    1% /docker/containers/345eaab3b289794154af864e1d14b774cb8b8beac8864761ac84051416c7761b/shm
overlay                 1572864 72477 1500387    5% /docker/overlay2/6bfc33084abe688af9c1a704a0daba496bee7746052103ef975c76d2c74d6455/merged
shm                     2054171     1 2054170    1% /docker/containers/111ad12b877f4d4d8b3ab4b44b06f645acf89b983580e93d441305dcc7926671/shm
overlay                 1572864 72477 1500387    5% /docker/overlay2/0b454336447a39d06966adedf4dc4abed6405212107a2f8f326072ae5fb58b3d/merged
shm                     2054171     1 2054170    1% /docker/containers/43cfe7eae9cf310d64c6fe0f133152067d88f8d9242e48289148daebd9cb713d/shm
overlay                 1572864 72477 1500387    5% /docker/overlay2/0d8bba910f1f5e928a8c1e5d02cc55b6fe7bd7cd5c4d23d4abc6f361ff5043ac/merged
shm                     2054171     1 2054170    1% /docker/containers/3727b3e57e2f5c3b7879f23deb3b023d10c0b766fe83e21dd389c71021af371f/shm
tmpfs                   2054171     5 2054166    1% /run/user/0

Free -lmh

free -lmh
              total        used        free      shared  buff/cache   available
Mem:            15G        3.0G         10G         19M        2.7G         12G
Low:            15G        5.6G         10G
High:            0B          0B          0B
Swap:          1.2G          0B        1.2G

For any of those interested, we (Travis CI) are rolling out an upgrade to v4.8.7 on Ubuntu 14.04. Our load tests showed no occurrences of the error described here. Previously, we were running linux-image-generic-lts-xenial on Ubuntu 14.04. I'm planning to get a blog post published in the near future describing more of the details.


UPDATE: I should have mentioned that we are running this docker stack:

Client:
 Version:      1.12.3
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   6b644ec
 Built:        Wed Oct 26 21:44:32 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.3
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   6b644ec
 Built:        Wed Oct 26 21:44:32 2016
 OS/Arch:      linux/amd64

UPDATE: We are _still_ seeing this error in production on Ubuntu Trusty + kernel v4.8.7. We don't yet know why these errors disappeared in staging load tests that previously reproduced the error, yet the error rate in production is effectively the same. Onward and upward. We have disabled "automatic implosion" based on this error given the high rate of instance turnover.

Also seen on CentOS 7:

Message from syslogd@c31392666b98e49f6ace8ed65be337210-node1 at Nov 17 17:28:07 ...
 kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

Message from syslogd@c31392666b98e49f6ace8ed65be337210-node1 at Nov 17 17:32:47 ...
 kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

Message from syslogd@c31392666b98e49f6ace8ed65be337210-node1 at Nov 17 17:37:32 ...
 kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

Message from syslogd@c31392666b98e49f6ace8ed65be337210-node1 at Nov 17 17:37:42 ...
 kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

[root@c31392666b98e49f6ace8ed65be337210-node1 ~]# docker info
Containers: 19
 Running: 15
 Paused: 0
 Stopped: 4
Images: 23
Server Version: 1.11.2.1
Storage Driver: overlay
 Backing Filesystem: extfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local nas acd ossfs
 Network: vpc bridge null host
Kernel Version: 4.4.6-1.el7.elrepo.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.795 GiB
Name: c31392666b98e49f6ace8ed65be337210-node1
ID: WUWS:FDP5:TNR6:EE5B:I2KI:O4IT:TQWF:4U42:5327:7I5K:ATGT:73KM
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-ip6tables is disabled
Cluster store: etcd://test.com:2379
Cluster advertise: 192.168.0.2:2376

Same thing happening here with a DigitalOcean VPS on Debian testing:


# journalctl -p0 | tail -15

Nov 19 12:02:55 hostname kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
Nov 19 12:03:05 hostname kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
Nov 19 12:17:44 hostname kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
Nov 19 12:48:15 hostname kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
Nov 19 13:33:08 hostname kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
Nov 19 14:03:04 hostname kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
Nov 19 14:03:14 hostname kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
Nov 19 14:17:59 hostname kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
Nov 19 15:03:02 hostname kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
Nov 19 15:18:13 hostname kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
Nov 19 15:32:44 hostname kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
Nov 19 16:03:13 hostname kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
Nov 19 16:47:43 hostname kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
Nov 19 17:17:46 hostname kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
Nov 19 17:17:56 hostname kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1


System

$ apt list --installed 'linux-image*'
Listing... Done
linux-image-3.16.0-4-amd64/now 3.16.36-1+deb8u2 amd64 [installed,local]
linux-image-4.8.0-1-amd64/testing,now 4.8.5-1 amd64 [installed,automatic]
linux-image-amd64/testing,now 4.8+76 amd64 [installed]

$ apt list --installed 'docker*'
Listing... Done
docker-engine/debian-stretch,now 1.12.3-0~stretch amd64 [installed]
N: There are 22 additional versions. Please use the '-a' switch to see them.

$ uname -a
Linux hostname 4.8.0-1-amd64 #1 SMP Debian 4.8.5-1 (2016-10-28) x86_64 GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux testing (stretch)
Release:    testing
Codename:   stretch


$ docker info

Containers: 1
 Running: 1
 Paused: 0
 Stopped: 0
Images: 42
Server Version: 1.12.3
Storage Driver: devicemapper
 Pool Name: docker-254:1-132765-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: ext4
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 435 MB
 Data Space Total: 107.4 GB
 Data Space Available: 16.96 GB
 Metadata Space Used: 1.356 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.146 GB
 Thin Pool Minimum Free Space: 10.74 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.136 (2016-11-05)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: null host bridge overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 4.8.0-1-amd64
Operating System: Debian GNU/Linux stretch/sid
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 996.4 MiB
Name: hostname
ID: <redacted>
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
 127.0.0.0/8


$ docker ps -a

CONTAINER ID        IMAGE               COMMAND                CREATED             STATUS              PORTS                              NAMES
0b54ed86ba70        squid/production    "/usr/sbin/squid -N"   29 hours ago        Up 29 hours         0.0.0.0:8080-8081->8080-8081/tcp   squid


$ ip link show

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether de:ad:be:ff:ff:ff brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether de:ad:be:ff:ff:ff brd ff:ff:ff:ff:ff:ff
4: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default 
    link/ether de:ad:be:ff:ff:ff brd ff:ff:ff:ff:ff:ff
234: veth64d2a77@if233: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default 
    link/ether de:ad:be:ff:ff:ff brd ff:ff:ff:ff:ff:ff link-netnsid 1


# ifconfig

docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 0.0.0.0
        inet6 dead::beef:dead:beef:ffff  prefixlen 64  scopeid 0x20<link>
        ether de:ad:be:ef:ff:ff  txqueuelen 0  (Ethernet)
        RX packets 3095526  bytes 1811946213 (1.6 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2642391  bytes 1886180372 (1.7 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 123.45.67.89  netmask 255.255.240.0  broadcast 123.45.67.89
        inet6 dead::beef:dead:beef:ffff  prefixlen 64  scopeid 0x0<global>
        inet6 dead::beef:dead:beef:ffff  prefixlen 64  scopeid 0x20<link>
        ether dead::beef:dead:beef:ffff  txqueuelen 1000  (Ethernet)
        RX packets 3014258  bytes 2087556505 (1.9 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3453430  bytes 1992544469 (1.8 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1  (Local Loopback)
        RX packets 178  bytes 15081 (14.7 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 178  bytes 15081 (14.7 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

veth64d2a77: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 dead::beef:dead:beef:ffff  prefixlen 64  scopeid 0x20<link>
        ether d2:00:ac:07:c8:45  txqueuelen 0  (Ethernet)
        RX packets 1259405  bytes 818486790 (780.5 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1103375  bytes 817423202 (779.5 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

I've been testing 4.8.8 in a tight loop (see [2] from my earlier comment for the test case) non-stop for the last 4 days. So far, so good.

Facts

  • Patch 751eb6b6 significantly reduces the occurrence of this problem, but does not entirely eliminate it.

Suppositions
@meatballhat pointed out that their production servers experienced the problem while running 4.8.7. This leaves us with two possibilities:

  • My test case is faulty (the more likely possibility)
  • 4.8.8 introduced a fix. Looking at the 4.8.8 changelog, commits 5086cadf and 6fff1319 both mention netdev issues that were resolved. Neither one explicitly calls out the problem here, but both are close enough to be suspicious.

Can we get a few folks to try 4.8.8 to see if they are able to reproduce this problem?

@reshen I'll get us updated to 4.8.8 and report back :+1: Thanks much for your research!

@reshen Excellent research. So far I've also not been able to reproduce the problem using Linux 4.8.8 on Xubuntu 16.04.

I've been using the Ubuntu mainline kernel builds. I do not have a well defined test case, but I could consistently reproduce the problem before by starting and stopping the set of docker containers I work with.

To test Linux 4.8.8, the easiest route for me was to switch the storage driver from aufs to overlay2, as the mainline kernel builds do not include aufs. I don't think this will influence the test, but it should be noted.

In the past I've tested Linux 4.4.4 with 751eb6b6 backported by Dan Streetman, but this did not seem to reduce the problem for me. It will be interesting to see if also backporting the two patches noted by you (5086cadf and 6fff1319) can give the same result as 4.8.8.

Ubuntu 16.04 with 4.4.0-47 was still affected... trying 4.4.0-49 now, will report later.

edit: 2016-11-28: -49 is still showing that log line in dmesg.

Experienced this on Fedora 25 (kernel 4.8.8) and Docker 1.12.3

FYI: we've been running Linux 4.8.8 in conjunction with Docker v1.12.3 on a single production host. Uptime is presently at 5.5 days and the machine remains stable.

We occasionally see a handful of unregister_netdevice: waiting for lo to become free. Usage count = 1 messages in syslog, but unlike before, the kernel does not crash and the message goes away. I suspect that one of the other changes introduced either in the kernel or in Docker detects this condition and now recovers from it. For us, the message is now an annoyance but no longer a critical bug.

I'm hoping some other folks can confirm the above on their production fleets.

@gtirloni - can you clarify if your 4.8.8/1.12.3 machine crashed or if you just saw the message?

Thank you, in advance, to everyone who has been working on reproducing/providing useful information to triangulate this thing.

We delete the counterpart of the veth interface (docker0) after starting Docker, and then restart Docker, when we provision the host using Ansible. The problem hasn't occurred since.

I'm also getting this same error on a Raspberry Pi 2 running Raspbian with Docker.

Kernel info
Linux rpi2 4.4.32-v7+ #924 SMP Tue Nov 15 18:11:28 GMT 2016 armv7l GNU/Linux

Docker Info

Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 9
Server Version: 1.12.3
Storage Driver: overlay
 Backing Filesystem: extfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: null host bridge overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options:
Kernel Version: 4.4.32-v7+
Operating System: Raspbian GNU/Linux 8 (jessie)
OSType: linux
Architecture: armv7l
CPUs: 4
Total Memory: 925.5 MiB
Name: rpi2
ID: 24DC:RFX7:D3GZ:YZXF:GYVY:NXX3:MXD3:EMLC:JPLN:7I6I:QVQ7:M3NX
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
WARNING: No kernel memory limit support
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support
WARNING: No cpuset support
Insecure Registries:
 127.0.0.0/8

It happened after creating a container which needed around 50 MB of downloaded programs installed.

Only a reboot would let me use the machine again

I am actually seeing this on Amazon Linux in an ECS cluster - the message occasionally appears but the machine doesn't lock up, similar to what reshen is seeing now. Docker 1.11.2. Uname reports "4.4.14-24.50.amzn1.x86_64" as the version.

@reshen I'm going to build 4.8.8 this weekend on my laptop and see if that fixes it for me!


I was also able to reproduce this issue using https://github.com/crosbymichael/docker-stress on a Kubernetes worker node running CoreOS Stable 1185.3.0.

Running docker_stress_linux_amd64 -k 3s -c 5 --containers 1000: 5 concurrent workers creating/deleting containers, max lifetime of containers = 3s, create up to 1000 containers on an m4.large instance on AWS would leave the Docker daemon unresponsive after about three minutes.

Upgraded to CoreOS Beta 1235.1.0 and I haven't been able to reproduce it (neither the unresponsiveness nor the unregister_netdevice message in the kernel logs). Whereas running 5 concurrent docker_stress workers would kill CoreOS Stable after a few minutes, I was able to run with 10 and 15 concurrent workers until test completion using CoreOS Beta.

CoreOS releases in "channels" so it's not possible to upgrade the kernel in isolation. Here are the major differences between stable and beta:

CoreOS Stable 1185.3.0

kernel: 4.7.3

docker: 1.11.2

CoreOS Beta 1235.1.0

kernel: 4.8.6

docker: 1.12.3

Seeing this issue on Amazon Elastic Beanstalk running 4.4.23-31.54.amzn1.x86_64

Just happened on CoreOS Stable 1185.5.0, Docker 1.12.2.
After a reboot everything is fine.

Update: the hung Docker daemon issue has struck again on a host running CoreOS Beta 1235.1.0 with Docker v1.12.3, and Linux kernel v4.8.6. 😢

1.12.4 and 1.13 should, in theory, not freeze up when this kernel issue is hit.
The reason the freeze in the docker daemon occurs is because the daemon is waiting for a netlink message back from the kernel (which will never come) while holding the lock on the container object.

1.12.4 and 1.13 set a timeout on this netlink request to at least release the container lock.
This does __not__ fix the issue, but at least (hopefully) does not freeze the whole daemon.
You will likely not be able to spin up new containers, and similarly probably will not be able to tear them down since it seems like all interactions with netlink stall once this issue is hit.
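
For anyone trying to tell a hung daemon apart from the kernel message alone, a rough check looks something like this (a sketch; it assumes a systemd host, that the daemon binary is dockerd, and that sending SIGUSR1 still makes it dump goroutine stack traces to its log):

timeout 10 docker info > /dev/null && echo "daemon responsive" || echo "daemon appears hung"

# ask dockerd for a goroutine stack dump, then look for goroutines blocked on netlink
kill -SIGUSR1 $(pidof dockerd)
journalctl -u docker -n 200 | grep -i netlink

# the kernel message itself shows up regardless of what the daemon is doing
dmesg | grep unregister_netdevice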

@cpuguy83 FWIW, any running containers continue to run without issue AFAIK when the daemon is hung. Indeed, it's the starting and stopping of containers that is noticeable (especially running on Kubernetes, as we are).

This does not fix the issue, but at least (hopefully) does not freeze the whole daemon.

The one upside of the whole daemon being frozen is that it's easy to detect. Kubernetes can evict the node, maybe even reboot it automatically. If the daemon just keeps running, would it still be possible to easily tell that the kernel issue has appeared at all?

@seanknox I could provide you with a custom CoreOS 1248.1.0 AMI with patched Docker (CoreOS Docker 1.12.3 + upstream 1.12.4-rc1 patches). It has fixed the hangups that occurred every couple of hours on my CoreOS/K8s clusters. Just ping me with your AWS account ID on the Deis Slack.

This issue has been a huge pain on our CoreOS cluster. Could anyone tell us when it will finally be fixed? We dream of the moment when we can sleep at night.

@DenisIzmaylov If you don't set --userland-proxy=false, then generally you should not run into this issue.

But otherwise this is a bug in the kernel, possibly multiple kernel bugs, that some say is resolved in 4.8 and others say not. For some, disabling ipv6 seems to fix it, others not (hence it's probably multiple issues... or at least multiple causes).
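
For anyone who wants to check which mode they are running in, something like this should show it (a sketch; it assumes a systemd host with the daemon configured via /etc/docker/daemon.json):

ps aux | grep [d]ockerd          # look for an explicit --userland-proxy=false flag
cat /etc/docker/daemon.json      # or a "userland-proxy": false entry here

# to go back to the default (userland proxy enabled), remove the flag / set "userland-proxy": true, then:
systemctl restart docker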

I've seen this issue within hours on high load systems with and without --userland-proxy=false

Confirmed we are still seeing unregister_netdevice errors on kernel 4.8.12. It takes about 5 days to trigger. Only a reboot of the system seems to recover from the issue. Stopping Docker seems to hang indefinitely.

Have not tried the disable ipv6 trick for kernel boot yet.

Containers: 17
 Running: 14
 Paused: 0
 Stopped: 3
Images: 121
Server Version: 1.10.3
Storage Driver: overlay
 Backing Filesystem: extfs
Execution Driver: native-0.2
Logging Driver: json-file
Plugins:
 Volume: local
 Network: bridge null host
Kernel Version: 4.8.12-1.el7.elrepo.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 24
Total Memory: 62.86 GiB
Name: **REDACTED***
ID: **REDACTED***
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

Would be awesome if someone can try this with 1.12.5, which should timeout on the stuck netlink request now instead of just hanging Docker.

@cpuguy83 however, system is still unusable in that state :)

@LK4D4 Oh, totally, just want to see those timeouts ;)

Getting this issue on CentOS 7:

kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

Linux foo 3.10.0-514.2.2.el7.x86_64 #1 SMP Tue Dec 6 23:06:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

docker-engine-1.12.5-1.el7.centos.x86_64

This is affecting my CI builds, which run inside Docker containers and appear to die suddenly when this console message appears. Is there a fix or a workaround? Thanks!

@cpuguy83 Docker doesn't hang for me when this error occurs, but the containers get killed, which in my situation breaks my Jenkins/CI jobs.

So I've been running Docker on a CentOS 7 machine for a while (11 months?) without issue. Today I decided to give the TCP-listening daemon a try (added the TCP listening address to /etc/sysconfig/docker) and just got this error.

kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

so my usage count is not 3.

Containers: 4
Running: 3
Paused: 0
Stopped: 1
Images: 67
Server Version: 1.10.3
Storage Driver: btrfs
Build Version: Btrfs v4.4.1
Library Version: 101
Execution Driver: native-0.2
Logging Driver: json-file
Plugins:
Volume: local
Network: bridge null host
Kernel Version: 3.10.0-514.2.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 2
CPUs: 24
Total Memory: 39.12 GiB
Name: aimes-web-encoder
ID: QK5Q:JCMA:ATGR:ND6W:YOT4:PZ7G:DBV5:PR26:YZQL:INRU:HAUC:CQ6B
Registries: docker.io (secure)

3.10.0-514.2.2.el7.x86_64 #1 SMP Tue Dec 6 23:06:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Client:
Version: 1.10.3
API version: 1.22
Package version: docker-common-1.10.3-59.el7.centos.x86_64
Go version: go1.6.3
Git commit: 3999ccb-unsupported
Built: Thu Dec 15 17:24:43 2016
OS/Arch: linux/amd64

Server:
Version: 1.10.3
API version: 1.22
Package version: docker-common-1.10.3-59.el7.centos.x86_64
Go version: go1.6.3
Git commit: 3999ccb-unsupported
Built: Thu Dec 15 17:24:43 2016
OS/Arch: linux/amd64

I can confirm @aamerik. I am seeing the same issue on the same kernel version. No recent major changes on the system, seeing this issue since today.

I saw the same kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1 message on my CentOS 7 machine running a docker image of Jenkins. The CentOS 7 machine I was using was current with all the latest CentOS 7 patches as of approximately 20 Dec 2016.

Since the most recent references here seem to be CentOS based, I'll switch my execution host to a Ubuntu or a Debian machine.

I am running Docker version 1.12.5, build 7392c3b on that CentOS 7 machine. Docker did not hang, but the Jenkins process I was running in Docker was killed when that message appeared.

Thanks so much for Docker! I use it all the time, and am deeply grateful for your work on it!

I'm seeing the same issue when using Jenkins with Docker on a Linux 4.8.15 machine.

Did anyone arrive at a fix procedure for RancherOS?

AFAICT, this is a locking issue in the network namespaces subsystem of Linux kernel. This bug has been reported over a year ago, with no reply: https://bugzilla.kernel.org/show_bug.cgi?id=97811 There has been some work on this (see here: http://www.spinics.net/lists/netdev/msg351337.html) but it seems it's not a complete fix.

I've tried pinging the network subsystem maintainer directly, with no response. FWIW, I can reproduce the issue in a matter of minutes.

Smyte will pay $5000 USD for the resolution of this issue. Sounds like I need to talk to someone who works on the kernel?

@petehunt I believe there are multiple issues at play causing this error.

We deployed kernel 4.8.8 as @reshen suggested and while uptime seems a bit better we still continue to see this issue in production.

Trying to deploy Mesosphere from a bootstrap node. All nodes are CentOS 7.2 minimal with all updates applied. The bootstrap node is showing the error as noted above by others:

Message from syslogd@command01 at Jan 16 02:30:24 ...
 kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

Message from syslogd@command01 at Jan 16 02:30:34 ...
 kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

Message from syslogd@command01 at Jan 16 02:30:44 ...
 kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

uname -r:

3.10.0-514.2.2.el7.x86_64

docker -v:

Docker version 1.11.2, build b9f10c9

I can confirm a reboot silences the messages, but the minute I deploy Mesosphere again, the messages start appearing every now and then. Mesosphere is quite a large deployment, so maybe those trying to recreate the error can use the installer to reproduce it. It takes a few minutes before the error shows up after running the first step (the --genconf script switch).

We've hit this also. However, the error messages in our case mention the device eth0, not lo. My error is this:

kernel:unregister_netdevice: waiting for eth0 to become free. Usage count = 1

I'm assuming that errors mentioning eth0 instead of lo have the same root cause as this issue. If not, we should open a new ticket regarding the eth0 errors.

  • OS: CentOS Linux release 7.3.1611
  • Kernel: 3.10.0-514.2.2.el7.x86_64
  • Docker version 1.12.6, build 78d1802
  • Docker started with options: OPTIONS=" -H unix:///var/run/docker.sock --ip-forward=true --iptables=true --ip-masq=true --log-driver json-file --log-opt max-size=25m --log-opt max-file=2"
  • Rancher 1.2.2 using IPsec (Mentioning this since https://bugzilla.kernel.org/show_bug.cgi?id=97811 and other bugs mention IPsec)

We've hit this also.
Error: unregister_netdevice: waiting for lo to become free. Usage count = 1
OS: CentOS Linux release 7.3.1611 (Core)
Kernel 3.10.0-514.2.2.el7.x86_64
Docker version: 1.13.0-cs1-rc1
Docker options:
{
  "disable-legacy-registry": true,
  "icc": true,
  "insecure-registries": [],
  "ipv6": false,
  "iptables": true,
  "storage-driver": "devicemapper",
  "storage-opts": [
    "dm.thinpooldev=/dev/mapper/docker_vg-thinpool",
    "dm.use_deferred_removal=true",
    "dm.use_deferred_deletion=true"
  ],
  "userland-proxy": false
}

I have this on two CentOS systems, latest updates on at least one of them.

$ uname -r
3.10.0-514.2.2.el7.x86_64
$ docker -v
Docker version 1.12.6, build 78d1802

Hey, for everyone affected by this issue on RHEL or CentOS, I've backported the commit from the mainline kernels (torvalds/linux@751eb6b6042a596b0080967c1a529a9fe98dac1d) that fixes the race condition in the IPV6 IFP refcount to 3.10.x kernels used in enterprise distributions. This should fix this issue.

You can find the bug report with working patch here:
If you are interested in testing it and have a RHEL 7 or CentOS 7 system, I have already compiled the latest CentOS 7.3 3.10.0-514.6.1.el7.x86_64 kernel with the patch. Reply to the CentOS bugtracker thread and I can send you a link to the build.

Note: there may be another issue causing a refcount leak but this should fix the error message for many of you.

@stefanlasiewski @henryiii @jsoler

I'll be trying out a build also adding this fix: http://www.spinics.net/lists/netdev/msg351337.html later tonight.

@iamthebot does it mean that if one disables IPv6 it should fix the issue too, even without a patch you just backported?

@redbaron only if that is the issue that you are hitting. I really think there are multiple kernel issues being hit here.

@redbaron maybe. #20569 seems to indicate fully disabling IPV6 is difficult.

So to clarify a bit what's happening under the hood to generate this message: the kernel maintains a running count of whether a device is in use before removing it from a namespace, unregistering it, deactivating it, etc. If for some reason there's a dangling reference to a device, then you're going to see that error message, since the device can't be unregistered while something else is using it.

The fixes I've seen so far:

  1. Fix an issue where an IPV6 address allocation fails (e.g. a duplicate address) but the reference to the device is not released before exiting.
  2. Fix an issue where moving a device to another network namespace correctly moves the references to the new network namespace but leaves a dangling reference in the old one. Docker makes heavy use of network namespaces (as evidenced by another kernel fix that I had Red Hat backport to the 7.3 Z-Stream, and which is slated for inclusion in 7.4, that prevents docker's macvlan driver from working on bond or team devices)

I think there's still another race condition when switching namespaces (this seems to happen after creating a bunch of new containers) but I'll need to replicate the issue reliably in order to hunt it down and write a patch.
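
While hunting for a reproduction, a cheap way to notice the moment a refcount gets stuck is to watch the kernel log while churning containers (a sketch along the lines of the repro described above; busybox and the target URL are arbitrary placeholders):

# terminal 1: watch kernel messages as they arrive
journalctl -k -f | grep --line-buffered unregister_netdevice

# terminal 2: repeatedly create containers, push some traffic, and remove them
for i in $(seq 1 100); do
  docker run --rm busybox sh -c 'wget -q -O /dev/null http://example.com' &
done
wait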

Does anyone have a minimal procedure for consistently reproducing this? Seemed to happen randomly on our systems.

@iamthebot it's not really straightforward, but I think we can provide you with a test environment that can reliably reproduce this. Email me ([email protected]) and we can arrange the details.

Still experience this under heavy load on Docker version 1.12.6, build 7392c3b/1.12.6 on 4.4.39-34.54.amzn1.x86_64 AWS Linux AMI.

I have 9 docker hosts, all nearly identical, and only experience this on some of them. It may be coincidence, but one thing they have in common is that I only seem to have this problem when running containers that do not handle SIGINT. When I docker stop these containers, it hangs for 10s and then kills the container ungracefully.

It takes several days before the issue presents itself, and it seems to show up randomly, not just immediately after running docker stop. This is mostly anecdotal, but maybe it will help someone.

I have upgraded all my docker nodes to kernel 3.10.0-514.6.1.el7.x86_64 on CentOS 7.3 as @iamthebot mentioned, but I still get the same errors:
Jan 26 13:52:49 XXX kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
Message from syslogd@XXX at Jan 26 13:52:49 ...
kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

@jsoler just to be clear, did you apply the patch in the bug tracker thread before building the kernel? Or are you using a stock kernel? Also try applying this one (patch should work on older kernels).

Shoot me an email ([email protected]) and I can send you a link to a pre-built kernel. @vitherman I unfortunately don't have a lot of time to look into this (looks like some instrumentation will need to be compiled in to catch this bug) but I've escalated the issue with Red Hat support so their kernel team will take a look.

@ckeeney I can confirm this behavior. We have a dockerized Node application which caused said error on the host system when it was shut down. After implementing a function within the Node.js application that catches SIGINT and SIGTERM to gracefully shut down the application, the error hasn't occurred again.
Which kind of makes sense: the Node application uses the virtual interface Docker creates. When Node doesn't get shut down properly, the device hangs and the host system can't unregister it, even though the Docker container has successfully been stopped.

here is an example code snippet:

function shutdown() {
    logger.log('info', 'Graceful shutdown.');

    httpServer.close();
    if (httpsServer) {
        httpsServer.close();
    }

    process.exit();
}

process.on('SIGINT', shutdown);
process.on('SIGTERM', shutdown);

@michael-niemand is there a different signal that is properly handled by Node by default for a clean shutdown? (You can specify the STOPSIGNAL in the image, or on docker run through the --stop-signal flag.)

@thaJeztah for a good explanation of the problem, and workaround, see nodejs/node-v0.x-archive#9131#issuecomment-72900581

@ckeeney I'm aware of that (i.e., processes running as PID1 may not handle SIGINT or SIGTERM). For that reason, I was wondering if specifying a different stop-signal would do a clean-shutdown even if running as PID1.

Alternatively, docker 1.13 adds an --init option (pull request: https://github.com/docker/docker/pull/26061) that inserts an init process in the container; in that case, Node is not running as PID 1, which may help in cases where you cannot update the application.
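
For illustration, both suggestions might look roughly like this on the command line (a sketch; my-node-app is a placeholder image name, and the signal your app handles may differ):

docker run -d --stop-signal SIGINT my-node-app     # send SIGINT instead of the default signal on docker stop
docker run -d --init my-node-app                   # docker 1.13+: run a small init as PID 1 that forwards signals to node

# the Dockerfile equivalent of --stop-signal is:
#   STOPSIGNAL SIGINT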

@iamthebot I have built kernel version 3.10.0-514.el7 with your patch integrated, but I get the same error. That said, I'm not sure I built the CentOS kernel package correctly. Could you share your kernel package so I can test it?

Thanks

I have been dealing with this bug for almost a year now. I use CoreOS with PXE boot; I disabled ipv6 in the PXE boot config and I haven't seen this issue once since then.

Well my environment has disabled ipv6 with this sysctl configuration
net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6=1
but I still get the error
kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

@jsoler right, I was doing that too, and it still happened. Only once I did it at the PXE level did it stop.

label coreos
        menu label CoreOS
        kernel coreos/coreos_production_pxe.vmlinuz
        append initrd=coreos/coreos_production_pxe_image.cpio.gz ipv6.disable=1 cloud-config-url=http://...
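
For non-PXE machines, the equivalent is to put the same parameter on the kernel command line through the bootloader; on a grub2-based distro that might look like this (a sketch; file paths and the mkconfig command differ between distros and BIOS/UEFI setups):

# add ipv6.disable=1 to GRUB_CMDLINE_LINUX in /etc/default/grub, e.g.
#   GRUB_CMDLINE_LINUX="... ipv6.disable=1"
grub2-mkconfig -o /boot/grub2/grub.cfg
reboot

# verify after the reboot; with ipv6.disable=1 the ipv6 stack is not loaded at all
ls /proc/sys/net/ipv6 2>/dev/null || echo "ipv6 disabled"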

Just an observation - there seem to be different problems at play (that has been said before).

  • Some people note "waiting for lo..."
  • some have noted "waiting for eth0"
  • some have noted "waiting for veth?????"
  • On RedHat bug tracking, there is talk about "waiting for ppp0"

Some have noted logs alternating between several of the above, while others see only one of them.

There is also a similar bug logged on Ubuntu. On this one, they seem to find that NFS is the problem.

@etlweather I believe that in fact the only common denominator is, well, a net device not being able to be unregistered by the kernel, as the error message says. However, the reasons _why_ are somewhat different. For us it definitely was the mentioned docker/node issue (veth). For eth0 and lo the cause is most likely something completely different.

Still happens with 4.9.0-0.bpo.1-amd64 on Debian jessie with docker 1.13.1. Is there any kernel/OS combination which is stable?

This might not be a purely docker issue - I'm getting it on a Proxmox server where I'm only running vanilla LXC containers (ubuntu 16.04).

@darth-veitcher it's a kernel issue

@thaJeztah agreed thanks. Was going to try and install 4.9.9 tonight from mainline and see if that fixes matters.

I'm getting it running Docker 1.13.1 on a Debian with kernel 4.9.9-040909.

Yes, upgrading the kernel on Proxmox to the latest 4.9.9 didn't resolve the error. Strange, as it only just appeared after a year without issues.

There might be something in a previous statement further up the thread about it being linked to mounted NFS or CIFS shares.


I have a bugzilla ticket open with Redhat about this.

Some developments:
Red Hat put the IPV6 refcount leak patches from mainline on QA, looks like they're queued up for RHEL 7.4 and may be backported to 7.3. Should be on CentOS-plus soon too. Note: This patch only fixes issues in SOME cases. If you have a 4.x kernel it's a moot point since they're already there.

This is definitely a race condition in the kernel from what I can tell, which makes it really annoying to find. I've taken a snapshot of the current mainline kernel and am working on instrumenting the various calls starting with the IPV6 subsystem. The issue is definitely reproducible now: looks like all you have to do is create a bunch of containers, push a ton of network traffic from them, crash the program inside the containers, and remove them. Doing this over and over triggers the issue in minutes, tops on a physical 4-core workstation.

Unfortunately, I don't have a lot of time to work on this: if there are kernel developers here who are willing to collaborate on instrumenting the necessary pieces I think we can set up a fork and start work on hunting this down step by step.

@iamthebot , is it reproducible on a qemu-kvm setup?

@iamthebot I have tried to repro this several times with different kernels. Somewhere above it was mentioned that using docker-stress -c 100 with userland-proxy set to false would trigger it, but I had no luck.

If you have a more reliable repro (even if it takes a long time to trigger) I can try and take a look

We encounter the same difficulty in our production and staging environments. We are going to upgrade to Docker 1.13 and Linux kernel 4.9 soon, but as others have already mentioned, these versions are also affected.

$ docker -v
Docker version 1.12.3, build 6b644ec

$ uname -a
Linux 4.7.0-0.bpo.1-amd64 #1 SMP Debian 4.7.8-1~bpo8+1 (2016-10-19) x86_64 GNU/Linux

I'm experiencing this issue pretty regularly on my dev system, always while shutting down containers.

General info

→ uname -a
Linux miriam 3.10.0-514.6.1.el7.x86_64 #1 SMP Sat Dec 10 11:15:38 EST 2016 x86_64 x86_64 x86_64 GNU/Linux

→ cat /etc/redhat-release
Red Hat Enterprise Linux Workstation release 7.3 (Maipo)

→ docker -v 
Docker version 1.13.0, build 49bf474

→ docker-compose -v 
docker-compose version 1.10.0, build 4bd6f1a

→ docker info 
Containers: 11
 Running: 0
 Paused: 0
 Stopped: 11
Images: 143
Server Version: 1.13.0
Storage Driver: overlay
 Backing Filesystem: xfs
 Supports d_type: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 03e5862ec0d8d3b3f750e19fca3ee367e13c090e
runc version: 2f7393a47307a16f8cee44a37b262e8b81021e3e
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-514.6.1.el7.x86_64
Operating System: Red Hat Enterprise Linux
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 31.19 GiB
Name: miriam
ID: QU56:66KP:C37M:LHXT:4ZMX:3DOB:2RUD:F2RR:JMNV:QCGZ:ZLWQ:6UO5
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 16
 Goroutines: 25
 System Time: 2017-02-15T10:47:09.010477057-06:00
 EventsListeners: 0
Http Proxy: http://xxxxxxxxxxxxxxxxxxxx:80
Https Proxy: http://xxxxxxxxxxxxxxxxxxxx:80
No Proxy: xxxxxxxxxxxxxxxxxxxx
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false



Docker daemon log

DEBU[70855] Calling DELETE /v1.22/containers/9b3d01076f3b6a1373729e770a9b1b4e878c2e4be5e27376d24f21ffead6792f?force=False&link=False&v=False 
DEBU[70855] Calling DELETE /v1.22/containers/38446ddb58bc1148ea2fd394c5c14618198bcfca114dae5998a5026152da7848?force=False&link=False&v=False 
DEBU[70855] Calling DELETE /v1.22/containers/e0d31b24ea4d4649aec766c7ceb5270e79f5a74d60976e5894d767c0fb2af47a?force=False&link=False&v=False 
DEBU[70855] Calling DELETE /v1.22/networks/test_default  
DEBU[70855] Firewalld passthrough: ipv4, [-t nat -C POSTROUTING -s 172.19.0.0/16 ! -o br-ee4e6fb1c772 -j MASQUERADE] 
DEBU[70855] Firewalld passthrough: ipv4, [-t nat -D POSTROUTING -s 172.19.0.0/16 ! -o br-ee4e6fb1c772 -j MASQUERADE] 
DEBU[70855] Firewalld passthrough: ipv4, [-t nat -C DOCKER -i br-ee4e6fb1c772 -j RETURN] 
DEBU[70855] Firewalld passthrough: ipv4, [-t nat -D DOCKER -i br-ee4e6fb1c772 -j RETURN] 
DEBU[70855] Firewalld passthrough: ipv4, [-t filter -C FORWARD -i br-ee4e6fb1c772 -o br-ee4e6fb1c772 -j ACCEPT] 
DEBU[70855] Firewalld passthrough: ipv4, [-D FORWARD -i br-ee4e6fb1c772 -o br-ee4e6fb1c772 -j ACCEPT] 
DEBU[70855] Firewalld passthrough: ipv4, [-t filter -C FORWARD -i br-ee4e6fb1c772 ! -o br-ee4e6fb1c772 -j ACCEPT] 
DEBU[70855] Firewalld passthrough: ipv4, [-D FORWARD -i br-ee4e6fb1c772 ! -o br-ee4e6fb1c772 -j ACCEPT] 
DEBU[70855] Firewalld passthrough: ipv4, [-t filter -C FORWARD -o br-ee4e6fb1c772 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT] 
DEBU[70855] Firewalld passthrough: ipv4, [-D FORWARD -o br-ee4e6fb1c772 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT] 
DEBU[70856] Firewalld passthrough: ipv4, [-t filter -C FORWARD -o br-ee4e6fb1c772 -j DOCKER] 
DEBU[70856] Firewalld passthrough: ipv4, [-t filter -C FORWARD -o br-ee4e6fb1c772 -j DOCKER] 
DEBU[70856] Firewalld passthrough: ipv4, [-D FORWARD -o br-ee4e6fb1c772 -j DOCKER] 
DEBU[70856] Firewalld passthrough: ipv4, [-t filter -C DOCKER-ISOLATION -i br-ee4e6fb1c772 -o docker0 -j DROP] 
DEBU[70856] Firewalld passthrough: ipv4, [-D DOCKER-ISOLATION -i br-ee4e6fb1c772 -o docker0 -j DROP] 
DEBU[70856] Firewalld passthrough: ipv4, [-t filter -C DOCKER-ISOLATION -i docker0 -o br-ee4e6fb1c772 -j DROP] 
DEBU[70856] Firewalld passthrough: ipv4, [-D DOCKER-ISOLATION -i docker0 -o br-ee4e6fb1c772 -j DROP] 
DEBU[70856] Firewalld passthrough: ipv4, [-t filter -C DOCKER-ISOLATION -i br-ee4e6fb1c772 -o br-b2210b5a8b9e -j DROP] 
DEBU[70856] Firewalld passthrough: ipv4, [-D DOCKER-ISOLATION -i br-ee4e6fb1c772 -o br-b2210b5a8b9e -j DROP] 
DEBU[70856] Firewalld passthrough: ipv4, [-t filter -C DOCKER-ISOLATION -i br-b2210b5a8b9e -o br-ee4e6fb1c772 -j DROP] 
DEBU[70856] Firewalld passthrough: ipv4, [-D DOCKER-ISOLATION -i br-b2210b5a8b9e -o br-ee4e6fb1c772 -j DROP] 
DEBU[70856] releasing IPv4 pools from network test_default (ee4e6fb1c772154fa35ad8d2c032299375bc2d7756b595200f089c2fbcc39834) 
DEBU[70856] ReleaseAddress(LocalDefault/172.19.0.0/16, 172.19.0.1) 
DEBU[70856] ReleasePool(LocalDefault/172.19.0.0/16)      

Message from syslogd@miriam at Feb 15 10:20:52 ...
 kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

@r-BenDoan if you try to stop a container but it doesn't respond to SIGINT, docker will wait 10 seconds and then kill the container ungracefully. I encountered that behavior in my nodejs containers until I added signal handling. If you see a container taking 10s to stop, it likely isn't handling signals and is more likely to trigger this issue.

Make sure your containers can stop gracefully.
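
A quick way to check whether a container actually handles the stop signal (a sketch; <container> is a placeholder):

# if this takes ~10 seconds, PID 1 in the container is ignoring the signal and gets SIGKILLed
time docker stop <container>

# see what is running as PID 1 inside the container
docker top <container>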

While I'm not the one who is fixing this issue, not being much into Linux Kernel dev, I think I am right in saying that the "me too" comments aren't that helpful. By this I mean, just saying "I have this problem too, with Kernel vx.x and Docker 1.x" does not bring anything new to the discussion.

However, I would suggest that "me too" comments which describe more the environment and method to reproduce would be of great value.

When reading all the comments, it is clear that there are a few problems - as I posted earlier, some with vethXYZ, some with eth0 and others with lo0. This suggests that they could be caused by different problems. So just saying "me too" without a full description of the error and environment may mislead people.

Also, when describing the environment, giving the Kernel and Docker version is not sufficient. Per the thread, there seem to be a few factors, such as whether ipv6 is enabled or not, or NodeJS not responding to SIGINT (or other containers, not bashing on NodeJS here).

So describing what the workload on the environment is would be useful. Also, this occurs when a container is being shut down, therefore I would also suggest that people experiencing this issue pay attention to which container is being stopped when the problem rears its ugly head.

While it seems the problem is in the Kernel having a race condition - identifying the trigger will be of tremendous help to those who will fix the issue. And it can even give the affected users an immediate solution such as implementing a signal handler in a NodeJS application (I don't know myself that this prevents the issue from triggering, but it seems so per earlier comments of others).

FWIW kubernetes has correlated this completely to veth "hairpin mode" and has stopped using that feature completely. We have not experienced this problem at all, across tens of thousands of production machines and vastly more test runs, since changing.

Until this is fixed, abandon ship. Find a different solution :(


Yep, we are moving to gke and no longer seeing this issue (so no more bug bounty from us :))

Just had the error again. I was trying to fix a node.js application which uses sockets, and therefore scaled the application often. The node.js app was built on top of https://github.com/deployd/deployd. I hope this provides some more info. What was also interesting is that both servers inside my swarm displayed the unregister_netdevice error simultaneously after I removed the service via docker service rm. The container was scaled to 4, so two containers were running on each machine.

edit: Happened again! Working on the same node.js app. The last 3 or 4 days I haven't directly worked on that node.js application and it never occurred.

edit2: will try to add a signal handler to the nodejs app. Let's see if that helps....

I just ran into this error, after using docker-py to publish a new instance to EC. However, I was able to exit with ctrl+C, and haven't seen it since (now that most of the images are building more quickly from the cache).

{"status":"Pushed","progressDetail":{},"id":"c0962ea0b9bc"}
{"status":"stage: digest: sha256:f5c476a306f5c2558cb7c4a2fd252b5b186b65da22c8286208e496b3ce685de8 size: 5737"}
{"progressDetail":{},"aux":{"Tag":"stage","Digest":"sha256:f5c476a306f5c2558cb7c4a2fd252b5b186b65da22c8286208e496b3ce685de8","Size":5737}}

Docker image published successfully

Message from syslogd@ip-172-31-31-68 at Feb 16 19:49:16 ...
kernel:[1611081.976079] unregister_netdevice: waiting for lo to become free. Usage count = 1

Message from syslogd@ip-172-31-31-68 at Feb 16 19:49:27 ...
kernel:[1611092.220067] unregister_netdevice: waiting for lo to become free. Usage count = 1

[1]+ Stopped ./image-publish.py
[root@ip-172-31-xx-xx image-publish]# ^C
[root@ip-172-31-xx-xx image-publish]#

@thockin is this setting --hairpin-mode=none on the kubelets?

=none breaks containers that get NAT'ed back to themselves. We use promiscuous-bridge by default.


@thockin which containers might want to access themselves via Service ClusterIP ?

It turns out to be more common than I thought, and when we broke it, lots of people complained.


I think I know why some dockerized nodejs apps could cause this issue. Node uses keep-alive connections by default. When server.close() is used, the server doesn't accept new connections, but currently active connections like websockets or HTTP keep-alive connections are still maintained. When the dockerized app is also scaled to n instances, this could result in waiting for lo to become free because, when the app is forcibly terminated, lo is never freed. When docker redistributes this app to another node, or the app is scaled down, docker sends a signal to the app telling it to shut down. The app can listen for this signal and react. When the app isn't shut down after some seconds, docker terminates it without hesitation. I added signal handlers and found out that when using server.close() the server isn't completely terminated, but "only" stops accepting new connections (see https://github.com/nodejs/node/issues/2642). So we need to make sure that open connections like websockets or HTTP keep-alive are also closed.

How to handle websockets:
The nodejs app emits closeSockets to all websockets when a shutdown signal is received. The client listens on this closeSockets event and calls sockets.disconnect() and, shortly after, sockets.connect(). Remember that server.close() was called, so this instance doesn't accept new requests. When other instances of this dockerized app are running, the load balancer inside docker will eventually pick an instance which isn't shutting down, and a successful connection is established. The instance which should shut down won't have any open websocket connections.

var gracefulTermination = function(){
    // we don't want to kill everything without telling the clients that this instance stops
    // server.close() puts the server into a state in which it doesn't allow new connections,
    // but the old connections (websockets) are still open and can be used
    server.close(function(){
        // this callback is called when the server terminates
        console.log('close bknd');
        process.exit();
    });

    // iterate through all open websockets and emit 'closeSockets' to the clients.
    // Clients will then call disconnect() and connect() on their side to establish new connections
    // to other instances of this scaled app
    Object.keys(server.socketIoObj.sockets.sockets).forEach(function(id) {
        console.log("WebSocket ID:", id, " will be closed from the client.");
        server.socketIoObj.to(id).emit('closeSockets');
    });
};

process.on("SIGINT", function() {
    console.log('CLOSING [SIGINT]');
    gracefulTermination();
});
...

How to handle keep-alive HTTP connections:
Currently I don't know how this can be done perfectly. The easiest way is to disable keep-alive.

app.use(function (req, res, next) {
    res.setHeader('Connection', 'close');
    next();
});

Another possibility is to set the keep-alive timeout to a very low number. For example 0.5 seconds.

app.use(function (req, res, next) {
    res.setTimeout(500);
    next();
});

Hope this could help others :)

I've got the same issue. Attached are all of the logs produced by the ecs-logs-collector script.
Any help would be much appreciated :)

collect.zip

I've got same issues.
Docker version 1.13.1, build 092cba3
Linux debian 4.8.6-x86_64-linode78

Linux backup 4.6.0-040600-generic #201606100558 SMP Fri Jun 10 10:01:15 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Server Version: 1.13.1

Same issue. I'm using mount in a privileged container. After 4-5 runs it freezes. I also have the same issue with the latest standard kernel for 16.04.

Everyone, @etlweather is spot-on. Only post a "me too" if you have a reliable way of reproducing the issue. In that case, detail your procedure. A docker and kernel version isn't enough and we get lots of notifications about it. The simpler your reproduction procedure, the better.

@rneugeba @redbaron Unfortunately the current "repro" I have is very hardware specific (all but proving this is a race condition). I haven't tried getting a QEMU repro but that's definitely the next step so multiple people can actually work on this and get the expected result (ideally in 1 CPU core setup). If someone already has one, please shoot me an email (it's on my profile). I'll thoroughly test it and post it here.

We're getting this in GCE pretty frequently. Docker freezes and the machine hangs on reboot.

[782935.982038] unregister_netdevice: waiting for vethecf4912 to become free. Usage count = 17

The container is running a go application, and has hairpin nat configured.

Docker:

matthew@worker-1:~$ docker version
Client:
 Version:      1.12.6
 API version:  1.24
 Go version:   go1.6.4
 Git commit:   78d1802
 Built:        Tue Jan 10 20:38:45 2017
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.6
 API version:  1.24
 Go version:   go1.6.4
 Git commit:   78d1802
 Built:        Tue Jan 10 20:38:45 2017
 OS/Arch:      linux/amd64

Ubuntu 16.04 LTS,

matthew@worker-1:~$ uname -a
Linux worker-1 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Does anyone have a suggested workaround for this? I tried enabling --userland-proxy=true and docker still hangs after a while. It appears Kubernetes has a solution from what @thockin wrote above, but it's not clear what --hairpin-mode=promiscuous-bridge exactly does and how to configure that on a plain Ubuntu 16.x docker install.

I can make this happen reliably when running Proxmox and using containers. Specifically, if I have moved a considerable amount of data or moved really any amount of data very recently, shutting down or hard stopping the container will produce this error. I've seen it most often when I am using containers that mount my NAS within, but that might be a coincidence.

# uname -a
Linux proxmox01 4.4.40-1-pve #1 SMP PVE 4.4.40-82 (Thu, 23 Feb 2017 15:14:06 +0100) x86_64 GNU/Linux

# cat /etc/debian_version
8.7

And from within Proxmox:

proxmox-ve: 4.4-82 (running kernel: 4.4.40-1-pve)
pve-manager: 4.4-12 (running version: 4.4-12/e71b7a74)
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.4.40-1-pve: 4.4.40-82
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-1
libqb0: 1.0-1
pve-cluster: 4.0-48
qemu-server: 4.0-109
pve-firmware: 1.1-10
libpve-common-perl: 4.0-92
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-1
pve-docs: 4.4-3
pve-qemu-kvm: 2.7.1-4
pve-container: 1.0-94
pve-firewall: 2.0-33
pve-ha-manager: 1.0-40
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-3
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-8
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.9-pve15~bpo80

It's worth noting that Docker is not installed on this system and never has been. I'm happy to provide any data the community needs to troubleshoot this issue, just tell me what commands to run.

I am able to reproduce this on CentOS 7.3 running as a swarm worker node running DTR with a mounted NFS volume.

If you are arriving here

The issue being discussed here is a kernel bug and has not yet been fixed. There are a number of options that may help for _some_ situations, but not for all (it's most likely a combination of issues that trigger the same error)

Do not leave "I have this too" comments

"I have this too" does not help resolving the bug. only leave a comment if you have information that may help resolve the issue (in which case; providing a patch to the kernel upstream may be the best step).

If you want to let us know that you have this issue too, use the "thumbs up" button on the top description:
screen shot 2017-03-09 at 16 12 17

If you want to stay informed on updates use the _subscribe button_.

screen shot 2017-03-09 at 16 11 03

Every comment here sends an e-mail / notification to over 3000 people. I don't want to lock the conversation on this issue, because it's not resolved yet, but I may be forced to if you ignore this.

Thanks!

That's all well and good, but what _exactly_ are the options that help? This problem is causing us issues in production, so I'd like to apply whatever workarounds are necessary to work around the kernel bug.

If someone from Docker has time to try the Kubernetes workaround, please let me know and we can point you at it. I am unable to extract the changes and patch them into Docker myself right now.


@thockin thanks. I was following the PR/issue in Kubernetes with the hairpin-mode workaround, but during the many back-and-forths I lost track of whether the workaround in fact gets rid of this issue?
(As I understand it, there are different scenarios that cause the ref-count inconsistency in the kernel.)

If you can point me to the PR that you believe addresses the issue in K8s, I will work to get this patched in docker, at least for the case of turning userland-proxy off by default. (And we can test it using the docker-stress reproduction steps.)

I'm not sure I have a single PR, but you can look at the current state. Start here:

https://github.com/kubernetes/kubernetes/blob/9a1f0574a4ad5813410b445330d7240cf266b322/pkg/kubelet/network/kubenet/kubenet_linux.go#L345


Hey all, just to be clear, all the "kubernetes workaround" does is enable promiscuous mode on the underlying bridge. You can achieve the same thing with ip link set <bridgename> promisc on using iproute2. It decreases the probability of running into the bug but may not eliminate it altogether.

Now, in theory this shouldn't work... but for some reason promiscuous mode seems to make the device teardown just slow enough that you don't get a race to decrement the ref counter. Perhaps one of the Kubernetes contributors can chime in here if they're on this thread.

I can verify the workaround (NOT FIX) works using my environment-specific repro. I can't really verify it helps if you're using the IPVLAN or MACVLAN drivers (we use macvlan in prod) because it seems very difficult to get those setups to produce this bug. Can anyone else with a repro attempt to verify the workaround?
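
For the default bridge setup the workaround boils down to the following (a sketch; docker0 is the default bridge name, substitute your own, and note that the setting does not survive a reboot):

sudo ip link set dev docker0 promisc on

# verify: PROMISC should now appear in the flags
ip link show docker0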

Hi all, I tried to debug the kernel issue and had an email chain going on the "netdev" mailing list, so I just wanted to post some findings here.

https://www.spinics.net/lists/netdev/msg416310.html

The issue that we are seeing is that

unregister_netdevice: waiting for lo to become free. Usage count = 1

during container shutdown. When I inspect the container network namespace, it seems like the eth0 device has already been deleted and only the lo device is left. And there is another structure holding a reference to that device.

After some digging, it turns out that the "thing" holding the reference is one of the "routing cache" entries (struct dst_entry), and something is preventing that particular dst_entry from being gc'ed (its reference count is larger than 0). So I logged every dst_hold() (which increments the dst_entry reference count by 1) and every dst_release() (which decrements it by 1), and there are indeed more dst_hold() calls than dst_release() calls.

Here is the logs attached: kern.log.zip

Summary:

  • the lo interface was renamed to lodebug for ease of grepping
  • the reference count for dst_entry starts with 1
  • the reference count for dst_entry (which is holding the reference for lo) at the end is 19
  • there are 258041 total dst_hold() calls, and 258023 total dst_release() calls
  • in the 258041 dst_hold() calls, there are 88034 udp_sk_rx_dst_set() (which is then calls dst_hold()), 152536 inet_sk_rx_dst_set(), and 17471 __sk_add_backlog()
  • in the 258023 dst_release() calls, there are 240551 inet_sock_destruct() and 17472 refdst_drop()

There are more udp_sk_rx_dst_set() and inet_sk_rx_dst_set() calls in total than inet_sock_destruct() calls, so I suspect some sockets are in a "limbo" state, with something preventing them from being destroyed.

UPDATE:
Turns out sockets (struct sock) are created and destroyed correctly, but for some of the TCP sockets inet_sk_rx_dst_set() is being called multiple times on the same dst, and there is only one corresponding inet_sock_destruct() call to release the reference to the dst.

Here is the CentOS 7.3 workaround that fixed it for me:

yum --enablerepo=centosplus install kernel-plus
egrep ^menuentry /etc/grub2.cfg | cut -f 2 -d \'
grub2-set-default 0
reboot

Here is the patch that solves it:
https://bugs.centos.org/view.php?id=12711&nbn=1

UPDATE: This turned out not to solve the problem permanently. It showed up again several hours later with the following wall message:
kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

@adrianotto - to clarify: does the CentOS kernel patch resolve this? Just curious whether you meant that both your workaround and the referenced kernel patch failed to resolve this permanently?

@stayclassychicago @adrianotto That patch only addresses one of the race conditions that can trigger the "usage count" issue in the kernel. It's just my backport of a fix that is already in the 4.x kernels. It may solve your problems, so it's worth a shot.

@stayclassychicago before I tried the 3.10.0-514.10.2.el7.centos.plus.x86_64 kernel I was getting the kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1 very regularly, nearly every time I ran a container with docker run --rm ... when the container exited. After the kernel upgrade and reboot, it completely stopped for many hours, and then came back again. Now half the time I delete containers it works properly, where it used to error every time. I don't know for sure if the new kernel is helping, but it doesn't hurt.

Looks like it is very easy to reproduce when there is a LACP bonding interface on the machine. We have a 3 node swarm cluster, all 3 with a configured LACP bonding interface, and this issue basically doesn't allow us to work with the cluster. We have to restart nodes every 15-20 minutes.

Confirmed - as soon as I removed LACP bonding from the interfaces (those were used as main interfaces), everything is working fine for more than 12 hours. Used to break every ~30 minutes.

This is reproducible on Linux containerhost1 4.9.0-0.bpo.2-amd64 #1 SMP Debian 4.9.18-1~bpo8+1 (2017-04-10) x86_64 GNU/Linux with Docker version 17.04.0-ce, build 4845c56, running in privileged mode when we have CIFS mounts open. When the container stops with the mounts open, Docker becomes unresponsive and we get the kernel:[ 1129.675495] unregister_netdevice: waiting for lo to become free. Usage count = 1 error.

Ubuntu 16.04 (kernel 4.4.0-78-generic) still has the issue. And when it happens, any application that tries to create a new network namespace through the clone syscall gets stuck:

[ 3720.752954]  [<ffffffff8183c8f5>] schedule+0x35/0x80
[ 3720.752957]  [<ffffffff8183cb9e>] schedule_preempt_disabled+0xe/0x10
[ 3720.752961]  [<ffffffff8183e7d9>] __mutex_lock_slowpath+0xb9/0x130
[ 3720.752964]  [<ffffffff8183e86f>] mutex_lock+0x1f/0x30
[ 3720.752968]  [<ffffffff8172ba2e>] copy_net_ns+0x6e/0x120
[ 3720.752972]  [<ffffffff810a169b>] create_new_namespaces+0x11b/0x1d0
[ 3720.752975]  [<ffffffff810a17bd>] copy_namespaces+0x6d/0xa0
[ 3720.752980]  [<ffffffff8107f1d5>] copy_process+0x905/0x1b70
[ 3720.752984]  [<ffffffff810805d0>] _do_fork+0x80/0x360
[ 3720.752988]  [<ffffffff81080959>] SyS_clone+0x19/0x20
[ 3720.752992]  [<ffffffff81840a32>] entry_SYSCALL_64_fastpath+0x16/0x71

The only solution is to hard reset the machine.

I met this issue when mounting NFS volume in a privileged container then restarting the container.
It seems to me this issue never happened on RHEL 7 with the same procedure.

$ docker version
Client:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-common-1.12.6-6.gitae7d637.fc25.x86_64
 Go version:      go1.7.4
 Git commit:      ae7d637/1.12.6
 Built:           Mon Jan 30 16:15:28 2017
 OS/Arch:         linux/amd64

Server:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-common-1.12.6-6.gitae7d637.fc25.x86_64
 Go version:      go1.7.4
 Git commit:      ae7d637/1.12.6
 Built:           Mon Jan 30 16:15:28 2017
 OS/Arch:         linux/amd64

Red Hat claims to have an instance of this bug fixed as of kernel-3.10.0-514.21.1.el7 release. I suppose they will upstream the fix as soon as possible and rebase to 4.12. This package is already available on CentOS 7 as well.

Documentation related to the fix (RHN access needed):
https://access.redhat.com/articles/3034221
https://bugzilla.redhat.com/show_bug.cgi?id=1436588

From the article:
"In case of a duplicate IPv6 address or an issue with setting an address, a race condition occurred. This race condition sometimes caused address reference counting leak. Consequently, attempts to unregister a network device failed with the following error message: "unregister_netdevice: waiting for to become free. Usage count = 1". With this update, the underlying source code has been fixed, and network devices now unregister as expected in the described situation."

I already deployed this fix on all systems of our PaaS pool, and it's been 2 days without the bug being hit. Earlier, we had at least one system freezing per day. I will report here if we hit the bug again.

I have kernel version 3.10.0-514.21.1.el7.x86_64, and I still have the same symptom.

Message from syslogd@docker at May 26 22:02:26 ...
 kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1
# uname -a
Linux docker 3.10.0-514.21.1.el7.x86_64 #1 SMP Thu May 25 17:04:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
# uptime
 22:03:10 up 35 min,  3 users,  load average: 0.16, 0.07, 0.06

@adrianotto Apparently, there are multiple ways to hit this issue. How did you reproduce your particular instance of this bug?

@bcdonadio If you look at https://git.centos.org/commitdiff/rpms!kernel.git/b777aca52781bc9b15328e8798726608933ceded - you will see that the https://bugzilla.redhat.com/show_bug.cgi?id=1436588 bug is "fixed" by this change:

+- [net] ipv6: addrconf: fix dev refcont leak when DAD failed (Hangbin Liu) [1436588 1416105]

Which has been in the upstream kernel since 4.8, I believe (https://github.com/torvalds/linux/commit/751eb6b6042a596b0080967c1a529a9fe98dac1d). And 4.9 and 4.10 have this bug present, so Red Hat just backported some of the fixes from upstream, which probably fix some problems, but definitely not all of them.

@bcdonadio I can reproduce the bug on my system by running this test script once per hour from cron:

#!/bin/sh

TAG=`date +%F_%H_%M_%S_UTC`
docker pull centos:centos6
docker run --rm adrianotto/centos6 yum check-update -q > package_updates.txt
LINES=`wc -l < package_updates.txt`

if [ $LINES -eq 0 ] ; then
        rm -f package_updates.txt
        echo "No packages need to be updated"
        exit 0
fi

docker run --rm adrianotto/centos6 rpm -a -q > old_packages.txt
docker build -t temp:$TAG .
docker run --rm temp:$TAG rpm -a -q > new_packages.txt
docker rmi temp:$TAG

This script is just producing a package list using an image in the Docker registry, and another using one that's built locally so I can compare them. The Dockerfile is just this:

FROM centos:centos6
MAINTAINER Adrian Otto
RUN yum clean all && yum update -y && yum clean all
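For completeness, a hypothetical crontab entry for the "once per hour" schedule mentioned above; the script path and log file are made-up names, not from the original report:

# run the repro script at the top of every hour
0 * * * * /root/docker-update-check.sh >> /var/log/docker-update-check.log 2>&1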

2-4 minutes later syslog gets this message:

Message from syslogd@docker at May 27 16:51:55 ...
 kernel:unregister_netdevice: waiting for lo to become free. Usage count = 0

The last occurrence happened a few minutes after I ran the script manually. My guess is that the error condition is raised after some timeout elapses once the container delete is attempted.

I'm certain the error condition is intermittent, because the script above runs as a cron job at :00 past each hour. Here is a sample of the error output that syslog recorded:

May 26 01:02:44 docker kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
May 26 02:02:22 docker kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
May 26 02:02:32 docker kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
May 26 03:02:18 docker kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
May 26 03:02:28 docker kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
May 26 03:02:38 docker kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
May 26 04:03:14 docker kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
May 26 05:02:25 docker kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
May 26 05:02:35 docker kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
May 26 06:03:31 docker kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
May 26 06:03:41 docker kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
May 26 06:03:51 docker kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
May 26 06:04:02 docker kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
May 26 09:03:04 docker kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1

So it happens somewhere in the range of 2 to 4 minutes after the containers run, exit, and are deleted by docker because of the --rm flag. Also notice from the log above that there is not an error for every container that's run/deleted, but it's pretty consistent.

Would it be possible for someone to see if this patch improves things?

https://patchwork.ozlabs.org/patch/768291/

@hlrichardson This actually looks like it! I will try to backport it to our 3.16 kernel or upgrade specific servers and compile kernel 4.9 with this patch tomorrow, we'll see how it goes.

Though, after checking the commit this patch references (https://github.com/torvalds/linux/commit/0c1d70af924b966cc71e9e48920b2b635441aa50) - it was committed in the 4.6 kernel, while the problem was present even before that :(

Ah, so perhaps not related, unless there are multiple causes (unfortunately there are many ways this type of bug can be triggered, so that is a possibility).

We personally hit at least multiple issues here - in some of them these "unregister_netdevice" logs just disappear after some period of time and docker containers are able to work fine, while in others all containers get stuck and the server needs to be rebooted.

Actually - we don't use vxlan on the servers that get these issues - we use simple bridges and port forwarding (it happens regardless of userland-proxy settings).


OK, if you're not using vxlan tunnels it definitely won't help.

BTW, if you see a single instance of the "unregister_netdevice" message when a network namespace is deleted (container exit), it should be considered a normal situation in which something referencing a netdevice was cleaned up more or less at the same time the namespace was being deleted.

The more serious case is where this message is repeated every 10 seconds and never ceases... in this case a global lock is held forever, and since this lock has to be acquired whenever a network namespace is added or deleted, any attempt to create or delete a network namespace also hangs forever.
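A rough shell sketch (not from the original comment) for telling the two cases apart: the probe is backgrounded so the shell itself does not hang, and be aware that in the bad case a stuck ip process plus a stale /var/run/netns entry may be left behind.

# try to create and delete a throwaway namespace in the background
ip netns add netdev-probe && ip netns del netdev-probe &
PROBE=$!
sleep 20
if kill -0 "$PROBE" 2>/dev/null; then
    echo "netns creation still blocked after 20s -> the persistent (serious) case"
else
    echo "netns creation completed -> the message was most likely transient"
fi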

If you have a fairly painless way to reproduce the second type of problem, I'd be interested in taking a look.

@hlrichardson We're seeing the 2nd case you mention above on a bunch of our servers i.e. message repeated every 10 seconds. What info do you want me to share?

Seeing this on Fedora 25 while testing and building centos:7 containers with yum. Yum failed to finish downloading the package database and hung indefinitely because the network stopped working in a weird way.

Hi guys,

There is a potential patch for the kernel bug (or at least one of the bugs) in the Linux net-dev mailing list:

https://www.spinics.net/lists/netdev/msg442211.html

It's merged in net tree, queued for stable tree.

According to https://github.com/torvalds/linux/commit/d747a7a51b00984127a88113cdbbc26f91e9d815 - it is in 4.12 (which was released yesterday)!

@fxposter @kevinxucs I'll try backporting this to the current CentOS kernel tomorrow.

I'm running 4.12 (from http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.12/) and I still hit this, so torvalds/linux@d747a7a must not be the complete fix.

$ uname -r
4.12.0-041200-generic

Ryan, do you have a reliable way to reproduce?


@justincormack Unfortunately I don't have a minimal example that I can share, but we have a test suite that creates and destroys a lot of containers, and I usually run into this issue (hanging docker commands, a lot of waiting for lo to become free in syslog) after only a few iterations.

@campbellr I've now tried to repro this three times and spent a good part of this week on it with little luck. I managed to get the waiting for lo to become free messages a couple of times, but without crashes/hangs afterwards. I'm trying to reduce the test case by just creating network namespaces and veth interfaces.

In your test suite:

  • do your containers have a lot of network activity? If so, which direction is predominant?
  • What sort of machine are you running this on (number of cores, is it a VM, etc.)?
  • Do you create a lot of containers concurrently?
  • Do your containers exit normally or do they crash?

Even partial answers to the above may help to narrow it down...
thanks

@rn Docker won't hang anymore as it sets a timeout on the netlink request that would normally hang. But you wouldn't be able to start new containers (or restart existing ones), likely container cleanup on stop would be weird as well.

I haven't had a chance to test on 4.12 yet, but I could reproduce reliably on the kvm instances at Vultr. I'm running swarm and my headless Chrome workers cause the problems when they fail health checks or crash regularly. Of course, at this point I've tracked down all the crashers and made them handle network errors cleanly, etc., so I'm seeing waiting for lo to become free but not often enough to hang things for a couple of weeks.

So it seems like the things that help reproduce are more complex networking scenarios combined with large amounts of traffic into the containers, constant container recycling and kvm.

@rn I managed to narrow this down to a specific container in our test suite, and was able to reproduce with the following steps:

  1. start container (an internal tornado-based web service -- I'm trying to extract a minimal example that still hits this)
  2. make a request to web service running in container
  3. wait for response
  4. kill container

After 3 or 4 iterations of this I end up getting waiting for lo to become free, and on the next iteration docker run fails with docker: Error response from daemon: containerd: container did not start before the specified timeout.

do your containers have a lot of network activity? If so, which direction is predominant?

A pretty small amount. In the steps mentioned above, the http request is a small amount of json, and the response is a binary blob that's around 10 MB.

What sort of machine are you running this one (number of cores, is it a VM, etc)

This is on a 4-core desktop machine (no VM)

Do you create a lot of containers concurrently?

No, everything is done serially.

Do your containers exit normally or do they crash?

They're stopped with docker stop

  1. start container (an internal tornado-based web service -- im trying to extract out a minimal example that still hits this)
  2. make a request to web service running in container
  3. wait for response
  4. kill container

I spent some time stripping the container down and it turns out that the web service had nothing to do with the bug. What seems to trigger this in my case is mounting an NFS share inside a container (running with --privileged).

On my desktop, I can reliably reproduce it simply by running the following a few times:

$ docker run -it --rm --privileged alpine:latest /bin/mount -o nolock -o async -o vers=3 <my-nfs-server>:/export/foo /mnt
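For convenience, the same command wrapped in a loop (a sketch: -it is dropped so it can run unattended from a script, and <my-nfs-server> remains a placeholder for your NFS server):

# run the privileged NFS-mount container a handful of times in a row
for i in $(seq 1 10); do
    docker run --rm --privileged alpine:latest \
        /bin/mount -o nolock -o async -o vers=3 <my-nfs-server>:/export/foo /mnt
done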

Kubernetes users, I opened an issue to _kops_ to release the next Kubernetes AMI with Kernel version 4.12. Welcome to check it out: https://github.com/kubernetes/kops/issues/2901

I also hit this on centos 7.3 with host kernel 3.10.0-514.6.1.el7.x86_64 and docker-ce-17.06.0.ce-1.el7.centos.x86_64.

@FrankYu that's not helpful. To participate usefully in this thread, please provide an exact way to reproduce this issue, and please test on a modern kernel. 3.10 was released four years ago; we are discussing whether it is fixed, or only partially fixed, on a release from four days ago.

@danielgusmao our RancherOS and AWS ECS AMI Linux OSes already have that 'fix' in place (it was likely the default) and it does not resolve the issue for us. We still see the message show up in logs all the time. Likely the only hope is that the kernel patch gets backported widely. Though I searched around and can't see any evidence of serious progress towards that yet in the RedHat/CentOS/AWS Linux bugzillas and forums.

To be clear, the message itself is benign, it's the kernel crash after the messages reported by the OP which is not.

The comment in the code where this message comes from explains what's happening. Basically, every user (such as the IP stack) of a network device (such as the end of a veth pair inside a container) increments a reference count in the network device structure when it is using the device. When the device is removed (e.g. when the container is removed) each user is notified so that it can do some cleanup (e.g. closing open sockets) before decrementing the reference count. Because this cleanup can take some time, especially under heavy load (lots of interfaces, a lot of connections, etc.), the kernel may print the message here once in a while.

If a user of a network device never decrements the reference count, some other part of the kernel will determine that the task waiting for the cleanup is stuck and it will crash. It is only this crash which indicates a kernel bug (some user, via some code path, did not decrement the reference count). There have been several such bugs and they have been fixed in modern kernels (and possibly backported to older ones). I have written quite a few stress tests (and continue writing them) to trigger such crashes but have not been able to reproduce them on modern kernels (I do, however, see the above message).

Please only report on this issue if your kernel actually crashes, and then we would be very interested in:

  • kernel version (output of uname -r)
  • Linux distribution/version
  • Are you on the latest kernel version of your Linux vendor?
  • Network setup (bridge, overlay, IPv4, IPv6, etc)
  • Description of the workload (what type of containers, what type of network load, etc)
  • And ideally a simple reproduction
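A sketch for collecting most of the details requested above in one go (adjust for your distribution):

# kernel, distro, docker and network details, plus any related kernel messages
uname -r
cat /etc/os-release
docker version
docker info
ip -d link show
dmesg | grep -iE 'unregister_netdevice|hung task|Call Trace'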

Thanks

[ @thaJeztah could you change the title to something like kernel crash after "unregister_netdevice: waiting for lo to become free. Usage count = 3" to make it more explicit]

@drweber you will also find this patch in upcoming stable releases (for now 4.11.12, 4.9.39, 4.4.78, 3.18.62)

@rn

If a user of network device never decrements the reference count, some other part of the kernel will determine that the task waiting for the cleanup is stuck and it will crash. It is only this crash which indicates a kernel bug (some user, via some code path, did not decrement the reference count). There have been several such bugs and they have been fixed in modern kernel (and possibly back ported to older ones). I have written quite a few stress tests (and continue writing them) to trigger such crashes but have not been able to reproduce on modern kernels (i do however the above message).

Please only report on this issue if your kernel actually crashes ...

We are having a slightly different issue in our environment that I am hoping to get some clarification on (kernel 3.16.0-77-generic, Ubuntu 14.04, docker 1.12.3-0~trusty). We have thousands of hosts running docker, 2-3 containers per host, and we are seeing this on < 1% of total hosts running docker.

We actually never see the kernel crash, but instead (like the original reporters as far as I can tell) the dockerd process is defunct. Upstart (using the /etc/init/docker.conf job from the upstream package) will not start a new dockerd process because it thinks it is already running (start: Job is already running: docker), and attempting to stop the upstart job also fails (docker start/killed, process <pid of defunct process>).

$ ps -ely
S   UID   PID  PPID  C PRI  NI   RSS    SZ WCHAN  TTY          TIME CMD
...
Z     0 28107     1  0  80   0     0     0 -      ?        00:18:05 dockerd <defunct>

Since we mostly run with bridge networking (on a custom bridge device) in dmesg we see a slightly different message referring to the virtual interface:

[7895942.484851] unregister_netdevice: waiting for vethb40dfbc to become free. Usage count = 1
[7895952.564852] unregister_netdevice: waiting for vethb40dfbc to become free. Usage count = 1
[7895962.656984] unregister_netdevice: waiting for vethb40dfbc to become free. Usage count = 1

Because upstart seems to refuse to restart dockerd or recognize that the previously running process is a zombie, the only solution we have found is to reboot the host.

While our outcome seems different (the kernel does not crash) the root cause sounds the same or similar. Is this not the same issue then? Is there any known workaround or way to have the docker upstart job become runnable again when this occurs?

@campbellr I can reproduce this issue with your approach on kernel 4.12.2-1.
BTW, if I unmount the NFS storage before the container is stopped, this issue will not happen.

same problem.

[root@docker1 ~]# cat /etc/redhat-release 
CentOS Linux release 7.3.1611 (Core) 
[root@docker1 ~]# uname  -r
3.10.0-514.26.2.el7.x86_64
[root@docker1 ~]# docker version
Client:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-32.git88a4867.el7.centos.x86_64
 Go version:      go1.7.4
 Git commit:      88a4867/1.12.6
 Built:           Mon Jul  3 16:02:02 2017
 OS/Arch:         linux/amd64

Server:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-32.git88a4867.el7.centos.x86_64
 Go version:      go1.7.4
 Git commit:      88a4867/1.12.6
 Built:           Mon Jul  3 16:02:02 2017
 OS/Arch:         linux/amd64

Hi,

I've just created 2 repos https://github.com/piec/docker-samba-loop and https://github.com/piec/docker-nfs-loop that contain the necessary setup in order to reproduce this bug

My results:

  • 4.11.3-1-ARCH (Arch Linux) kernel: I generate the bug with docker-samba-loop in a few iterations (<10). I can't reproduce it with docker-nfs-loop
  • 4.11.9-1-ARCH same results (versions)
  • 4.12.3-1-ARCH (testing repo) same results
  • 4.11.11-coreos: same results for docker-samba-loop, didn't try docker-nfs-loop

Hope this helps
Cheers

A workaround is to use --net=host in my case. But it's not always an acceptable solution

@piec, many thanks for the details. I have a few more questions for you at the end of this very long comment.

Using the SMB setup I was able to produce a number of things with different kernels. I've tried this with the NFS setup as well but no dice.

All tests are run with docker 17.06.1-ce on HyperKit with a VM configured with 2 vCPUs and 2GB of memory (via Docker for Mac, but that should not matter). I'm using LinuxKit kernels, because I can easily swap them out.

I modified your Dockerfile in that I added a call to date as the first command executed, and also added a call to date before and after the docker run for the client.

Experiment 1 (4.9.39 kernel)

With 4.9.39 (latest 4.9.x stable kernel) I get a kernel crash:

# while true; do  date;    docker run -it --rm --name client-smb --cap-add=SYS_ADMIN --cap-add DAC_READ_SEARCH --link samba:samba client-smb:1;  date;   sleep 1; done
Thu 27 Jul 2017 14:12:51 BST
+ date
Thu Jul 27 13:12:52 UTC 2017
+ mount.cifs //172.17.0.2/public /mnt/ -o vers=3.0,user=nobody,password=
+ date
Thu Jul 27 13:12:52 UTC 2017
+ ls -la /mnt
total 1028
drwxr-xr-x    2 root     root             0 Jul 27 10:11 .
drwxr-xr-x    1 root     root          4096 Jul 27 13:12 ..
-rwxr-xr-x    1 root     root             3 Jul 27 10:11 bla
+ umount /mnt
+ echo umount ok
umount ok
Thu 27 Jul 2017 14:12:52 BST
Thu 27 Jul 2017 14:12:53 BST

---> First iteration succeeds and then it hangs on the docker run

and in dmesg:

[  268.347598] BUG: unable to handle kernel paging request at 0000000100000015
[  268.348072] IP: [<ffffffff8c64ea95>] sk_filter_uncharge+0x5/0x31
[  268.348411] PGD 0 [  268.348517]
[  268.348614] Oops: 0000 [#1] SMP
[  268.348789] Modules linked in:
[  268.348971] CPU: 1 PID: 2221 Comm: vsudd Not tainted 4.9.39-linuxkit #1
[  268.349330] Hardware name:   BHYVE, BIOS 1.00 03/14/2014
[  268.349620] task: ffff8b6ab8eb5100 task.stack: ffffa015c113c000
[  268.349995] RIP: 0010:[<ffffffff8c64ea95>]  [<ffffffff8c64ea95>] sk_filter_uncharge+0x5/0x31
[  268.350509] RSP: 0018:ffffa015c113fe10  EFLAGS: 00010202
[  268.350818] RAX: 0000000000000000 RBX: ffff8b6ab7eee6a8 RCX: 0000000000000006
[  268.351231] RDX: 00000000ffffffff RSI: 00000000fffffffd RDI: ffff8b6ab7eee400
[  268.351636] RBP: ffff8b6ab7eee400 R08: 0000000000000000 R09: 0000000000000000
[  268.352022] R10: ffffa015c101fcb0 R11: 0000000000000000 R12: 0000000000000000
[  268.352409] R13: ffff8b6ab7eee4a8 R14: ffff8b6ab7f8e340 R15: 0000000000000000
[  268.352796] FS:  00007f03f62e3eb0(0000) GS:ffff8b6abc700000(0000) knlGS:0000000000000000
[  268.353234] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  268.353546] CR2: 0000000100000015 CR3: 00000000782d2000 CR4: 00000000000406a0
[  268.353961] Stack:
[  268.354106]  ffffffff8c625054 ffff8b6ab7eee400 ffffa015c113fe88 0000000000000000
[  268.354526]  ffffffff8c74ed96 01000008bc718980 0000000000000000 0000000000000000
[  268.354965]  de66927a28223151 ffff8b6ab4443a40 ffffa015c101fcb0 ffff8b6ab4443a70
[  268.355384] Call Trace:
[  268.355523]  [<ffffffff8c625054>] ? __sk_destruct+0x35/0x133
[  268.355822]  [<ffffffff8c74ed96>] ? unix_release_sock+0x1df/0x212
[  268.356164]  [<ffffffff8c74ede2>] ? unix_release+0x19/0x25
[  268.356454]  [<ffffffff8c62034c>] ? sock_release+0x1a/0x6c
[  268.356742]  [<ffffffff8c6203ac>] ? sock_close+0xe/0x11
[  268.357019]  [<ffffffff8c1f8710>] ? __fput+0xdd/0x17b
[  268.357288]  [<ffffffff8c0f604d>] ? task_work_run+0x64/0x7a
[  268.357583]  [<ffffffff8c003285>] ? prepare_exit_to_usermode+0x7d/0xa4
[  268.357925]  [<ffffffff8c7d2884>] ? entry_SYSCALL_64_fastpath+0xa7/0xa9
[  268.358268] Code: 08 4c 89 e7 e8 fb f8 ff ff 48 3d 00 f0 ff ff 77 06 48 89 45 00 31 c0 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 0f 1f 44 00 00 <48> 8b 46 18 8b 40 04 48 8d 04 c5 28 00 00 00 f0 29 87 24 01 00
[  268.359776] RIP  [<ffffffff8c64ea95>] sk_filter_uncharge+0x5/0x31
[  268.360118]  RSP <ffffa015c113fe10>
[  268.360311] CR2: 0000000100000015
[  268.360550] ---[ end trace 4a7830b42d5acfb3 ]---
[  268.360861] Kernel panic - not syncing: Fatal exception
[  268.361217] Kernel Offset: 0xb000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  268.361789] Rebooting in 120 seconds..

Sometimes I see several iterations of what the 4.11.12 kernel does, including the unregister_netdevice messages (see below), and then get the kernel crash above. Sometimes I see slight variations of the crash, like:

[  715.926694] BUG: unable to handle kernel paging request at 00000000fffffdc9
[  715.927380] IP: [<ffffffff8664ea95>] sk_filter_uncharge+0x5/0x31
[  715.927868] PGD 0 [  715.928022]
[  715.928174] Oops: 0000 [#1] SMP
[  715.928424] Modules linked in:
[  715.928703] CPU: 0 PID: 2665 Comm: runc:[0:PARENT] Not tainted 4.9.39-linuxkit #1
[  715.929321] Hardware name:   BHYVE, BIOS 1.00 03/14/2014
[  715.929765] task: ffff931538ef4140 task.stack: ffffbcbbc0214000
[  715.930279] RIP: 0010:[<ffffffff8664ea95>]  [<ffffffff8664ea95>] sk_filter_uncharge+0x5/0x31
[  715.931043] RSP: 0018:ffffbcbbc0217be0  EFLAGS: 00010206
[  715.931487] RAX: 0000000000000000 RBX: ffff931532a662a8 RCX: 0000000000000006
[  715.932043] RDX: 00000000ffffffff RSI: 00000000fffffdb1 RDI: ffff931532a66000
[  715.932612] RBP: ffff931532a66000 R08: 0000000000000000 R09: 0000000000000000
[  715.933181] R10: ffff9315394f2990 R11: 000000000001bb68 R12: ffff931532a66000
[  715.933725] R13: ffff9315328060a8 R14: ffff931532a66340 R15: 0000000000000000
[  715.934258] FS:  0000000000000000(0000) GS:ffff93153c600000(0000) knlGS:0000000000000000
[  715.934857] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  715.935286] CR2: 00000000fffffdc9 CR3: 0000000052c09000 CR4: 00000000000406b0
[  715.935822] Stack:
[  715.935974]  ffffffff86625054 ffff931532806000 ffffbcbbc0217c58 ffff931532a66000
[  715.936560]  ffffffff8674ed37 0100000800000282 0000000000000000 0000000000000000
[  715.937173]  5de0b9a3a313c00b ffff9315346f5080 ffff9315394f2990 ffff9315346f50b0
[  715.937751] Call Trace:
[  715.937982]  [<ffffffff86625054>] ? __sk_destruct+0x35/0x133
[  715.938608]  [<ffffffff8674ed37>] ? unix_release_sock+0x180/0x212
[  715.939130]  [<ffffffff8674ede2>] ? unix_release+0x19/0x25
[  715.939517]  [<ffffffff8662034c>] ? sock_release+0x1a/0x6c
[  715.939907]  [<ffffffff866203ac>] ? sock_close+0xe/0x11
[  715.940277]  [<ffffffff861f8710>] ? __fput+0xdd/0x17b
[  715.940635]  [<ffffffff860f604d>] ? task_work_run+0x64/0x7a
[  715.941072]  [<ffffffff860e148a>] ? do_exit+0x42a/0x8e0
[  715.941472]  [<ffffffff8674edfa>] ? scm_destroy+0xc/0x25
[  715.941880]  [<ffffffff867504e0>] ? unix_stream_sendmsg+0x2dd/0x30b
[  715.942357]  [<ffffffff860e19aa>] ? do_group_exit+0x3c/0x9d
[  715.942780]  [<ffffffff860eac41>] ? get_signal+0x45d/0x4e2
[  715.943210]  [<ffffffff86621640>] ? sock_sendmsg+0x2d/0x3c
[  715.943618]  [<ffffffff8602055a>] ? do_signal+0x36/0x4c9
[  715.944017]  [<ffffffff861f64c7>] ? __vfs_write+0x8f/0xcc
[  715.944416]  [<ffffffff861f7100>] ? vfs_write+0xbb/0xc7
[  715.944809]  [<ffffffff8600326c>] ? prepare_exit_to_usermode+0x64/0xa4
[  715.945295]  [<ffffffff867d2884>] ? entry_SYSCALL_64_fastpath+0xa7/0xa9
[  715.945789] Code: 08 4c 89 e7 e8 fb f8 ff ff 48 3d 00 f0 ff ff 77 06 48 89 45 00 31 c0 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 0f 1f 44 00 00 <48> 8b 46 18 8b 40 04 48 8d 04 c5 28 00 00 00 f0 29 87 24 01 00
[  715.947701] RIP  [<ffffffff8664ea95>] sk_filter_uncharge+0x5/0x31
[  715.948112]  RSP <ffffbcbbc0217be0>
[  715.948292] CR2: 00000000fffffdc9
[  715.948467] ---[ end trace 2d69bea56725fd5f ]---
[  715.948722] Kernel panic - not syncing: Fatal exception
[  715.949059] Kernel Offset: 0x5000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  715.949595] Rebooting in 120 seconds..

The crashes are in the unix domain socket code and similar/identical to what is reported here, though with this new test case it is much easier to reproduce.

Experiment 2 (4.11.12 kernel)

With 4.11.12 (which is the latest stable in the 4.11 series) I see no crashes, but it is really slow (annotations inline with --->):

# while true; do  date;    docker run -it --rm --name client-smb --cap-add=SYS_ADMIN --cap-add DAC_READ_SEARCH --link samba:samba client-smb:1;  date;   sleep 1; done
Thu 27 Jul 2017 13:48:04 BST
+ date
Thu Jul 27 12:48:05 UTC 2017
+ mount.cifs //172.17.0.2/public /mnt/ -o vers=3.0,user=nobody,password=
+ date
Thu Jul 27 12:48:05 UTC 2017
+ ls -la /mnt
total 1028
drwxr-xr-x    2 root     root             0 Jul 27 10:11 .
drwxr-xr-x    1 root     root          4096 Jul 27 12:48 ..
-rwxr-xr-x    1 root     root             3 Jul 27 10:11 bla
+ umount /mnt
+ echo umount ok
umount ok
Thu 27 Jul 2017 13:48:05 BST

---> First iteration takes one second

Thu 27 Jul 2017 13:48:06 BST
docker: Error response from daemon: containerd: container did not start before the specified timeout.
Thu 27 Jul 2017 13:50:07 BST

---> Second iteration fails after 2 minutes with dockerd unable to start the container

Thu 27 Jul 2017 13:50:08 BST
+ date
Thu Jul 27 12:51:52 UTC 2017
+ mount.cifs //172.17.0.2/public /mnt/ -o vers=3.0,user=nobody,password=
+ date
Thu Jul 27 12:51:53 UTC 2017
+ ls -la /mnt
total 1028
drwxr-xr-x    2 root     root             0 Jul 27 10:11 .
drwxr-xr-x    1 root     root          4096 Jul 27 12:50 ..
-rwxr-xr-x    1 root     root             3 Jul 27 10:11 bla
+ umount /mnt
+ echo umount ok
umount ok
Thu 27 Jul 2017 13:51:53 BST

---> Third iterations succeeds, BUT it takes almost 2 minutes between docker run and the container running

Thu 27 Jul 2017 13:51:54 BST
docker: Error response from daemon: containerd: container did not start before the specified timeout.
Thu 27 Jul 2017 13:53:55 BST

---> Fourth iteration fails after two minutes

Thu 27 Jul 2017 13:53:56 BST
+ date
Thu Jul 27 12:55:37 UTC 2017
+ mount.cifs //172.17.0.2/public /mnt/ -o vers=3.0,user=nobody,password=
+ date
Thu Jul 27 12:55:37 UTC 2017
+ ls -la /mnt
total 1028
drwxr-xr-x    2 root     root             0 Jul 27 10:11 .
drwxr-xr-x    1 root     root          4096 Jul 27 12:53 ..
-rwxr-xr-x    1 root     root             3 Jul 27 10:11 bla
+ umount /mnt
+ echo umount ok
umount ok
Thu 27 Jul 2017 13:55:38 BST

---> Fifth iteration succeeds, but almost 2 minutes between docker run and the container executing

I had this running for an hour or so with the same pattern repeating, but no kernel crash.

In the kernel logs I see lots of:

[   84.940380] unregister_netdevice: waiting for lo to become free. Usage count = 1
[   95.082151] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  105.253289] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  115.477095] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  125.627059] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  135.789298] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  145.969455] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  156.101126] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  166.303333] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  176.445791] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  186.675958] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  196.870265] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  206.998238] unregister_netdevice: waiting for lo to become free. Usage count = 1
[...]

That is a message every ten seconds.

Since this does not cause the hung task detection to kick in even after an hour, I suspect that with 4.11.12 the reference count eventually gets decremented and the device gets freed, but, judging by the intervals at which I can run containers, it might take up to 4 minutes!
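One simple way to measure how long each stall lasts is to follow the kernel log with wall-clock timestamps (a sketch; assumes a util-linux dmesg with follow support):

# print each occurrence with a human-readable timestamp as it happens
dmesg --follow --ctime | grep --line-buffered unregister_netdevice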

Experiment 3 (4.11.12 kernel)

The kernel crash in the OP indicated that the kernel crashed because a hung task was detected. I have not seen this crash in my testing, so I changed the sysctl settings related to hung task detection:

# sysctl -a | grep kernel.hung_task
kernel.hung_task_check_count = 4194304
kernel.hung_task_panic = 0
kernel.hung_task_timeout_secs = 120
kernel.hung_task_warnings = 10
# sysctl -w kernel.hung_task_timeout_secs=60
# sysctl -w kernel.hung_task_panic=1

This reduces the timeout to 60 seconds and panics the kernel if a hung task is detected. Since it takes around 2 minutes before dockerd complains that containerd did not start, reducing the hung task detection to 60s ought to trigger a kernel panic if a single task were hung. Alas, there was no crash in the logs.

Experiment 4 (4.11.12 kernel)

Next, I increased the sleep after each docker run to 5 minutes to see if the messages are continuous. In this case all docker runs seem to work, which is kind of expected since from the previous experiments a docker run would work every 4 minutes or so.

---> This is after the first run
[  281.406660] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  291.455945] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  301.721340] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  311.988572] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  322.258805] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  332.527383] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  342.796511] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  353.059499] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  363.327472] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  373.365562] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  383.635923] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  393.684949] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  403.950186] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  414.221779] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  424.490110] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  434.754925] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  445.022243] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  455.292106] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  465.557462] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  475.826946] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  486.097833] unregister_netdevice: waiting for lo to become free. Usage count = 1

---> 200+ seconds of messages and then nothing for almost 400 seconds

[  883.924399] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  893.975810] unregister_netdevice: waiting for lo to become free. Usage count = 1
...
[ 1088.624065] unregister_netdevice: waiting for lo to become free. Usage count = 1
[ 1098.891297] unregister_netdevice: waiting for lo to become free. Usage count = 1

---> 200+ seconds of messages and then a gap of 90 seconds

[ 1185.119327] unregister_netdevice: waiting for lo to become free. Usage count = 1
[ 1195.387962] unregister_netdevice: waiting for lo to become free. Usage count = 1
...
[ 1390.040035] unregister_netdevice: waiting for lo to become free. Usage count = 1
[ 1400.307359] unregister_netdevice: waiting for lo to become free. Usage count = 1

---> 200+ seconds of messages and then a gap of 80+ seconds

[ 1486.325724] unregister_netdevice: waiting for lo to become free. Usage count = 1
[ 1496.591715] unregister_netdevice: waiting for lo to become free. Usage count = 1
...
[ 1680.987216] unregister_netdevice: waiting for lo to become free. Usage count = 1
[ 1691.255068] unregister_netdevice: waiting for lo to become free. Usage count = 1

---> 200+ seconds of messages and then a gap of 90+ seconds

[ 1787.547334] unregister_netdevice: waiting for lo to become free. Usage count = 1
[ 1797.819703] unregister_netdevice: waiting for lo to become free. Usage count = 1

It looks like we are getting around 200 seconds worth of unregister_netdevice messages on almost every docker run (except for the second one). I suspect during that time we can't start new containers (as indicated by Experiment 2). It's curious that the hung task detection is not kicking in, presumably because no task is hung.

Experiment 5 (4.11.12/4.9.39 with extra debugging enabled in the kernel)

This is reverting back to a 1s sleep in between docker runs.

We have another kernel which enables a bunch of additional debug options, such as LOCKDEP, RCU_TRACE, LOCKUP_DETECTOR and a few more.

Running the repro on the 4.11.12 kernel with these debug options enabled did not trigger anything.

Ditto for the 4.9.39 kernel, where the normal kernel crashes. The debug options change the timing slightly, so this may be an additional clue that the crash in the unix domain socket code is due to a race.

Digging a bit deeper

strace on the various containerd processes is not helpful (it usually isn't, because they're written in Go). Lots of long stalls in futex(...FUTEX_WAIT...) without any information on where or why.

Some poking around with sysrq:

Increase verbosity:

echo 9 > /proc/sysrq-trigger

Stack trace from all CPUs:

echo l > /proc/sysrq-trigger
[ 1034.298202] sysrq: SysRq : Show backtrace of all active CPUs
[ 1034.298738] NMI backtrace for cpu 1
[ 1034.299073] CPU: 1 PID: 2235 Comm: sh Tainted: G    B           4.11.12-linuxkit #1
[ 1034.299818] Hardware name:   BHYVE, BIOS 1.00 03/14/2014
[ 1034.300286] Call Trace:
[ 1034.300517]  dump_stack+0x82/0xb8
[ 1034.300827]  nmi_cpu_backtrace+0x75/0x87
[ 1034.301200]  ? irq_force_complete_move+0xf1/0xf1
[ 1034.301633]  nmi_trigger_cpumask_backtrace+0x6e/0xfd
[ 1034.302097]  arch_trigger_cpumask_backtrace+0x19/0x1b
[ 1034.302560]  ? arch_trigger_cpumask_backtrace+0x19/0x1b
[ 1034.302989]  sysrq_handle_showallcpus+0x17/0x19
[ 1034.303438]  __handle_sysrq+0xe4/0x172
[ 1034.303826]  write_sysrq_trigger+0x47/0x4f
[ 1034.304210]  proc_reg_write+0x5d/0x76
[ 1034.304507]  __vfs_write+0x35/0xc8
[ 1034.304773]  ? rcu_sync_lockdep_assert+0x12/0x52
[ 1034.305132]  ? __sb_start_write+0x152/0x189
[ 1034.305458]  ? file_start_write+0x27/0x29
[ 1034.305770]  vfs_write+0xda/0x100
[ 1034.306029]  SyS_write+0x5f/0xa3
[ 1034.306283]  entry_SYSCALL_64_fastpath+0x1f/0xc2
[ 1034.306638] RIP: 0033:0x7fa4810488a9
[ 1034.306976] RSP: 002b:00007fffd3a29828 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1034.307567] RAX: ffffffffffffffda RBX: 000000c6b523a020 RCX: 00007fa4810488a9
[ 1034.308101] RDX: 0000000000000002 RSI: 000000c6b5239d00 RDI: 0000000000000001
[ 1034.308635] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 1034.309169] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 1034.309700] R13: 0000000000000000 R14: 00007fffd3a29988 R15: 00007fa481280ee0
[ 1034.310334] Sending NMI from CPU 1 to CPUs 0:
[ 1034.310710] NMI backtrace for cpu 0 skipped: idling at pc 0xffffffffa0922756

Nothing here, CPU1 is idle, CPU0 is handling the sysrq.

Show blocked tasks (twice)

echo w > /proc/sysrq-trigger
[  467.167062] sysrq: SysRq : Show Blocked State
[  467.167731]   task                        PC stack   pid father
[  467.168580] kworker/u4:6    D    0   293      2 0x00000000
[  467.169096] Workqueue: netns cleanup_net
[  467.169487] Call Trace:
[  467.169732]  __schedule+0x582/0x701
[  467.170073]  schedule+0x89/0x9a
[  467.170338]  schedule_timeout+0xbf/0xff
[  467.170666]  ? del_timer_sync+0xc1/0xc1
[  467.171011]  schedule_timeout_uninterruptible+0x2a/0x2c
[  467.171422]  ? schedule_timeout_uninterruptible+0x2a/0x2c
[  467.171866]  msleep+0x1e/0x22
[  467.172155]  netdev_run_todo+0x173/0x2c4
[  467.172499]  rtnl_unlock+0xe/0x10
[  467.172770]  default_device_exit_batch+0x13c/0x15f
[  467.173226]  ? __wake_up_sync+0x12/0x12
[  467.173550]  ops_exit_list+0x29/0x53
[  467.173850]  cleanup_net+0x1a8/0x261
[  467.174153]  process_one_work+0x276/0x4fb
[  467.174487]  worker_thread+0x1eb/0x2ca
[  467.174800]  ? rescuer_thread+0x2d9/0x2d9
[  467.175136]  kthread+0x106/0x10e
[  467.175406]  ? __list_del_entry+0x22/0x22
[  467.175737]  ret_from_fork+0x2a/0x40
[  467.176167] runc:[1:CHILD]  D    0  2609   2606 0x00000000
[  467.176636] Call Trace:
[  467.176849]  __schedule+0x582/0x701
[  467.177152]  schedule+0x89/0x9a
[  467.177451]  schedule_preempt_disabled+0x15/0x1e
[  467.177827]  __mutex_lock+0x2a0/0x3ef
[  467.178133]  ? copy_net_ns+0xbb/0x17c
[  467.178456]  mutex_lock_killable_nested+0x1b/0x1d
[  467.179068]  ? mutex_lock_killable_nested+0x1b/0x1d
[  467.179489]  copy_net_ns+0xbb/0x17c
[  467.179798]  create_new_namespaces+0x12b/0x19b
[  467.180151]  unshare_nsproxy_namespaces+0x8f/0xaf
[  467.180569]  SyS_unshare+0x17b/0x302
[  467.180925]  entry_SYSCALL_64_fastpath+0x1f/0xc2
[  467.181303] RIP: 0033:0x737b97
[  467.181559] RSP: 002b:00007fff1965ab18 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
[  467.182182] RAX: ffffffffffffffda RBX: 0000000002277bd8 RCX: 0000000000737b97
[  467.182805] RDX: 0000000000000000 RSI: 0000000000867a0f RDI: 000000006c020000
[  467.183368] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  467.184014] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[  467.184639] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  477.286653] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  487.457828] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  497.659654] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  507.831614] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  518.030241] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  528.232963] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  538.412263] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  548.583610] unregister_netdevice: waiting for lo to become free. Usage count = 1
echo w > /proc/sysrq-trigger
[  553.969592] sysrq: SysRq : Show Blocked State
[  553.970411]   task                        PC stack   pid father
[  553.971208] kworker/u4:6    D    0   293      2 0x00000000
[  553.971686] Workqueue: netns cleanup_net
[  553.972058] Call Trace:
[  553.972305]  __schedule+0x582/0x701
[  553.972690]  schedule+0x89/0x9a
[  553.973039]  schedule_timeout+0xbf/0xff
[  553.973462]  ? del_timer_sync+0xc1/0xc1
[  553.973890]  schedule_timeout_uninterruptible+0x2a/0x2c
[  553.974706]  ? schedule_timeout_uninterruptible+0x2a/0x2c
[  553.975244]  msleep+0x1e/0x22
[  553.975539]  netdev_run_todo+0x173/0x2c4
[  553.975950]  rtnl_unlock+0xe/0x10
[  553.976303]  default_device_exit_batch+0x13c/0x15f
[  553.976725]  ? __wake_up_sync+0x12/0x12
[  553.977121]  ops_exit_list+0x29/0x53
[  553.977501]  cleanup_net+0x1a8/0x261
[  553.977869]  process_one_work+0x276/0x4fb
[  553.978245]  worker_thread+0x1eb/0x2ca
[  553.978578]  ? rescuer_thread+0x2d9/0x2d9
[  553.978933]  kthread+0x106/0x10e
[  553.979283]  ? __list_del_entry+0x22/0x22
[  553.979774]  ret_from_fork+0x2a/0x40
[  553.980244] runc:[1:CHILD]  D    0  2609   2606 0x00000000
[  553.980728] Call Trace:
[  553.980949]  __schedule+0x582/0x701
[  553.981254]  schedule+0x89/0x9a
[  553.981533]  schedule_preempt_disabled+0x15/0x1e
[  553.981917]  __mutex_lock+0x2a0/0x3ef
[  553.982220]  ? copy_net_ns+0xbb/0x17c
[  553.982524]  mutex_lock_killable_nested+0x1b/0x1d
[  553.982909]  ? mutex_lock_killable_nested+0x1b/0x1d
[  553.983311]  copy_net_ns+0xbb/0x17c
[  553.983606]  create_new_namespaces+0x12b/0x19b
[  553.983977]  unshare_nsproxy_namespaces+0x8f/0xaf
[  553.984363]  SyS_unshare+0x17b/0x302
[  553.984663]  entry_SYSCALL_64_fastpath+0x1f/0xc2
[  553.985080] RIP: 0033:0x737b97
[  553.985306] RSP: 002b:00007fff1965ab18 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
[  553.985861] RAX: ffffffffffffffda RBX: 0000000002277bd8 RCX: 0000000000737b97
[  553.986383] RDX: 0000000000000000 RSI: 0000000000867a0f RDI: 000000006c020000
[  553.986811] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  553.987182] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[  553.987551] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
/ # [  558.844761] unregister_netdevice: waiting for lo to become free. Usage count = 1

This shows that both the netns and cleanup_net work queues are busy. I found a somewhat related issue quite a while back here, but this time the cleanup_net workqueue is in a different state.

Summary

  • I think the crash on 4.9.x kernels is unrelated but seems to be fixed in 4.11.x. This needs a bit more triage.
  • Unlike some previous reports, no hung tasks are reported, but it's hard to tell because there are very few stack traces on this issue.
  • Something is blocked for a very long time (2-4 minutes). It's likely related to the workqueue state.
  • The dump of the workqueue needs more analysis, in particular why the workqueue stays in that state for so long.
  • The unregister_netdev messages seem unrelated to the recent fix (which is in both 4.9.39 and 4.11.12). This may be because the cleanup_net work queue is not progressing and thus the message is printed.
  • It's entirely unclear how/why SMB is triggering this. I've written some 60 odd stress tests for network namespaces and different workloads and was unable to trigger any issues. The tests were based on runc, maybe I should try containerd.

I will dig a bit more and then send a summary to netdev.

@piec do you have console access, and can you see if there is anything in terms of a crash dump, or do you also just see huge delays like I do? If you have a crash dump, I'd be very interested in seeing it. Also, are you running on bare metal or in a VM? What's your configuration in terms of CPUs and memory?

@rn thanks for the investigations!

I'm running on a baremetal desktop PC so I have access to everything. It's an i7-4790K + 32 GiB.
Currently I'm running on an up-to-date Arch Linux + kernel from the testing repo (4.12.3-1-ARCH)

In my case everything behaves as you describe in your Experiment 2 (4.11.12 kernel):

  • After the client-smb container exits, running new containers is impossible for 4+ minutes.
  • I never have kernel crashes
  • The unregister_netdevice: waiting for lo to become free. Usage count = 1 message appears repeatedly if I try to run any new container during the 4+ minute delay after client-smb has exited, and only appears if you run a new container within that window. Running a new container after these 4 minutes is "normal"

So I suppose there's an issue somewhere in the cleanup process of the smb-client container related to network interfaces.

There is actually a much simpler repro of this issue (which, BTW is not the original issue).

This script just starts an SMB server on the host and then creates a network namespace with a veth pair, executes mount; ls; unmount in the network namespace, and then removes the network namespace.

apk add --no-cache iproute2 samba samba-common-tools cifs-utils

# SMB server setup
cat <<EOF > /etc/samba/smb.conf
[global]
    workgroup = WORKGROUP
    netbios name = FOO
    passdb backend = tdbsam
    security = user
    guest account = nobody
    strict locking = no
    min protocol = SMB2
[public]
    path = /share
    browsable = yes
    read only = no
    guest ok = yes
    browseable = yes
    create mask = 777
EOF
adduser -D -G nobody nobody && smbpasswd -a -n nobody
mkdir /share && chmod ugo+rwx /share && touch /share/foo
chown -R nobody.nobody /share

# Bring up a veth pair
ip link add hdev type veth peer name nsdev
ip addr add 10.0.0.1/24 dev hdev
ip link set hdev up

# Start SMB server and sleep for it to serve
smbd -D; sleep 5

# Client setup
ip netns add client-ns
ip link set nsdev netns client-ns
ip netns exec client-ns ip addr add 10.0.0.2/24 dev nsdev
ip netns exec client-ns ip link set lo up
ip netns exec client-ns ip link set nsdev up
sleep 1 # wait for the devices to come up

# Execute (mount, ls, unmount) in the network namespace and a new mount namespace
ip netns exec client-ns unshare --mount \
    /bin/sh -c 'mount.cifs //10.0.0.1/public /mnt -o vers=3.0,guest; ls /mnt; umount /mnt'

# Delete the client network namespace.
ip netns del client-ns

# Create a new network namespace
# This will stall for up to 200s
ip netns add new-netns

Note that adding a simple sleep 1 after the unmount, either when executing in the namespace or before deleting the network namespace, makes creating the new namespace work without any stalling at all. A sleep after the old namespace is deleted does not reduce the stalling.
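For reference, a sketch of where such a sleep would go in the script above (identical commands, with only the sleep 1 added before the namespace is deleted):

# same as above, but give the kernel a moment to drop its references
ip netns exec client-ns unshare --mount \
    /bin/sh -c 'mount.cifs //10.0.0.1/public /mnt -o vers=3.0,guest; ls /mnt; umount /mnt; sleep 1'

# now deleting the namespace no longer stalls the next "ip netns add"
ip netns del client-ns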

@piec I also tested this with your repro and a sleep 1 in the Dockerfile after the unmount, and everything works as expected: no stalling, no unregister_netdev messages.

I'll write this up now and send to netdev@vger

Excellent
I confirm that a sleep after unmounting fixes stalling and unregister_netdev messages in my setup as well

Don't you think umount generates an asynchronous action relative to its netns, which will block and eventually time out if the netns is removed before that action finishes? A sleep after the unmount would let this stuff finish before the netns is removed.
But that's just a hypothesis

I tried without the unmount; same difference. It's the deletion of the network namespace. That (and the removal of the mount namespace) will trigger the unmount anyway.

Ah ok

By the way, I reproduced the issue by mistake (while developing) on another machine with smb again. It's an Ubuntu 16.04 PC, Linux 4.4.0-77-generic. And there's a hung task backtrace which might be interesting. No crash, and the same ~4 minute delay.

[6409720.564230] device vethff6396b entered promiscuous mode
[6409720.564415] IPv6: ADDRCONF(NETDEV_UP): vethff6396b: link is not ready
[6409723.844595] unregister_netdevice: waiting for lo to become free. Usage count = 1
[6409726.812872] INFO: task exe:17732 blocked for more than 120 seconds.
[6409726.812918]       Tainted: P           O    4.4.0-77-generic #98-Ubuntu
[6409726.812959] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[6409726.813007] exe             D ffff8809952bbcb8     0 17732      1 0x00000000
[6409726.813013]  ffff8809952bbcb8 ffffffff821d9a20 ffff88103856c600 ffff880ffae2d400
[6409726.813018]  ffff8809952bc000 ffffffff81ef7724 ffff880ffae2d400 00000000ffffffff
[6409726.813021]  ffffffff81ef7728 ffff8809952bbcd0 ffffffff81837845 ffffffff81ef7720
[6409726.813025] Call Trace:
[6409726.813036]  [<ffffffff81837845>] schedule+0x35/0x80
[6409726.813040]  [<ffffffff81837aee>] schedule_preempt_disabled+0xe/0x10
[6409726.813044]  [<ffffffff81839729>] __mutex_lock_slowpath+0xb9/0x130
[6409726.813048]  [<ffffffff818397bf>] mutex_lock+0x1f/0x30
[6409726.813053]  [<ffffffff81726a2e>] copy_net_ns+0x6e/0x120
[6409726.813059]  [<ffffffff810a168b>] create_new_namespaces+0x11b/0x1d0
[6409726.813062]  [<ffffffff810a17ad>] copy_namespaces+0x6d/0xa0
[6409726.813068]  [<ffffffff8107f1d5>] copy_process+0x905/0x1b70
[6409726.813073]  [<ffffffff810805d0>] _do_fork+0x80/0x360
[6409726.813077]  [<ffffffff81080959>] SyS_clone+0x19/0x20
[6409726.813081]  [<ffffffff8183b972>] entry_SYSCALL_64_fastpath+0x16/0x71
[6409733.941041] unregister_netdevice: waiting for lo to become free. Usage count = 1
[6409744.021494] unregister_netdevice: waiting for lo to become free. Usage count = 1

The netdev@vger thread is here https://www.mail-archive.com/[email protected]/msg179703.html if anyone wants to follow progress.

@piec yes, that's expected.

I also run into this bug and was able to reproduce Oopses with the docker-samba-loop method from @piec on the Ubuntu kernel images:

  • linux-image-4.4.0-93-generic
  • linux-image-4.10.0-1004-gcp
  • linux-image-4.10.0-32-generic
  • linux-image-4.11.0-14-generic
  • linux-image-4.12.10-041210-generic=4.12.10-041210.20170830

I added my findings to the Ubuntu bug report: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407 and https://github.com/fho/docker-samba-loop

@fho thanks. You actually don't need docker at all to repro, just running the samba client in a network namespace will do the trick as per https://github.com/moby/moby/issues/5618#issuecomment-318681443

@rn thanks for the info. I haven't tried that way yet.

The recent posts here and on the netdev mailing list seem to be only about kernel stalls.
I'm also having kernel crashes with kernels 4.11 and 4.12.

I'm seeing an issue very similar to this (as detailed in #35068). We basically run a two-node swarm, which runs a single service with 4 replicas using a spread placement strategy.

In each of these service containers we mount the host docker.sock as a volume, and from within the container we execute docker run commands, with a max concurrency of 4 per container. This results in up to 4 containers being created concurrently, and immediately removed afterwards via --rm.

Additional kernel logs and examples on ARMv7 shown in the above reference.

ip6_route_dev_notify panic is a serious problem for us.

I think looking at this a bit more, this is definitely NOT the same bug as:

I think this is an issue upstream in the kernel with the ipv6 layer.

This information might be relevant.

We are able to reproduce the problem with _unregister_netdevice: waiting for lo to become free. Usage count = 1_ on a 4.14.0-rc3 kernel with _CONFIG_PREEMPT_NONE=y_, running on only one CPU, with the following kernel boot options:

BOOT_IMAGE=/boot/vmlinuz-4.14.0-rc3 root=/dev/mapper/vg0-root ro quiet vsyscall=emulate nosmp

Once we hit this state, it stays in this state and a reboot is needed. No more containers can be spawned. We reproduce it by running images doing ipsec/openvpn connections plus downloading a small file inside the tunnels. Then the instances exit (usually they run < 10s). We run tens of such containers per minute on one machine. With the above settings (only 1 CPU), the machine hits it in ~2 hours.

Another reproducer with the same kernel, but without limiting the number of CPUs, is to just run iperf in UDP mode for 3 seconds inside a container (so there is no TCP communication at all). If we run 10 such containers in parallel, wait for all of them to finish, and do it again, we hit the trouble in less than 10 minutes (on a 40-core machine). A sketch of this loop is shown below.

In both of our reproducers, we added "ip route flush table all; ifconfig down; sleep 10" before exiting from the containers. It does not seem to have any effect.
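
Below is a minimal sketch of the iperf reproducer described above. The image name, the iperf server address, and the container label are assumptions for illustration only; the original report does not specify them.

#!/bin/bash
# Sketch: 10 parallel short-lived containers running iperf in UDP mode for 3s,
# repeated in rounds. Assumes an iperf server is reachable at $SERVER and that
# the (hypothetical) image "my-iperf-image" contains the classic iperf binary.
SERVER=192.168.0.10

for round in $(seq 1 200); do
    # start 10 containers in parallel, each sending UDP traffic for 3 seconds
    for i in $(seq 1 10); do
        docker run -d --rm --label repro=iperf-udp my-iperf-image \
            iperf -u -c "$SERVER" -t 3 -b 100M >/dev/null
    done
    # wait until every container from this round has exited and been removed
    while [ -n "$(docker ps -q --filter label=repro=iperf-udp)" ]; do
        sleep 1
    done
done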

Hi,

Just to add to the fire, we are also seeing this problem; as requested, here are the details...

Kernel Version: Linux exe-v3-worker 4.9.0-3-amd64 #1 SMP Debian 4.9.30-2+deb9u5 (2017-09-19) x86_64 GNU/Linux

Linux distribution/version: Debian 9.1 (with all packages up to date)

Are you on the latest kernel version of your Linux vendor? Yes

Network setup (bridge, overlay, IPv4, IPv6, etc): IPv4 only, NATed as per default Docker setup

Description of the workload (what type of containers, what type of network load, etc): Very short lived containers (from a few seconds to a few minutes) running scripts before exiting.

And ideally a simple reproduction:

kernel:[617624.412100] unregister_netdevice: waiting for lo to become free. Usage count = 1

Couldn't kill the old container or start new ones on the affected nodes; had to reboot to restore functionality.

Hopefully we find a root cause / patch soon.

Best Regards,

robputt796

@campbellr
agreed that it seems to have something to do with network storage. I'm using Ceph krbd as a persistent volume in Kubernetes.
And I can reproduce the situation after a long-running container crashes.

The issue was assigned 10 days ago and it is a work in progress; you can see more insight into what's going on here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407

Hopefully Dan Streetman finds out how to fix it

Turns out that the Oops is caused by a kernel bug which has been fixed by commit 76da0704507bbc51875013f6557877ab308cfd0a:

ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=76da0704507bbc51875013f6557877ab308cfd0a
(It merely fixes the kernel panic, not the "kernel:unregister_netdevice: waiting for lo to become free. Usage count = 2" issue.)

(repeating this here again, because GitHub is hiding old comments)

If you are arriving here

The issue being discussed here is a kernel bug and has not yet been fully fixed. Some patches went in the kernel that fix _some_ occurrences of this issue, but others are not yet resolved.

There are a number of options that may help for _some_ situations, but not for all (again; it's most likely a combination of issues that trigger the same error)

Do not leave "I have this too" comments

"I have this too" does not help resolving the bug. only leave a comment if you have information that may help resolve the issue (in which case; providing a patch to the kernel upstream may be the best step).

If you want to let us know that you have this issue too, use the "thumbs up" button on the top description.

If you want to stay informed on updates use the _subscribe button_.

Every comment here sends an e-mail / notification to over 3000 people. I don't want to lock the conversation on this issue, because it's not resolved yet, but I may be forced to if you ignore this.

I will be removing comments that don't add useful information in order to (slightly) shorten the thread

If you want to help resolving this issue

  • Read the whole thread; it's long, and GitHub hides comments (so you'll have to click to make those visible again). There's a lot of information present in this thread already that could possibly help you
  • Read this comment https://github.com/moby/moby/issues/5618#issuecomment-316297818 (and comments around that time) for information that can be helpful.

Thanks!

I believe I've fixed this issue, at least when caused by a kernel TCP socket connection. Test kernels for Ubuntu are available and I would love feedback if they help/fix this for anyone here. Patch is submitted upstream; more details are in LP bug:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/comments/46

Sorry to spoil the celebration, but we were able to reproduce the issue. We are now working with @ddstreet on it at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/ .

Are there no workarounds?

Use host networking (which destroys much of the value of containers, but there you go).

@pumba-lt We had this issue about 1.5 years ago; about a year ago I disabled IPv6 at the kernel level (not via sysctl) and haven't had the issue once since. Running a cluster of 48 blades.

Normally in: /etc/default/grub
GRUB_CMDLINE_LINUX="xxxxx ipv6.disable=1"

However, I use PXE boot, so my PXE config has:

      DEFAULT menu.c32
      prompt 0
      timeout 50
      MENU TITLE PXE Boot
      label coreos
              menu label CoreOS
              kernel mykernel
              append initrd=myimage ipv6.disable=1 elevator=deadline cloud-config-url=myurl

I assure you, you will not see this issue again.

Everyone please understand this is a common SYMPTOM that has many causes. What has worked for you to avoid this may not work for someone else.

I can confirm our issues were solved after disabling IPv6 at boot (from grub's config file). We had numerous issues in a 7-node cluster; it runs smoothly now.

I don't remember where I found the solution, or whether I found it myself; anyway, thanks @qrpike for suggesting this to others :) !!
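
For reference, a sketch of the boot-time approach described above on a grub-based system (file names and the regeneration command differ per distro, so treat this as an outline rather than an exact recipe):

# 1. Append ipv6.disable=1 to the kernel command line in /etc/default/grub, e.g.
#    GRUB_CMDLINE_LINUX="... ipv6.disable=1"
# 2. Regenerate the grub configuration:
sudo update-grub                                    # Debian/Ubuntu
# or: sudo grub2-mkconfig -o /boot/grub2/grub.cfg   # CentOS/RHEL
# 3. Reboot, then verify the parameter took effect:
grep -o 'ipv6.disable=1' /proc/cmdline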

https://cdn.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.4.114

commit edaafa805e0f9d09560a4892790b8e19cab8bf09
Author: Dan Streetman ddstreet@ieee.org
Date: Thu Jan 18 16:14:26 2018 -0500

net: tcp: close sock if net namespace is exiting


[ Upstream commit 4ee806d51176ba7b8ff1efd81f271d7252e03a1d ]

When a tcp socket is closed, if it detects that its net namespace is
exiting, close immediately and do not wait for FIN sequence.

For normal sockets, a reference is taken to their net namespace, so it will
never exit while the socket is open.  However, kernel sockets do not take a
reference to their net namespace, so it may begin exiting while the kernel
socket is still open.  In this case if the kernel socket is a tcp socket,
it will stay open trying to complete its close sequence.  The sock's dst(s)
hold a reference to their interface, which are all transferred to the
namespace's loopback interface when the real interfaces are taken down.
When the namespace tries to take down its loopback interface, it hangs
waiting for all references to the loopback interface to release, which
results in messages like:

unregister_netdevice: waiting for lo to become free. Usage count = 1

These messages continue until the socket finally times out and closes.
Since the net namespace cleanup holds the net_mutex while calling its
registered pernet callbacks, any new net namespace initialization is
blocked until the current net namespace finishes exiting.

After this change, the tcp socket notices the exiting net namespace, and
closes immediately, releasing its dst(s) and their reference to the
loopback interface, which lets the net namespace continue exiting.

Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811
Signed-off-by: Dan Streetman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
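
A quick way to check whether the running kernel is at least as new as the stable release carrying this fix is shown below. Note that this only compares upstream version numbers; distro kernels may have backported the patch under an older version string, so treat the result as a hint rather than proof.

required=4.4.114                       # first 4.4.y stable release containing the commit above
current=$(uname -r | cut -d- -f1)      # strip the distro-specific suffix, e.g. "-generic"
if [ "$(printf '%s\n' "$required" "$current" | sort -V | head -n1)" = "$required" ]; then
    echo "running kernel $current is >= $required"
else
    echo "running kernel $current predates $required"
fi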

Still happening ("unregister_netdevice: waiting for eth0 to become free. Usage count = 1") although I've upgraded the kernel to 4.4.118 and Docker to 17.09.1-ce. Maybe I should try disabling IPv6 at the kernel level. Hope it works.

@wuming5569 please let me know if it worked for you with that version of linux

@wuming5569 Maybe; upgrading to kernel 4.4.114 fixes "unregister_netdevice: waiting for lo to become free. Usage count = 1", but not "unregister_netdevice: waiting for eth0 to become free. Usage count = 1".
I tested this in production.
@ddstreet this is feedback, any help?

@wuming5569 as mentioned above, the messages themselves are benign, but they may eventually lead to the kernel hanging. Does your kernel hang, and if so, what is your network pattern, i.e. what type of networking do your containers do?

Experienced same issue on CentOS. My kernel is 3.10.0-693.17.1.el7.x86_64. But, I didn't get similar stack trace in syslog.

Same on Centos7 kernel 3.10.0-514.21.1.el7.x86_64 and docker 18.03.0-ce

@danielefranceschi I recommend you upgrade to the latest CentOS kernel (at least 3.10.0-693). It won't solve the issue, but it seems to be much less frequent. In kernels 3.10.0-327 and 3.10.0-514 we were seeing the stack trace, but from memory I don't think we've seen any of those in 3.10.0-693.

@alexhexabeam 3.10.0-693 seems to work flawlessly, thanks :)

Same on CentOS7 kernel 4.16.0-1.el7.elrepo.x86_64 and docker 18.03.0-ce

It worked for weeks before the crash, and when we tried to bring it back up, it got completely stuck.

The problem also happened with kernel 3.10.0-693.21.1.el7

I can confirm it also happens on:

Linux 3.10.0-693.17.1.el7.x86_64
Red Hat Enterprise Linux Server release 7.4 (Maipo)

I can reproduce it by doing "service docker restart" while having a certain amount of load.

@wuming5569 Have you fixed this issue? What's your network type? We have been struggling with this issue for weeks.
Do you have a WeChat account?

4admin2root, given the fix you mentioned, https://cdn.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.4.114,

is it safe to disable the userland proxy for the Docker daemon if a sufficiently recent kernel is installed? It is not very clear from

https://github.com/moby/moby/issues/8356
https://github.com/moby/moby/issues/11185

Since both are older than the kernel fix

Thank you

We have been struggling with this issue for weeks.
Linux 3.10.0-693.17.1.el7.x86_64
CentOS Linux release 7.4.1708 (Core)

Can anyone confirm whether the latest 4.14 kernel has this issue? It seems like it does not; nobody on the Internet seems to have faced this issue with the 4.14 kernel.

I see this with the 4.15.15-1 kernel on CentOS 7.

Looking at the changelogs, https://cdn.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.15.8 has a fix for SCTP, but not TCP. So you may want to try the latest 4.14.

  • even 4.15.18 does not help with this bug
  • disabling ipv6 does not help as well

We have now upgraded to 4.16.13 and are observing. This bug was hitting us on one node only, approximately once per week.

Did you disable ipv6 in grub boot params or sysctl? Only boot params will work. Sysctl will not fix it.


for me, most of the time the bug shows up after redeploying the same project/network again

@qrpike you are right, we tried only sysctl. Let me try with grub. Thanks!

4.9.88 Debian kernel. Reproducible.

In my case disabling ipv6 didn't make any difference.

@spronin-aurea Did disabling ipv6 at boot loader help?

@qrpike can you tell us about the nodes you are using if disabling ipv6 helped in your case? Kernel version, k8s version, CNI, docker version etc.

@komljen I have been using CoreOS for the past 2 years, since around version 1000, without a single incident. I haven't tried it recently, but if I do not disable IPv6 the bug happens.

On my side, I'm using CoreOS too, with IPv6 disabled via grub, and I'm still getting the issue.

@deimosfr I'm currently using PXE boot for all my nodes:

      DEFAULT menu.c32
      prompt 0
      timeout 50
      MENU TITLE PXE Boot Blade 1
      label coreos
              menu label CoreOS ( blade 1 )
              kernel coreos/coreos_production_pxe.vmlinuz
              append initrd=coreos/coreos_production_pxe_image.cpio.gz ipv6.disable=1 net.ifnames=1 biosdevname=0 elevator=deadline cloud-config-url=http://HOST_PRIV_IP:8888/coreos-cloud-config.yml?host=1 root=LABEL=ROOT rootflags=noatime,discard,rw,seclabel,nodiratime

However, my main node, which is the PXE host, is also CoreOS and boots from disk, and it does not have the issue either.

What kernel versions are you guys running?

The ones where I got the issue were on 4.14.32-coreos and earlier. I have not encountered this issue yet on 4.14.42-coreos.

Centos 7.5 with 4.17.3-1 kernel, still got the issue.

Env :
kubernetes 1.10.4
Docker 13.1
with Flannel network plugin.

Log :
[ 89.790907] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 89.798523] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 89.799623] cni0: port 8(vethb8a93c6f) entered blocking state
[ 89.800547] cni0: port 8(vethb8a93c6f) entered disabled state
[ 89.801471] device vethb8a93c6f entered promiscuous mode
[ 89.802323] cni0: port 8(vethb8a93c6f) entered blocking state
[ 89.803200] cni0: port 8(vethb8a93c6f) entered forwarding state

kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

Now:
The node IP is reachable, but no network services (like ssh) can be used...

The symptoms here are similar to a lot of reports in various other places. All having to do with network namespaces. Could the people running into this please see if unshare -n hangs, and if so, from another terminal, do cat /proc/$pid/stack of the unshare process to see if it hangs in copy_net_ns()? This seems to be a common denominator for many of the issues including some backtraces found here. Between 4.16 and 4.18 there have been a number of patches by Kirill Tkhai refactoring the involved locking a lot. The affected distro/kernel package maintainers should probably look into applying/backporting them to stable kernels and see if that helps.
See also: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1779678
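
A small sketch of the check suggested above (run as root; the 5-second threshold is an arbitrary choice for illustration):

# Does creating a new network namespace hang, and if so, where is it stuck?
unshare -n true &          # on a healthy host this returns almost immediately
pid=$!
sleep 5
if kill -0 "$pid" 2>/dev/null; then
    echo "unshare is still running after 5s; kernel stack of the stuck process:"
    cat /proc/"$pid"/stack     # look for copy_net_ns() near the top
else
    echo "unshare returned normally; net namespace creation is not blocked right now"
fi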

@Blub

sudo cat /proc/122355/stack
[<ffffffff8157f6e2>] copy_net_ns+0xa2/0x180
[<ffffffff810b7519>] create_new_namespaces+0xf9/0x180
[<ffffffff810b775a>] unshare_nsproxy_namespaces+0x5a/0xc0
[<ffffffff81088983>] SyS_unshare+0x193/0x300
[<ffffffff816b8c6b>] tracesys+0x97/0xbd
[<ffffffffffffffff>] 0xffffffffffffffff

Given the locking changes in 4.18 it would be good to test the current 4.18rc, especially if you can trigger it more or less reliably, as from what I've seen there are many people where changing kernel versions also changed the likelihood of this happening a lot.

I had this issues with Kubernetes and after switching to latest CoreOS stable release - 1745.7.0 the issue is gone:

  • kernel: 4.14.48
  • docker: 18.03.1

same issue on CentOS 7

  • kernel: 4.11.1-1.el7.elrepo.x86_64
  • docker: 17.12.0-ce

@Blub Seeing the same on CoreOS 1688.5.3, kernel 4.14.32

ip-10-72-101-86 core # cat /proc/59515/stack
[<ffffffff9a4df14e>] copy_net_ns+0xae/0x200
[<ffffffff9a09519c>] create_new_namespaces+0x11c/0x1b0
[<ffffffff9a0953a9>] unshare_nsproxy_namespaces+0x59/0xb0
[<ffffffff9a07418d>] SyS_unshare+0x1ed/0x3b0
[<ffffffff9a003977>] do_syscall_64+0x67/0x120
[<ffffffff9a800081>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[<ffffffffffffffff>] 0xffffffffffffffff

In theory there may be one or more other traces somewhere containing one of the functions from net_namespace.c locking the net_mutex (cleanup_net, net_ns_barrier, net_ns_init, {,un}register_pernet_{subsys,device}). For stable kernels it would of course be much easier if there was one particular thing deadlocking in a way that could be fixed, than backporting all the locking changes from 4.18. But so far I haven't seen a trace leading to the root cause. I don't know if it'll help, but maybe other /proc/*/stacks with the above functions are visible when the issue appears?

Same issue! My env is Debian 8.

RHEL, SWARM, 18.03.0-ce

  1. Connecting to manager node via ssh
  2. Manually starting a container on a manager node:

    sudo docker run -it -v /import:/temp/eximport -v /home/myUser:/temp/exhome docker.repo.myHost/fedora:23 /bin/bash

  3. After some time doing nothing:

    [root@8a9857c25919 myDir]#
    Message from syslogd@se1-shub-t002 at Jul 19 11:56:03 ...
    kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

After a few minutes I am back at the console of the manager node and the started container is no longer running.

Does this describe the same issue or is this another "problem suite"?

THX in advance!

UPDATE
This also happens directly on the ssh console (on the swarm manager bash).

UPDATE
Host machine (one manager node in the swarm):
Linux [MACHINENNAME] 3.10.0-514.2.2.el7.x86_64 #1 SMP Wed Nov 16 13:15:13 EST 2016 x86_64 x86_64 x86_64 GNU/Linux

If this does not resolve itself after some time, then it is a different problem.

Same on CentOS7.5 kernel 3.10.0-693.el7.x86_64 and docker 1.13.1

The same problem OEL 7.5
uname -a
4.1.12-124.16.1.el7uek.x86_64 #2 SMP Mon Jun 11 20:09:51 PDT 2018 x86_64 x86_64 x86_64 GNU/Linux
docker info
Containers: 9
Running: 5
Paused: 0
Stopped: 4
Images: 6
Server Version: 17.06.2-ol

dmesg
[2238374.718889] unregister_netdevice: waiting for lo to become free. Usage count = 1
[2238384.762813] unregister_netdevice: waiting for lo to become free. Usage count = 1
[2238392.792585] eth0: renamed from vethbed6d59

(repeating this https://github.com/moby/moby/issues/5618#issuecomment-351942943 here again, because GitHub is hiding old comments)

If you are arriving here

The issue being discussed here is a kernel bug and has not yet been fully fixed. Some patches went in the kernel that fix _some_ occurrences of this issue, but others are not yet resolved.

There are a number of options that may help for _some_ situations, but not for all (again; it's most likely a combination of issues that trigger the same error)

The "unregister_netdevice: waiting for lo to become free" error itself is not the bug

It's the kernel crash _after_ that that is the bug (see below)

Do not leave "I have this too" comments

"I have this too" does not help resolving the bug. only leave a comment if you have information that may help resolve the issue (in which case; providing a patch to the kernel upstream may be the best step).

If you want to let us know that you have this issue too, use the "thumbs up" button on the top description.

If you want to stay informed on updates use the _subscribe button_.

Every comment here sends an e-mail / notification to over 3000 people. I don't want to lock the conversation on this issue, because it's not resolved yet, but I may be forced to if you ignore this.

I will be removing comments that don't add useful information in order to (slightly) shorten the thread

If you want to help resolving this issue

  • Read the whole thread, including those comments that are hidden; it's long, and GitHub hides comments (so you'll have to click to make those visible again). There's a lot of information present in this thread already that could possibly help you

To be clear, the message itself is benign, it's the kernel crash after the messages reported by the OP which is not.

The comment in the code where this message comes from explains what's happening. Basically, every user (such as the IP stack) of a network device (such as the end of a veth pair inside a container) increments a reference count in the network device structure when it is using the device. When the device is removed (e.g. when the container is removed), each user is notified so that it can do some cleanup (e.g. closing open sockets) before decrementing the reference count. Because this cleanup can take some time, especially under heavy load (lots of interfaces, a lot of connections, etc.), the kernel may print the message here once in a while.

If a user of a network device never decrements the reference count, some other part of the kernel will determine that the task waiting for the cleanup is stuck and it will crash. It is only this crash which indicates a kernel bug (some user, via some code path, did not decrement the reference count). There have been several such bugs and they have been fixed in modern kernels (and possibly backported to older ones). I have written quite a few stress tests (and continue writing them) to trigger such crashes, but have not been able to reproduce them on modern kernels (I do, however, see the above message).

_Please only report on this issue if your kernel actually crashes_, and then we would be very interested in:

  • kernel version (output of uname -r)
  • Linux distribution/version
  • Are you on the latest kernel version of your Linux vendor?
  • Network setup (bridge, overlay, IPv4, IPv6, etc)
  • Description of the workload (what type of containers, what type of network load, etc)
  • And ideally a simple reproduction

Thanks!
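
A quick way to tell the benign message apart from an actual crash is to look for a hung-task report or panic following it in the kernel log. The exact wording of those reports varies by kernel version, so this is only a rough check:

# The "waiting for ... to become free" lines alone are harmless noise; a real
# problem is indicated by a hung task or crash reported after them.
dmesg -T | grep -E 'unregister_netdevice|blocked for more than|Call Trace|Kernel panic'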

Are you guys running Docker under any limits, like ulimits, cgroups, etc.?

Newer systemd has a default limit even if you didn't set one. I set things to unlimited and the issue hasn't occurred since (watching for 31 days).

I had the same issue in many environments and my solution was to stop the firewall. It has not happened again, for now.

Rhel 7.5 - 3.10.0-862.3.2.el7.x86_64
Docker 1.13

@dElogics What version of systemd is considered "newer"? Is this default limit enabled in the CentOS 7.5 systemd?

Also, when you ask if we're running docker under any limits, do you mean the docker daemon, or the individual containers?

The Docker daemon. The systemd version is the one in Debian 9 (232-25).

Not sure about RHEL, but I've personally seen this issue on RHEL too. I had set LimitNOFILE=1048576, LimitNPROC=infinity, LimitCORE=infinity, TasksMax=infinity.
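
One way to apply such limits without editing the packaged unit file is a systemd drop-in. A sketch (the values mirror the ones mentioned above; the drop-in file name is arbitrary):

# create /etc/systemd/system/docker.service.d/limits.conf containing:
#
#   [Service]
#   LimitNOFILE=1048576
#   LimitNPROC=infinity
#   LimitCORE=infinity
#   TasksMax=infinity
#
# then reload systemd and restart the daemon:
sudo systemctl daemon-reload
sudo systemctl restart docker
# verify what the service actually runs with:
systemctl show docker | grep -E 'LimitNOFILE|LimitNPROC|LimitCORE|TasksMax'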

kernel: unregister_netdevice: waiting for eth0 to become free. Usage count = 3
kernel 4.4.146-1.el7.elrepo.x86_64
linux version CentOS Linux release 7.4.1708 (Core)
bridge mode

I had the same issue. What can I do?

Same issue:

CentOS Linux release 7.5.1804 (Core)
Docker version 18.06.1-ce, build e68fc7a
Kernel Version: 3.10.0-693.el7.x86_64

I've hit a similar issue here...
Are there any steps I could take right now? Please help me out...

CentOS 7.0.1406
[root@zjsm-slavexx etc]# uname -a
Linux zjsm-slave08 3.10.0-123.el7.x86_64 #1 SMP Mon Jun 30 12:09:22 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
[root@zjsm-slavexx etc]# cat /etc/centos-release
CentOS Linux release 7.0.1406 (Core)

Docker information:
[root@zjsm-slavexx ~]# docker version
Client:
Version: 17.04.0-ce
API version: 1.28
Go version: go1.7.5
Git commit: 4845c56
Built: Mon Apr 3 18:01:50 2017
OS/Arch: linux/amd64

Server:
Version: 17.04.0-ce
API version: 1.28 (minimum version 1.12)
Go version: go1.7.5
Git commit: 4845c56
Built: Mon Apr 3 18:01:50 2017
OS/Arch: linux/amd64
Experimental: false

CentOS Linux release 7.2.1511 kernel: 3.10.0-327.el7.x86_64
same problem

I've experienced this issue.

Ubuntu 16.04.3 LTS
Kernel 4.4.0-87-generic #110-Ubuntu SMP Tue Jul 18 12:55:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Docker version:
Client:
 Version:      17.09.0-ce
 API version:  1.32
 Go version:   go1.8.3
 Git commit:   afdb6d4
 Built:        Tue Sep 26 22:42:18 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.09.0-ce
 API version:  1.32 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   afdb6d4
 Built:        Tue Sep 26 22:40:56 2017
 OS/Arch:      linux/amd64
 Experimental: false

@thaJeztah, perhaps you should add your comment to the top of the original post, as people are still ignoring it.

$ docker network ls
NETWORK ID          NAME                     DRIVER              SCOPE
b3fc47abfff2        bridge                   bridge              local
f9474559ede8        dockerfile_cluster_net   bridge              local
ef999de68a96        host                     host                local
e7b41d23674c        none                     null                local
$ docker network rm f9474559ede8 

fixed it.

@hzbd You mean deleting the user-defined bridge network? Have you tried to dig further to find out why? Please let me know if you did. I'd really appreciate it.

Waiting to be fixed

Are you guys running Docker under any limits, like ulimits, cgroups, etc.?

Newer systemd has a default limit even if you didn't set one. I set things to unlimited and the issue hasn't occurred since (watching for 31 days).

OK, this bug still occurs, but the probability has reduced.

I think if the containers are stopped gracefully (PID 1 exit()s), then this bug will not bother us.

@dElogics thanks for letting us know, could you please show us what commands you ran to set this systemd limits to unlimited. I like to try that too.

You have to modify Docker's systemd unit. The systemd unit I use (only the relevant parts):

[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target docker.socket firewalld.service flannel.service
Wants=network-online.target
Requires=docker.socket

[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker

ExecReload=/bin/kill -s HUP $MAINPID
LimitNOFILE=1048576
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNPROC=infinity
LimitCORE=infinity
# Uncomment TasksMax if your systemd version supports it.
# Only systemd 226 and above support this version.
TasksMax=infinity
TimeoutStartSec=0
# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes
# kill only the docker process, not all processes in the cgroup
KillMode=process
# restart the docker process if it exits prematurely
Restart=on-failure
StartLimitBurst=3
StartLimitInterval=60s

[Install]
WantedBy=multi-user.target

Did someone have this issue in a kernel 4.15 or newer?

This Dan Streetman fix (https://github.com/torvalds/linux/commit/4ee806d51176ba7b8ff1efd81f271d7252e03a1d) is first included in 4.15 kernel version and it seems that at least for someone it is not happening anymore since they upgraded to 4.16 (https://github.com/kubernetes/kubernetes/issues/64743#issuecomment-436839647)

Did someone try it out?

@victorgp We still experience the issue with the 4.15 kernel. We will report here when we have tested with 4.16 kernel (hopefully in a few weeks).

We used kernel version 4.14.62 for a few months; this issue disappeared.

To add to my previous resolutions: gracefully stopping containers (ones which respond to SIGTERM) never triggers this.

Also, try running the containers in the host network namespace (if that's acceptable for you), which fully resolves the issue.

@dElogics What do you mean by "host namespace"? Is it simply --privileged?

No, it means --network=host
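
For reference, a minimal example of what running in the host network namespace looks like (the image and command are just placeholders):

# No veth pair is created for the container, so the veth/lo unregister path
# that triggers this bug is avoided entirely.
docker run --rm --network=host alpine ip addr    # prints the host's interfaces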

Since upgrading from kernel 4.4.0 to 4.15.0 and docker 1.11.2 to 18.09 the issue disappeared.

In a sizeable fleet of VMs acting as Docker hosts, we had this issue appearing multiple times a day (with our Docker use-case).
45 days in, and we are no longer seeing it.

For posterity, a stack trace of a hung Docker 1.11.2 with printk's showing unregister_netdevice: waiting for vethXXXXX (similar to what we were always seeing in our fleet, across hundreds of VMs) can be found at http://paste.ubuntu.com/p/6RgkpX352J/ (the interesting container ref is 0xc820001980)

goroutine 8809 [syscall, 542 minutes, locked to thread]:
syscall.Syscall6(0x2c, 0xd, 0xc822f3d200, 0x20, 0x0, 0xc822f3d1d4, 0xc, 0x20, 0xc82435fda0, 0x10)
    /usr/local/go/src/syscall/asm_linux_amd64.s:44 +0x5
syscall.sendto(0xd, 0xc822f3d200, 0x20, 0x20, 0x0, 0xc822f3d1d4, 0xc80000000c, 0x0, 0x0)
    /usr/local/go/src/syscall/zsyscall_linux_amd64.go:1729 +0x8c
syscall.Sendto(0xd, 0xc822f3d200, 0x20, 0x20, 0x0, 0x7faba31bded8, 0xc822f3d1c8, 0x0, 0x0)
    /usr/local/go/src/syscall/syscall_unix.go:258 +0xaf
github.com/vishvananda/netlink/nl.(*NetlinkSocket).Send(0xc822f3d1c0, 0xc82435fda0, 0x0, 0x0)
    /usr/src/docker/vendor/src/github.com/vishvananda/netlink/nl/nl_linux.go:333 +0xd4
github.com/vishvananda/netlink/nl.(*NetlinkRequest).Execute(0xc82435fda0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /usr/src/docker/vendor/src/github.com/vishvananda/netlink/nl/nl_linux.go:215 +0x111
github.com/vishvananda/netlink.LinkDel(0x7fab9c2b15d8, 0xc825ef2240, 0x0, 0x0)
    /usr/src/docker/vendor/src/github.com/vishvananda/netlink/link_linux.go:615 +0x16b
github.com/docker/libnetwork/drivers/bridge.(*driver).DeleteEndpoint(0xc8204aac30, 0xc8203ae780, 0x40, 0xc826e7b800, 0x40, 0x0, 0x0)
    /usr/src/docker/vendor/src/github.com/docker/libnetwork/drivers/bridge/bridge.go:1060 +0x5cf
github.com/docker/libnetwork.(*endpoint).deleteEndpoint(0xc822945b00, 0xc82001ac00, 0x0, 0x0)
    /usr/src/docker/vendor/src/github.com/docker/libnetwork/endpoint.go:760 +0x261
github.com/docker/libnetwork.(*endpoint).Delete(0xc822945b00, 0x7fab9c2b0a00, 0x0, 0x0)
    /usr/src/docker/vendor/src/github.com/docker/libnetwork/endpoint.go:735 +0xbcb
github.com/docker/libnetwork.(*sandbox).delete(0xc8226bc780, 0xc8229f0600, 0x0, 0x0)
    /usr/src/docker/vendor/src/github.com/docker/libnetwork/sandbox.go:217 +0xd3f
github.com/docker/libnetwork.(*sandbox).Delete(0xc8226bc780, 0x0, 0x0)
    /usr/src/docker/vendor/src/github.com/docker/libnetwork/sandbox.go:175 +0x32
github.com/docker/docker/daemon.(*Daemon).releaseNetwork(0xc820001980, 0xc820e23a40)
    /usr/src/docker/.gopath/src/github.com/docker/docker/daemon/container_operations.go:732 +0x4f1
github.com/docker/docker/daemon.(*Daemon).Cleanup(0xc820001980, 0xc820e23a40)
    /usr/src/docker/.gopath/src/github.com/docker/docker/daemon/start.go:163 +0x62
github.com/docker/docker/daemon.(*Daemon).StateChanged(0xc820001980, 0xc825f9fac0, 0x40, 0xc824155b50, 0x4, 0x8900000000, 0x0, 0x0, 0x0, 0x0, ...)
    /usr/src/docker/.gopath/src/github.com/docker/docker/daemon/monitor.go:39 +0x60a
github.com/docker/docker/libcontainerd.(*container).handleEvent.func2()
    /usr/src/docker/.gopath/src/github.com/docker/docker/libcontainerd/container_linux.go:177 +0xa5
github.com/docker/docker/libcontainerd.(*queue).append.func1(0xc820073c01, 0xc820f9a2a0, 0xc821f3de20, 0xc822ddf9e0)
    /usr/src/docker/.gopath/src/github.com/docker/docker/libcontainerd/queue_linux.go:26 +0x47
created by github.com/docker/docker/libcontainerd.(*queue).append
    /usr/src/docker/.gopath/src/github.com/docker/docker/libcontainerd/queue_linux.go:28 +0x1da

From that we can observe that it hanged in https://github.com/moby/moby/blob/v1.11.2/daemon/container_operations.go#L732

which points us to https://github.com/moby/moby/blob/v1.11.2/vendor/src/github.com/docker/libnetwork/sandbox.go#L175

And
https://github.com/moby/moby/blob/v1.11.2/vendor/src/github.com/docker/libnetwork/endpoint.go#L760

Which goes into libnetwork bridge driver (check the awesome description)
https://github.com/moby/moby/blob/v1.11.2/vendor/src/github.com/docker/libnetwork/drivers/bridge/bridge.go#L1057-L1061

Moving to netlink
https://github.com/moby/moby/blob/v1.11.2/vendor/src/github.com/vishvananda/netlink/link_linux.go#L601-L617
https://github.com/moby/moby/blob/v1.11.2//vendor/src/github.com/vishvananda/netlink/nl/nl_linux.go#L215

And ultimately in that netlink socket, calls https://github.com/moby/moby/blob/v1.11.2/vendor/src/github.com/vishvananda/netlink/nl/nl_linux.go#L333

We feel that the bug generally happens when stopping a container: because SKBs are still referenced in the netns, the veth is not released, and Docker then issues a Kill to that container after 15s. The Docker daemon does not handle this situation gracefully, but ultimately the bug is in the kernel. We believe that https://github.com/torvalds/linux/commit/4ee806d51176ba7b8ff1efd81f271d7252e03a1d (accepted in 4.15 upstream) and the commits linked to it (there are several) act as a mitigation.

In general, that part of the kernel is not a pretty place.
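
If you need to capture a goroutine dump like the one above from a hung daemon yourself, the Docker daemon writes its stack traces when it receives SIGUSR1; the process name and the location of the dump file vary by Docker version, so the following is only a sketch:

# Ask the daemon to dump its goroutine stacks, then find the dump location in its logs.
sudo kill -USR1 "$(pidof dockerd)"
sudo journalctl -u docker --since "2 minutes ago" | grep -i goroutine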

For what it's worth, we upgraded the RHEL Linux kernel from 3.10.0 to 4.17.11 (running a Kubernetes cluster on it). Before upgrading, this bug was occurring several times a day on different servers. We have been running with the upgrade for three weeks now and the bug has occurred only once. So, roughly speaking, it's reduced by 99%.

@marckamerbeek You updated the RHEL kernel to a community kernel? Then it is no longer supported.

@Beatlor CentOS users can do it this way.

centos 7.2 still has this problem: kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

@Beatlor RHEL did not help us at all. A stable production environment is more important than some worthless support contract. We are still running very stable now on 4.17.11. No big issues anymore.

Yes, I also stopped having this problem after upgrading the kernel to 4.17.0-1.el7.elrepo.x86_64. I had tried earlier kernels before (4.4.x, 4.8, 4.14...) and they failed. It seems the problem does not occur again on 4.17+ kernels.

You can try to upgrade to the latest 4.19+ kernel.

Just wait for a few months and someone will come up complaining about the 4.19 kernel too. Just history repeating itself.

Hey everyone, good news !

Since my last comment here (at the time of writing, 17 days ago), I haven't seen these errors again. My servers (about 30 of them) were running Ubuntu 14.04 with some outdated packages.

After a full system upgrade including docker-engine (from 1.7.1 to 1.8.3) plus a kernel upgrade to the latest version available in Ubuntu's repo, my servers are running without any occurrences.

🎱

Which kernel version did you upgrade to?

I challenge https://github.com/kubernetes/kubernetes/issues/70427#issuecomment-470681000, as we haven't been seeing this with thousands of VMs on 4.15.0, whereas we were seeing it dozens of times daily on 4.4.0. Are there more reports of it on 4.15.0?

I'm seeing this issue on one of my machines running Docker on Debian 9 Stretch (4.9.0-8-amd64). I experience the issue with a tunnel created within the Docker container via Docker Gen, and it generates a kernel panic:

Message from syslogd@xxxx at Apr 29 15:55:41 ...
 kernel:[719739.507961] unregister_netdevice: waiting for tf-xxxxxxxx to become free. Usage count = 1

Here's our Docker information:

Client:
 Version:           18.09.3
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        774a1f4
 Built:             Thu Feb 28 06:34:04 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.3
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.8
  Git commit:       774a1f4
  Built:            Thu Feb 28 05:59:55 2019
  OS/Arch:          linux/amd64
  Experimental:     false

Does anybody know if there's a temporary fix for this without restarting the entire machine? We'd really prefer not to have to restart the entire machine when we experience this issue.

Somewhat off-topic: we also cannot suppress the kernel panic messages in the terminal. I've tried dmesg -D and dmesg -n 1, with no luck. Is there a way to suppress these kernel panic messages from appearing in the terminal? It's annoying trying to type commands and having that message pop up every 10 seconds or so.

Thanks.

Are these vanilla kernels or heavily patched by distros with backported fixes?

@pmoust I do see this on Ubuntu 4.15.0-32 about once a week. Definitely much better than on 4.4.0.

@iavael I'll attempt to list distro info in the summary if the reference provides it.

Anyone saw this bug with 4.19?

https://github.com/kubernetes/kubernetes/issues/64743#issuecomment-451351435
https://github.com/kubernetes/kubernetes/issues/64743#issuecomment-461772385

This information may be helpful to you.

@tankywoo @drpancake @egasimus @csabahenk @spiffytech @ibuildthecloud @sbward @jbalonso @rsampaio @MrMMorris @rsampaio @unclejack @chrisjstevenson @popsikle @fxposter @scher200 @victorgp @jstangroome @Xuexiang825 @dElogics @Nowaker @pmoust @marckamerbeek @Beatlor @warmchang @Jovons @247687009 @jwongz @tao12345666333 @clkao Please look at this https://pingcap.com/blog/try-to-fix-two-linux-kernel-bugs-while-testing-tidb-operator-in-k8s/

I followed the documentation, but I still get an error.

[root@node1 ~]# kpatch list
Loaded patch modules:
livepatch_route [enabled]

Installed patch modules:
[root@node1 ~]#
Message from syslogd@node1 at May  7 15:59:11 ...
 kernel:unregister_netdevice: waiting for eth0 to become free. Usage count = 1

That message itself is not the bug; it's the kernel crashing afterwards; https://github.com/moby/moby/issues/5618#issuecomment-407751991

After rebooting, it's OK...

@vincent927 BTW, you should put livepatch_route.ko into /var/lib/kpatch/$(uname -r); when kpatch.service is enabled, the .ko can be loaded automatically after a reboot.
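
A sketch of what that advice amounts to (paths follow the kpatch convention mentioned above; the module file name must match what kpatch-build produced):

UNAME=$(uname -r)
sudo mkdir -p /var/lib/kpatch/"$UNAME"
sudo cp livepatch_route.ko /var/lib/kpatch/"$UNAME"/
sudo systemctl enable kpatch.service   # loads modules from /var/lib/kpatch/$(uname -r) at boot
kpatch list                            # the patch should now show up as installed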

We suddenly got this today at our company in several Kubernetes clusters.

  • uname -a:
    Linux ip-10-47-17-58 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3.1 (2019-02-19) x86_64 GNU/Linux
  • docker version:

    Client:
     Version:           18.09.5
     API version:       1.39
     Go version:        go1.10.8
     Git commit:        e8ff056dbc
     Built:             Thu Apr 11 04:44:28 2019
     OS/Arch:           linux/amd64
     Experimental:      false
    
    Server: Docker Engine - Community
     Engine:
      Version:          18.09.2
      API version:      1.39 (minimum version 1.12)
      Go version:       go1.10.6
      Git commit:       6247962
      Built:            Sun Feb 10 03:42:13 2019
      OS/Arch:          linux/amd64
      Experimental:     false
    
  • kubectl version (server):
    Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.2", GitCommit:"cff46ab41ff0bb44d8584413b598ad8360ec1def", GitTreeState:"clean", BuildDate:"2019-01-10T23:28:14Z", GoVersion:"go1.11.4", Compiler:"gc", Platform:"linux/amd64"}

We don't know the cause yet; we've been running these versions of the above software for several months with no issues. I am just commenting to add to the list of "versions of software that experience this bug" for now.

@ethercflow I've read that, but since we run Debian at my company it's not as straightforward for us to implement the fix in that post.

@ethercflow @2rs2ts we are also running Debian. I have encountered a lot of issues trying to get kpatch-build to work. If I manage to find a workaround I'll keep you posted. In any case, does anybody have any other solution? Is it kernel version 4.15 or 4.19 that mitigates the problem? I have been trying to find the answer for the past week and still haven't managed to.

@commixon our experience is still the same as reported in https://github.com/moby/moby/issues/5618#issuecomment-455800975: across a fleet of a thousand VMs there has been no recurrence of the issue with 4.15.0 on the generic, AWS-optimised and GCP-optimised kernel flavors provided by Canonical. A limited test on vanilla 4.15.0 did not show any of those issues either, but it was not tested at scale.

Thanks a lot @pmoust. I will try them out. In any case I'll also try to patch kpatch to work with Debian (as a side project) and post updates here for anyone interested.

You may upgrade to 4.19. It's in the backports.

BTW it's been a year for us here. ;)

We actually tried 4.19 from backports, but it had some major regressions in other areas (the EC2 instances would just randomly reboot and then networking would be broken on startup). Guess we'll have to deal with this until the next stable release.

@2rs2ts For the past 4 days we have been using 4.19 from backports (on EC2) and we have not seen any problems at all. The kernel crash issue has not appeared, and everything else seems fine as well. I don't believe it makes any difference, but we based our Debian image on the one provided by kops (https://github.com/kubernetes/kops/blob/master/docs/images.md#debian). We updated the kernel in this image rather than using the stock Debian one.

Friends, I have been using the 4.19 kernel for stable operation for half a year. I hope that you can enjoy stability as well.

I have a container with ports 80 and 443 open. Every 2 weeks, access to the container's ports 80 and 443 from another computer is denied.

CentOS 7.3, kernel version:
Linux browser1 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

root@browser1 ~]# docker version
Client:
Version: 18.06.3-ce
API version: 1.38
Go version: go1.10.4
Git commit: d7080c1
Built: Wed Feb 20 02:24:22 2019
OS/Arch: linux/amd64
Experimental: false

Server:
Engine:
Version: 18.06.3-ce
API version: 1.38 (minimum version 1.12)
Go version: go1.10.3
Git commit: d7080c1
Built: Wed Feb 20 02:25:33 2019
OS/Arch: linux/amd64
Experimental: false
[root@browser1 ~]#

dmesg:

[1063959.636785] unregister_netdevice: waiting for lo to become free. Usage count = 1
[1071340.887512] br-af29e1edc1b8: port 5(vethc2ac4f8) entered disabled state
[1071340.891753] br-af29e1edc1b8: port 5(vethc2ac4f8) entered disabled state
[1071340.895118] device vethc2ac4f8 left promiscuous mode
[1071340.895138] br-af29e1edc1b8: port 5(vethc2ac4f8) entered disabled state
[1071340.990505] device veth5e4f161 entered promiscuous mode
[1071340.990897] IPv6: ADDRCONF(NETDEV_UP): veth5e4f161: link is not ready
[1071340.990904] br-af29e1edc1b8: port 5(veth5e4f161) entered forwarding state
[1071340.990924] br-af29e1edc1b8: port 5(veth5e4f161) entered forwarding state
[1071341.231405] IPv6: ADDRCONF(NETDEV_CHANGE): veth5e4f161: link becomes ready
[1071355.991701] br-af29e1edc1b8: port 5(veth5e4f161) entered forwarding state
[1071551.533907] br-af29e1edc1b8: port 5(veth5e4f161) entered disabled state
[1071551.537564] br-af29e1edc1b8: port 5(veth5e4f161) entered disabled state
[1071551.540295] device veth5e4f161 left promiscuous mode
[1071551.540313] br-af29e1edc1b8: port 5(veth5e4f161) entered disabled state
[1071551.570924] device veth8fd3a0a entered promiscuous mode
[1071551.571550] IPv6: ADDRCONF(NETDEV_UP): veth8fd3a0a: link is not ready
[1071551.571556] br-af29e1edc1b8: port 5(veth8fd3a0a) entered forwarding state
[1071551.571582] br-af29e1edc1b8: port 5(veth8fd3a0a) entered forwarding state
[1071551.841656] IPv6: ADDRCONF(NETDEV_CHANGE): veth8fd3a0a: link becomes ready
[1071566.613998] br-af29e1edc1b8: port 5(veth8fd3a0a) entered forwarding state
[1071923.465082] br-af29e1edc1b8: port 5(veth8fd3a0a) entered disabled state
[1071923.470215] br-af29e1edc1b8: port 5(veth8fd3a0a) entered disabled state
[1071923.472888] device veth8fd3a0a left promiscuous mode
[1071923.472904] br-af29e1edc1b8: port 5(veth8fd3a0a) entered disabled state
[1071923.505580] device veth9e693ae entered promiscuous mode
[1071923.505919] IPv6: ADDRCONF(NETDEV_UP): veth9e693ae: link is not ready
[1071923.505925] br-af29e1edc1b8: port 5(veth9e693ae) entered forwarding state
[1071923.505944] br-af29e1edc1b8: port 5(veth9e693ae) entered forwarding state
[1071923.781658] IPv6: ADDRCONF(NETDEV_CHANGE): veth9e693ae: link becomes ready
[1071938.515044] br-af29e1edc1b8: port 5(veth9e693ae) entered forwarding state

Anyone saw this bug with 4.19?

Yes. I have the issue on kernel 4.19.4-1.el7.elrepo.x86_64.

Hello,

I am also seeing this error. Do we have any solution for this issue? Kernel 3.10.0-514.26.2.el7.x86_64

[username@ip-10-1-4-64 ~]$
Message from syslogd@ip-10-1-4-64 at Jul 19 10:50:01 ...
 kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

Message from syslogd@ip-10-1-4-64 at Jul 19 10:50:48 ...
 kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

This issue is still happening :( No updates/ideas on how to fix it?

Happening on Debian Stretch. I was trying to update my Jenkins container via Ansible when this happened.

This issue has been solved by this commit:
https://github.com/torvalds/linux/commit/ee60ad219f5c7c4fb2f047f88037770063ef785f
It can be applied to a running kernel using kpatch:

curl -SOL https://raw.githubusercontent.com/Aleishus/kdt/master/kpatchs/route.patch
kpatch-build -t vmlinux route.patch
UNAME=$(uname -r)                      # the running kernel release the patch was built against
mkdir -p /var/lib/kpatch/${UNAME}
cp -a livepatch-route.ko /var/lib/kpatch/${UNAME}
systemctl restart kpatch
kpatch list

This must be in 4.19.30 onwards.

I am not sure torvalds/linux@ee60ad2 is the definitive fix for it: we've seen this on 4.4.0 as far as I recall, whereas https://github.com/torvalds/linux/commit/deed49df7390d5239024199e249190328f1651e7 was only added in 4.5.0.

We've reproduced the same bug using a diagnostic kernel that had delays artificially inserted to make PMTU discovery exception routes hit this window.

  1. Debugging kernel patch:
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index a0163c5..6b9e7ee 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -133,6 +133,8 @@

 static int ip_min_valid_pmtu __read_mostly = IPV4_MIN_MTU;

+static int ref_leak_test;
+
/*
  * Interface to generic destination cache.
  */
@@ -1599,6 +1601,9 @@ static void ip_del_fnhe(struct fib_nh *nh, __be32 daddr)
    fnhe = rcu_dereference_protected(*fnhe_p, lockdep_is_held(&fnhe_lock));
    while (fnhe) {
        if (fnhe->fnhe_daddr == daddr) {
+           if (ref_leak_test)
+               pr_info("XXX pid: %d, %s: fib_nh:%p, fnhe:%p, daddr:%x\n",
+                   current->pid,  __func__, nh, fnhe, daddr);
            rcu_assign_pointer(*fnhe_p, rcu_dereference_protected(
                fnhe->fnhe_next, lockdep_is_held(&fnhe_lock)));
            fnhe_flush_routes(fnhe);
@@ -2145,10 +2150,14 @@ static struct rtable *__mkroute_output(const struct fib_result *res,

        fnhe = find_exception(nh, fl4->daddr);
        if (fnhe) {
+           if (ref_leak_test)
+               pr_info("XXX pid: %d, found fnhe :%p\n", current->pid, fnhe);
            prth = &fnhe->fnhe_rth_output;
            rth = rcu_dereference(*prth);
            if (rth && rth->dst.expires &&
               time_after(jiffies, rth->dst.expires)) {
+               if (ref_leak_test)
+                   pr_info("eXX pid: %d, del fnhe :%p\n", current->pid, fnhe);
                ip_del_fnhe(nh, fl4->daddr);
                fnhe = NULL;
            } else {
@@ -2204,6 +2213,14 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 #endif
    }

+   if (fnhe && ref_leak_test) {
+       unsigned long  time_out;
+
+       time_out = jiffies + ref_leak_test;
+       while (time_before(jiffies, time_out))
+           cpu_relax();
+       pr_info("XXX pid: %d, reuse fnhe :%p\n", current->pid, fnhe);
+   }
    rt_set_nexthop(rth, fl4->daddr, res, fnhe, fi, type, 0);
    if (lwtunnel_output_redirect(rth->dst.lwtstate))
        rth->dst.output = lwtunnel_output;
@@ -2733,6 +2750,13 @@ static int ipv4_sysctl_rtcache_flush(struct ctl_table *__ctl, int write,
        .proc_handler   = proc_dointvec,
    },
    {
+       .procname   = "ref_leak_test",
+       .data       = &ref_leak_test,
+       .maxlen     = sizeof(int),
+       .mode       = 0644,
+       .proc_handler   = proc_dointvec,
+   },
+   {
        .procname   = "max_size",
        .data       = &ip_rt_max_size,
        .maxlen     = sizeof(int),
  2. User mode scripts:

ref_leak_test_begin.sh:

#!/bin/bash

# constructing a basic network with netns
# client <-->gateway <--> server
ip netns add svr
ip netns add gw
ip netns add cli

ip netns exec gw sysctl net.ipv4.ip_forward=1

ip link add svr-veth type veth peer name svrgw-veth
ip link add cli-veth type veth peer name cligw-veth

ip link set svr-veth netns svr
ip link set svrgw-veth netns gw
ip link set cligw-veth netns gw
ip link set cli-veth netns cli

ip netns exec svr ifconfig svr-veth 192.168.123.1
ip netns exec gw ifconfig svrgw-veth 192.168.123.254
ip netns exec gw ifconfig cligw-veth 10.0.123.254
ip netns exec cli ifconfig cli-veth 10.0.123.1

ip netns exec cli route add default gw 10.0.123.254
ip netns exec svr route add default gw 192.168.123.254

# constructing a concurrent-access scenario with netperf
nohup ip netns exec svr  netserver -L 192.168.123.1

nohup ip netns exec cli  netperf -H 192.168.123.1 -l 300 &
nohup ip netns exec cli  netperf -H 192.168.123.1 -l 300 &
nohup ip netns exec cli  netperf -H 192.168.123.1 -l 300 &
nohup ip netns exec cli  netperf -H 192.168.123.1 -l 300 &

# Add delay
echo 3000 > /proc/sys/net/ipv4/route/ref_leak_test

# making PMTU discovery exception routes
echo 1 >  /proc/sys/net/ipv4/route/mtu_expires
for((i=1;i<=60;i++));
do
  for j in 1400  1300 1100 1000
  do
    echo "set mtu to "$j;
    ip netns exec svr ifconfig  svr-veth  mtu $j;
    ip netns exec cli ifconfig  cli-veth  mtu $j;
    ip netns exec gw ifconfig svrgw-veth  mtu $j;
    ip netns exec gw ifconfig cligw-veth  mtu $j;
    sleep 2;
  done
done

ref_leak_test_end.sh:

#!/bin/bash

echo 0 > /proc/sys/net/ipv4/route/ref_leak_test

pkill netserver
pkill netperf

ip netns exec cli ifconfig cli-veth down
ip netns exec gw ifconfig svrgw-veth down
ip netns exec gw ifconfig cligw-veth down
ip netns exec svr ifconfig svr-veth down

ip netns del svr
ip netns del gw
ip netns del cli

The test process:

  • first load the debug kernel,
  • then run ref_leak_test_begin.sh,
  • wait a few seconds, run ref_leak_test_end.sh,
  • and finally you can observe the error.
[root@iZuf6h1kfgutxc3el68z2lZ test]# bash ref_leak_test_begin.sh
net.ipv4.ip_forward = 1
nohup: ignoring input and appending output to ‘nohup.out’
nohup: set mtu to 1400
appending output to ‘nohup.out’
nohup: appending output to ‘nohup.out’
nohup: appending output to ‘nohup.out’
nohup: appending output to ‘nohup.out’
set mtu to 1300
set mtu to 1100
set mtu to 1000
set mtu to 1400
set mtu to 1300
set mtu to 1100
^C
[root@iZuf6h1kfgutxc3el68z2lZ test]# bash ref_leak_test_end.sh
[root@iZuf6h1kfgutxc3el68z2lZ test]#
Message from syslogd@iZuf6h1kfgutxc3el68z2lZ at Nov  4 20:29:43 ...
 kernel:unregister_netdevice: waiting for cli-veth to become free. Usage count = 1

After some testing, torvalds/linux@ee60ad2 can indeed fix this bug.

Anyone saw this bug with 4.19?

same

Yes, on Debian! Is there any way to suppress it?

Found out my Docker logs are also being spammed. Kernel 5.4.0, Docker 19.03.8:

Mar 21 18:46:14 host.mysite.com dockerd[16544]: time="2020-03-21T18:46:14.127275161Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Mar 21 18:45:13 host.mysite.com dockerd[16544]: time="2020-03-21T18:45:13.642050333Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Mar 21 18:44:13 host.mysite.com dockerd[16544]: time="2020-03-21T18:44:13.161364216Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Mar 21 18:43:12 host.mysite.com dockerd[16544]: time="2020-03-21T18:43:12.714725302Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"

I finally found out how to suppress these messages btw. From this question on StackExchange, I commented out this line in /etc/rsyslog.conf:

# Everybody gets emergency messages
#*.emerg                    :omusrmsg:*

Very nuclear option, but at least now my system is usable again!

@steelcowboy You can configure rsyslog to discard only those annoying messages instead of all emergency messages, which is more desirable.

I wrote the following into /etc/rsyslog.d/40-unregister-netdevice.conf and restarted rsyslog (systemctl restart rsyslog).

# match frequent, not relevant emergency messages generated by Docker when transferring large amounts of data through the network
:msg,contains,"unregister_netdevice: waiting for lo to become free. Usage count = 1" /dev/null

# discard matching messages
& stop
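
A quick way to sanity-check such a filter before restarting rsyslog (the -N1 flag only validates the configuration, it does not apply it):

# validate the rsyslog configuration, including files under /etc/rsyslog.d/
sudo rsyslogd -N1
# if no errors are reported, apply the change
sudo systemctl restart rsyslog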

Any news here?
