Linux: Pi3B+ : USB+ethernet file transfer problem

Created on 18 Mar 2018  ·  204 Comments  ·  Source: raspberrypi/linux

Hi,

I have problems transferring files from a USB external HDD or USB flash drive connected to the new RPi 3B+ to another computer via SFTP or Samba. After a random amount of data has been transferred, the transfer rate drops to 0 and stays there for at least 30 min (I cancelled the transfer after that time).

I cannot see any errors in the standard Raspbian system logs. I am running Raspbian Stretch with the latest updates, and I also did a firmware upgrade with rpi-update.

The RPi 3B+ is powered with an official Raspberry Pi power supply (2.5A), and I also tried a spare one.
I tried powering the USB HDD from an external power supply, without any change. Since I get the same problem with a USB flash drive, I don't expect this to be a power issue, or is it?

I tried to transfer files with thunar, pcmanfm, scp and rsync. rsync is the only one that works, but with strong fluctuations of the transfer rate (from 0 to 20 MB/s). scp tells me the file is "stalled" and stops making progress.
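For reference, a transfer along these lines is enough to show the fluctuation for me (source path and target host are just placeholders):

rsync -av --progress /media/usb-hdd/bigfile.bin user@target:/tmp/

With --progress you can watch the rate swing between 0 and 20 MB/s.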

If I boot the same SD card on an RPi 3B (not plus), the file transfer works like a charm (I used it like that for almost 3 years).

I followed the tips from this link without success. I have no problems during ssh sessions or with VNC.

Any ideas?

Most helpful comment

I have exactly the same problem, and I'm glad to see I'm not alone in this. After rebooting I can transfer files through SFTP (using FileZilla on Windows) without issue, but half an hour or so after booting up the transfers don't work anymore: the transfer hangs after a second or two at high speed (18 MB/s), then times out, and after 20 seconds it tries to reconnect, only to fail again after another second or two of transfer. I also tried transferring the files from the SD card (without the HDD connected) and it has the same issues. When I put the SD card into a 3B it has no issues whatsoever transferring these files, even from the hard drive.
I'm using Raspbian and the official Raspberry Pi charger, if that helps.
If it is hardware-related, I'd love to know, so I can get it exchanged by the seller.

All 204 comments

Having the same strange issues with the Pi3B+ [1] [2]

IMO it's either related to the network driver (lan78xx), or the USB host driver does not handle USB transfers correctly.

After switching from ethernet to WLAN, everything is OK.

[1] I was never able to make a 3GB image backup (the root fs is on an external USB disk) over the network (kernels 4.9, 4.14 and 4.15 tested). The same configuration works on an older Pi3B without any issues.

[2] When the root fs is on an iSCSI target, I got massive kernel oopses with kernel 4.14.27+. After switching to 4.9.87+, the ethernet connection was better (no kernel oops), but maybe still not stable (more testing needed). Kernel 4.15.10+ is not tested yet.
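Regarding the lan78xx suspicion: a quick way to confirm which driver the wired interface uses and to watch for its messages (just a sketch, ethtool may need installing first):

sudo apt-get install ethtool
ethtool -i eth0           # on the 3B+ this should report driver: lan78xx
dmesg | grep -i lan78xx   # any resets or errors from the driver show up here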

I did another test.

Now I am back on the official Raspbian Stretch kernel, 4.9.80-v7, and it has been working so far.
I transferred 10 GB from a USB HDD at a rate of 20 MB/s.

I've got the same problem here... Trying to transfer large files over ethernet fails after a minute or so... Fine on WiFi... However, when I try to downgrade to 4.9.80-v7, I get four flashing green lights on the Pi3B+... I can take the card out and plug it into a Pi2 and it boots up just fine... I then have to re-update the firmware and swap everything back to the Pi3B+... Frustrating...

Is this bug related to #2446? If so, then the forthcoming kernel 4.14.28 should fix it. Reverting to an old kernel may be a temporary solution, but only for advanced users.
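If someone does want to try reverting, rpi-update can, as far as I know, pin the firmware and kernel to a specific commit of the firmware repository (the hash below is a placeholder; you have to look up the one that shipped the kernel you want):

sudo rpi-update <firmware-commit-hash>
sudo reboot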

Are other distros affected: xbian, osmc, LibreELEC?

At least we know that OSMC used pre-production units, so maybe they included a fix/workaround for this problem.

I have 10 days to decide whether to return the RPi 3B+, so I either want to start with working Ethernet or return the unit unopened.

Is this bug related to #2446? If so, then the forthcoming kernel 4.14.28 should fix it. Reverting to an old kernel may be a temporary solution, but only for advanced users.

IMO there is a good chance that it will solve some of these strange issues.

As already reported, I've been struggling with my Pi3B+ since I got the device. Raspbian seems to work in my testing scenario [1], but XBian was always crashing, sooner or later. The major difference between Raspbian and XBian is that Raspbian is using IPv6, whereas XBian is using IPv4 (IPv6 is disabled by default), and those patches fix the IPv4 stack. After enabling tcp6 on XBian, my test seems to finish successfully.

[1] Root fs on a USB disk (60GB), a Samba share from my server mounted at /mnt, and then run dd if=/dev/sda of=/mnt/test.img bs=1M status=progress. Raspbian succeeded; XBian always crashes in different ways: kernel oops, kernel panic, or just a complete freeze.
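Spelled out, the test is roughly this (server name, share and credentials are placeholders):

sudo mount -t cifs //server/share /mnt -o username=USER,password=PASS
sudo dd if=/dev/sda of=/mnt/test.img bs=1M status=progress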

If you have a Linux target machine, can you try running this one-liner (you'll need to change the login credentials for the target):

dd if=/dev/zero bs=1M status=progress | ssh user@target "cat >/dev/null"

I've had that running overnight and it's approaching 2TB transferred without issue. The channel is encrypted, so the 29.4MB/s (235Mb/s) it's achieving isn't bad.

If that's solid, switch to:

while true; do date 1>&2; sudo dd if=/dev/mmcblk0 bs=1M status=progress; done | ssh user@target "cat >/dev/null"

Then:

while true; do date 1>&2; sudo dd if=/dev/sda bs=1M status=progress; done | ssh user@target "cat >/dev/null"

@pelwell
Sure, there are many, many ways to run a test. Unfortunately, even though my tests finished successfully, my real application, an image backup started from the Kodi GUI, still kills the Pi3B+ completely [1], in different ways (4.14.27+ mostly throws kernel Oops #5, so I built a kernel with the netdev patch; the oopses were gone, but I'm still getting the CIFS VFS "stuck for 15 seconds" messages, and the 4.14.29+ kernel built tonight did not make any difference).

Conclusion:
The Pi3B+ is at the moment absolutely unusable, the crappiest Pi I've ever had. Extremely frustrating.

[1] That backup uses btrfs send/receive; it sends all local btrfs subvolumes (located on the external USB disk) to an image mounted on a network share (cifs or nfs).
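In outline, the script does something like this per subvolume (names and paths are placeholders; the real script loops over all subvolumes and also does incremental sends):

# the image lives on the mounted CIFS/NFS share and is itself formatted as btrfs
sudo mount -o loop /mnt/share/backup.img /mnt/backup
# btrfs send needs a read-only snapshot
sudo btrfs subvolume snapshot -r /data/subvol /data/subvol.ro
sudo btrfs send /data/subvol.ro | sudo btrfs receive /mnt/backup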

The Pi3B+ is at the moment absolutely unusable, the crappiest Pi I've ever had. Extremely frustrating.

For you, apparently, but not for me - that's why I'm trying to be methodical and work out which step is causing the problem.

@mkreisl Out of curiosity, could you possibly try using sshfs instead of CIFS or NFS? It's a FUSE module instead of a native filesystem driver, and it uses a completely different (and much more efficient) protocol on the wire, so it may work even where CIFS and NFS don't. sudo apt-get install sshfs should get it for you (I'm about 99% certain Raspbian has it). If that works, it should at least give you something you can use in the short term until this gets figured out.
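Mounting it is basically a one-liner once it's installed (host and paths here are placeholders):

sshfs user@server:/backup /mnt -o reconnect

and fusermount -u /mnt unmounts it again. Everything then just looks like a local directory to your backup script.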

For you, apparently, but not for me - that's why I'm trying to be methodical and work out which step is causing the problem.

Those basic tests you are running do cause the issue. IMO lots of data going in and out over USB is what causes the problem. That's the main difference between the Pi3B and Pi3B+: the Pi3B has one USB hub, and the Pi3B+ has two:

Pi3B:

Bus 001 Device 004: ID 046d:c503 Logitech, Inc. Cordless Mouse+Keyboard Receiver
Bus 001 Device 003: ID 0424:ec00 Standard Microsystems Corp. SMSC9512/9514 Fast Ethernet Adapter
Bus 001 Device 002: ID 0424:9514 Standard Microsystems Corp. SMC9514 Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

Pi3B+:

Bus 001 Device 004: ID 046d:c503 Logitech, Inc. Cordless Mouse+Keyboard Receiver
Bus 001 Device 005: ID 0424:7800 Standard Microsystems Corp. 
Bus 001 Device 003: ID 0424:2514 Standard Microsystems Corp. USB 2.0 Hub
Bus 001 Device 002: ID 0424:2514 Standard Microsystems Corp. USB 2.0 Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

@Ferroin Good idea, I'll test it. It will take some time, as I have to modify the backup script.

@Ferroin sshfs doesn't make a difference; the Pi3B+ dies with a kernel oops again (the Pi3B works like a charm)

Mar 22 17:40:00 kmxbilr2 kernel: [  555.453472] Unable to handle kernel NULL pointer dereference at virtual address 0000000d
Mar 22 17:40:00 kmxbilr2 kernel: [  555.453480] pgd = a12d8000
Mar 22 17:40:00 kmxbilr2 kernel: [  555.453484] [0000000d] *pgd=00000000
Mar 22 17:40:00 kmxbilr2 kernel: [  555.453493] Internal error: Oops: 5 [#1] SMP ARM
Mar 22 17:40:00 kmxbilr2 kernel: [  555.453578] Modules linked in: dm_mod dax fuse loop hci_uart bluetooth ecdh_generic sg uio_pdrv_genirq uio lirc_rpi(C) lirc_dev fixed frandom ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables ipv6 snd_bcm2835(C) snd_pcm snd_timer snd brcmfmac cfg80211 rfkill brcmutil evdev rpcsec_gss_krb5 tun uinput
Mar 22 17:40:00 kmxbilr2 kernel: [  555.454081] CPU: 1 PID: 10626 Comm: sudo Tainted: G         C      4.14.29+ #1
Mar 22 17:40:00 kmxbilr2 kernel: [  555.454170] Hardware name: BCM2835
Mar 22 17:40:00 kmxbilr2 kernel: [  555.454216] task: 91ab1e00 task.stack: 93ea0000
Mar 22 17:40:00 kmxbilr2 kernel: [  555.454287] PC is at locks_remove_posix+0x30/0x14c
Mar 22 17:40:00 kmxbilr2 kernel: [  555.454364] LR is at filp_close+0x68/0x8c
Mar 22 17:40:00 kmxbilr2 kernel: [  555.454447] pc : [<802e08b0>]    lr : [<80286ac4>]    psr: 20000013
Mar 22 17:40:00 kmxbilr2 kernel: [  555.454550] sp : 93ea1eb0  ip : 93ea1f58  fp : 93ea1f54
Mar 22 17:40:00 kmxbilr2 kernel: [  555.454616] r10: 00000000  r9 : 93ea0000  r8 : 80108204
Mar 22 17:40:00 kmxbilr2 kernel: [  555.454680] r7 : 00000006  r6 : acc63900  r5 : 9d640180  r4 : ae690440
Mar 22 17:40:00 kmxbilr2 kernel: [  555.454754] r3 : 00000001  r2 : ad1bb940  r1 : acc63900  r0 : 9d640180
Mar 22 17:40:00 kmxbilr2 kernel: [  555.454831] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Mar 22 17:40:00 kmxbilr2 kernel: [  555.454914] Control: 10c5383d  Table: 212d806a  DAC: 00000055
Mar 22 17:40:00 kmxbilr2 kernel: [  555.454983] Process sudo (pid: 10626, stack limit = 0x93ea0210)
Mar 22 17:40:00 kmxbilr2 kernel: [  555.455058] Stack: (0x93ea1eb0 to 0x93ea2000)
Mar 22 17:40:00 kmxbilr2 kernel: [  555.455112] 1ea0:                                     802fe824 91ab1e00 00000044 802a8f5c
Mar 22 17:40:00 kmxbilr2 kernel: [  555.455210] 1ec0: 00000004 00000000 00000000 00000100 81f22e80 80d093bc 00000017 808ac294
Mar 22 17:40:00 kmxbilr2 kernel: [  555.455316] 1ee0: 76ebfe2c 93ea1fb0 7ed36760 808a8780 93ea1f14 93ea1f00 808a8780 806fbae8
Mar 22 17:40:00 kmxbilr2 kernel: [  555.455440] 1f00: add51c00 add51d74 93ea1f34 93ea1f18 806fbae8 808a7758 add52400 7f0220bc
Mar 22 17:40:00 kmxbilr2 kernel: [  555.455561] 1f20: add52400 add52444 93ea1f54 00000000 9d640180 acc63900 00000006 80108204
Mar 22 17:40:00 kmxbilr2 kernel: [  555.455679] 1f40: 93ea0000 00000000 93ea1f74 93ea1f58 80286ac4 802e088c 0000003c acc63900
Mar 22 17:40:00 kmxbilr2 kernel: [  555.455787] 1f60: 9d640180 00000006 93ea1f94 93ea1f78 802aad38 80286a68 76fbd218 0000000b
Mar 22 17:40:00 kmxbilr2 kernel: [  555.455892] 1f80: 00000000 00000006 93ea1fa4 93ea1f98 80286b18 802aac7c 00000000 93ea1fa8
Mar 22 17:40:00 kmxbilr2 kernel: [  555.455993] 1fa0: 80108060 80286af4 76fbd218 0000000b 0000003c 7ed36760 76fbcb5c 76fbcb5c
Mar 22 17:40:00 kmxbilr2 kernel: [  555.456102] 1fc0: 76fbd218 0000000b 00000000 00000006 01d185b8 7ed36760 76fface8 0000003c
Mar 22 17:40:00 kmxbilr2 kernel: [  555.456205] 1fe0: 7ed36738 7ed36728 76fbcb6c 76e94fc2 60000030 0000003c 00000000 00000000
Mar 22 17:40:00 kmxbilr2 kernel: [  555.456325] [<802e08b0>] (locks_remove_posix) from [<80286ac4>] (filp_close+0x68/0x8c)
Mar 22 17:40:00 kmxbilr2 kernel: [  555.456435] [<80286ac4>] (filp_close) from [<802aad38>] (__close_fd+0xc8/0xec)
Mar 22 17:40:00 kmxbilr2 kernel: [  555.456529] [<802aad38>] (__close_fd) from [<80286b18>] (SyS_close+0x30/0x58)
Mar 22 17:40:00 kmxbilr2 kernel: [  555.456626] [<80286b18>] (SyS_close) from [<80108060>] (ret_fast_syscall+0x0/0x28)
Mar 22 17:40:00 kmxbilr2 kernel: [  555.456728] Code: e59430e8 f57ff05b e3530000 0a000025 (e5b3200c) 
Mar 22 17:40:00 kmxbilr2 kernel: [  555.456857] ---[ end trace 124dc21b94998788 ]---
Mar 22 17:40:00 kmxbilr2 kernel: [  555.477670] Unable to handle kernel NULL pointer dereference at virtual address 0000000d
Mar 22 17:40:00 kmxbilr2 kernel: [  555.477836] pgd = 80004000
Mar 22 17:40:00 kmxbilr2 kernel: [  555.477896] [0000000d] *pgd=00000000
Mar 22 17:40:00 kmxbilr2 kernel: [  555.477964] Internal error: Oops: 5 [#2] SMP ARM
Mar 22 17:40:00 kmxbilr2 kernel: [  555.478032] Modules linked in: dm_mod dax fuse loop hci_uart bluetooth ecdh_generic sg uio_pdrv_genirq uio lirc_rpi(C) lirc_dev fixed frandom ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables ipv6 snd_bcm2835(C) snd_pcm snd_timer snd brcmfmac cfg80211 rfkill brcmutil evdev rpcsec_gss_krb5 tun uinput
Mar 22 17:40:00 kmxbilr2 kernel: [  555.478580] CPU: 3 PID: 10620 Comm: sudo Tainted: G      D  C      4.14.29+ #1
Mar 22 17:40:00 kmxbilr2 kernel: [  555.478678] Hardware name: BCM2835
Mar 22 17:40:00 kmxbilr2 kernel: [  555.478725] task: 82818f00 task.stack: 86ea8000
Mar 22 17:40:00 kmxbilr2 kernel: [  555.478800] PC is at locks_remove_posix+0x30/0x14c
Mar 22 17:40:00 kmxbilr2 kernel: [  555.478867] LR is at filp_close+0x68/0x8c
Mar 22 17:40:00 kmxbilr2 kernel: [  555.478933] pc : [<802e08b0>]    lr : [<80286ac4>]    psr: 20040013
Mar 22 17:40:00 kmxbilr2 kernel: [  555.479016] sp : 86ea9d08  ip : 86ea9db0  fp : 86ea9dac
Mar 22 17:40:00 kmxbilr2 kernel: [  555.479084] r10: 0000000b  r9 : ad41a980  r8 : 00000004
Mar 22 17:40:00 kmxbilr2 kernel: [  555.479152] r7 : ade91500  r6 : ade91500  r5 : 9d640180  r4 : ae690440
Mar 22 17:40:00 kmxbilr2 kernel: [  555.479234] r3 : 00000001  r2 : 9d640180  r1 : ade91500  r0 : 9d640180
Mar 22 17:40:00 kmxbilr2 kernel: [  555.479316] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Mar 22 17:40:00 kmxbilr2 kernel: [  555.479407] Control: 10c5383d  Table: 109d806a  DAC: 00000055
Mar 22 17:40:00 kmxbilr2 kernel: [  555.479486] Process sudo (pid: 10620, stack limit = 0x86ea8210)
Mar 22 17:40:00 kmxbilr2 kernel: [  555.479561] Stack: (0x86ea9d08 to 0x86eaa000)
Mar 22 17:40:00 kmxbilr2 kernel: [  555.479621] 9d00:                   80d87980 809024bc 2e486000 808a6840 af11ad40 86ea9d48
Mar 22 17:40:00 kmxbilr2 kernel: [  555.479728] 9d20: 8c3da480 8025bc28 ae811a40 80e15480 60040013 60040013 0000008c 00033c4b
Mar 22 17:40:00 kmxbilr2 kernel: [  555.479832] 9d40: 00000000 808a8780 86ea9d6c 86ea9d58 808a8780 806fbae8 add51c00 add51d74
Mar 22 17:40:00 kmxbilr2 kernel: [  555.479956] 9d60: 86ea9d8c 86ea9d70 806fbae8 808a7758 add52400 7f0220bc add52400 add52444
Mar 22 17:40:00 kmxbilr2 kernel: [  555.480075] 9d80: 86ea9dac 00000000 9d640180 ade91500 ade91500 00000004 ad41a980 0000000b
Mar 22 17:40:00 kmxbilr2 kernel: [  555.480184] 9da0: 86ea9dcc 86ea9db0 80286ac4 802e088c 00000009 000000f0 00000000 ade91500
Mar 22 17:40:00 kmxbilr2 kernel: [  555.480287] 9dc0: 86ea9df4 86ea9dd0 802aa864 80286a68 82818f00 82819440 ade91500 ae811a40
Mar 22 17:40:00 kmxbilr2 kernel: [  555.480389] 9de0: 00000001 ae811a78 86ea9e14 86ea9df8 802aa974 802aa7bc 82818f00 00000000
Mar 22 17:40:00 kmxbilr2 kernel: [  555.480493] 9e00: 00000544 ae811a40 86ea9e54 86ea9e18 80121d6c 802aa928 86ea9e74 86ea9e28
Mar 22 17:40:00 kmxbilr2 kernel: [  555.480595] 9e20: 86ea9edc 86ea9fb0 00000000 0000000b ac9a02c0 86ea9edc 86ea8000 00106001
Mar 22 17:40:00 kmxbilr2 kernel: [  555.480698] 9e40: 86ea8000 0000000b 86ea9e74 86ea9e58 8012260c 801219e0 00000000 ad6aba88
Mar 22 17:40:00 kmxbilr2 kernel: [  555.480800] 9e60: 86ea9edc 86ea8000 86ea9ec4 86ea9e78 8012d9f4 801225cc 80d02040 418004fc
Mar 22 17:40:00 kmxbilr2 kernel: [  555.480903] 9e80: 80d03d68 86ea9ec8 ac9a02c0 ad6abec4 ad6ab9c0 000000a0 0000000b 76e27574
Mar 22 17:40:00 kmxbilr2 kernel: [  555.494505] 9ea0: 86ea9ec8 86ea9fb0 76e27576 00000000 86ea8000 00000000 86ea9f8c 86ea9ec8
Mar 22 17:40:00 kmxbilr2 kernel: [  555.506477] 9ec0: 8010b318 8012d6c8 86ea9ef4 86ea9ed8 8012d084 8012cf08 00000000 0000000b
Mar 22 17:40:00 kmxbilr2 kernel: [  555.518136] 9ee0: 00000000 00000000 0000297c 00000000 ad6ab9e8 00000001 82818f00 00000000
Mar 22 17:40:00 kmxbilr2 kernel: [  555.531395] 9f00: 00000000 0000297c 00000000 ad6ab9e8 00000001 82818f00 00000001 86ea9f68
Mar 22 17:40:00 kmxbilr2 kernel: [  555.545698] 9f20: 86ea9f64 86ea9f30 8012f17c 801edc50 82819444 00000002 7ed36aa8 00000000
Mar 22 17:40:00 kmxbilr2 kernel: [  555.558183] 9f40: 00000002 7ed36aa8 000000ae 80108204 86ea8000 00000000 86ea9fa4 86ea9f68
Mar 22 17:40:00 kmxbilr2 kernel: [  555.570528] 9f60: 8012f6f0 00000001 86ea8010 80108204 86ea9fb0 80108204 86ea8000 00000000
Mar 22 17:40:00 kmxbilr2 kernel: [  555.582517] 9f80: 86ea9fac 86ea9f90 8010b820 8010b260 7ed36aa8 0000000b 0000000b 00000025
Mar 22 17:40:00 kmxbilr2 kernel: [  555.594335] 9fa0: 00000000 86ea9fb0 80108094 8010b774 00000000 0000000b 3a4d6900 3a4d6900
Mar 22 17:40:00 kmxbilr2 kernel: [  555.607730] 9fc0: 7ed36aa8 0000000b 0000000b 00000025 01d15c58 0049016c 00000000 7ed36a18
Mar 22 17:40:00 kmxbilr2 kernel: [  555.621658] 9fe0: 004a1dd8 7ed369c4 0047dd5b 76e27576 20040030 0000297c a302000d 64690063
Mar 22 17:40:00 kmxbilr2 kernel: [  555.635304] [<802e08b0>] (locks_remove_posix) from [<80286ac4>] (filp_close+0x68/0x8c)
Mar 22 17:40:00 kmxbilr2 kernel: [  555.647889] [<80286ac4>] (filp_close) from [<802aa864>] (put_files_struct+0xb4/0x10c)
Mar 22 17:40:00 kmxbilr2 kernel: [  555.660265] [<802aa864>] (put_files_struct) from [<802aa974>] (exit_files+0x58/0x5c)
Mar 22 17:40:00 kmxbilr2 kernel: [  555.672651] [<802aa974>] (exit_files) from [<80121d6c>] (do_exit+0x398/0xba0)
Mar 22 17:40:00 kmxbilr2 kernel: [  555.684677] [<80121d6c>] (do_exit) from [<8012260c>] (do_group_exit+0x4c/0xe4)
Mar 22 17:40:00 kmxbilr2 kernel: [  555.696657] [<8012260c>] (do_group_exit) from [<8012d9f4>] (get_signal+0x338/0x6e0)
Mar 22 17:40:00 kmxbilr2 kernel: [  555.708835] [<8012d9f4>] (get_signal) from [<8010b318>] (do_signal+0xc4/0x3e4)
Mar 22 17:40:00 kmxbilr2 kernel: [  555.720791] [<8010b318>] (do_signal) from [<8010b820>] (do_work_pending+0xb8/0xd0)
Mar 22 17:40:00 kmxbilr2 kernel: [  555.732744] [<8010b820>] (do_work_pending) from [<80108094>] (slow_work_pending+0xc/0x20)
Mar 22 17:40:00 kmxbilr2 kernel: [  555.744414] Code: e59430e8 f57ff05b e3530000 0a000025 (e5b3200c) 
Mar 22 17:40:00 kmxbilr2 kernel: [  555.755952] ---[ end trace 124dc21b94998789 ]---
Mar 22 17:40:00 kmxbilr2 kernel: [  555.772247] Fixing recursive fault but reboot is needed!
Mar 22 17:40:02 kmxbilr2 kernel: [  557.572209] Unable to handle kernel paging request at virtual address 04bc7ffc
Mar 22 17:40:02 kmxbilr2 kernel: [  557.583366] pgd = a0380000
Mar 22 17:40:02 kmxbilr2 kernel: [  557.594624] [04bc7ffc] *pgd=00000000
Mar 22 17:40:02 kmxbilr2 kernel: [  557.605878] Internal error: Oops: 5 [#3] SMP ARM
Mar 22 17:40:02 kmxbilr2 kernel: [  557.616992] Modules linked in: dm_mod dax fuse loop hci_uart bluetooth ecdh_generic sg uio_pdrv_genirq uio lirc_rpi(C) lirc_dev fixed frandom ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables ipv6 snd_bcm2835(C) snd_pcm snd_timer snd brcmfmac cfg80211 rfkill brcmutil evdev rpcsec_gss_krb5 tun uinput
Mar 22 17:40:02 kmxbilr2 kernel: [  557.639993] CPU: 3 PID: 10496 Comm: btrfs Tainted: G      D  C      4.14.29+ #1
Mar 22 17:40:02 kmxbilr2 kernel: [  557.651518] Hardware name: BCM2835
Mar 22 17:40:02 kmxbilr2 kernel: [  557.662917] task: 9851bc00 task.stack: a01e4000
Mar 22 17:40:02 kmxbilr2 kernel: [  557.674480] PC is at __d_lookup_rcu+0x68/0x19c
Mar 22 17:40:02 kmxbilr2 kernel: [  557.685867] LR is at lookup_fast+0x4c/0x2c8
Mar 22 17:40:02 kmxbilr2 kernel: [  557.697094] pc : [<802a4da0>]    lr : [<80295cc4>]    psr: 20010013
Mar 22 17:40:02 kmxbilr2 kernel: [  557.708335] sp : a01e5d20  ip : 80d04590  fp : a01e5d5c
Mar 22 17:40:02 kmxbilr2 kernel: [  557.719619] r10: a01e5e50  r9 : 00000006  r8 : e6ad3f44
Mar 22 17:40:02 kmxbilr2 kernel: [  557.730983] r7 : a03d8550  r6 : a03d8550  r5 : 00000000  r4 : 04bc8000
Mar 22 17:40:02 kmxbilr2 kernel: [  557.741942] r3 : 0001cd5a  r2 : 00004000  r1 : 00000000  r0 : a03d8550
Mar 22 17:40:02 kmxbilr2 kernel: [  557.752961] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Mar 22 17:40:02 kmxbilr2 kernel: [  557.763554] Control: 10c5383d  Table: 2038006a  DAC: 00000055
Mar 22 17:40:02 kmxbilr2 kernel: [  557.774189] Process btrfs (pid: 10496, stack limit = 0xa01e4210)
Mar 22 17:40:02 kmxbilr2 kernel: [  557.784760] Stack: (0xa01e5d20 to 0xa01e6000)
Mar 22 17:40:02 kmxbilr2 kernel: [  557.795359] 5d20: 8c6f2000 a01e5d6c 00000006 8c6f0038 a01e5d5c a01e5e48 00000000 a01e5da8
Mar 22 17:40:02 kmxbilr2 kernel: [  557.806204] 5d40: a03d8550 a01e5da0 ad639b10 a01e5da4 a01e5d9c a01e5d60 80295cc4 802a4d44
Mar 22 17:40:02 kmxbilr2 kernel: [  557.817041] 5d60: 8c6f2000 ae4085a0 00001455 00000006 00001542 a01e5e48 00000000 00000003
Mar 22 17:40:02 kmxbilr2 kernel: [  557.828472] 5d80: 8c6f003f a01e5e48 8cff9c59 a03d8550 a01e5dd4 a01e5da0 80298140 80295c84
Mar 22 17:40:02 kmxbilr2 kernel: [  557.843155] 5da0: a01e5dc4 a01e5db0 8029683c 8042ea20 c2d0f82e 47090a62 e6ad3f44 8c6f003f
Mar 22 17:40:02 kmxbilr2 kernel: [  557.855354] 5dc0: a01e5e48 8cff9c59 a01e5e24 a01e5dd8 8029856c 80298110 80295448 61c88647
Mar 22 17:40:02 kmxbilr2 kernel: [  557.866613] 5de0: 00000000 00000006 a01e5e48 8c6f0010 a01e5e24 a01e5e00 80294f74 8c6f0010
Mar 22 17:40:02 kmxbilr2 kernel: [  557.878046] 5e00: a01e5e48 a01e5f38 a01e5f38 a01e5f40 00000000 ffffff9c a01e5e44 a01e5e28
Mar 22 17:40:02 kmxbilr2 kernel: [  557.889410] 5e20: 802988c4 802983ec 8c6f0000 a01e5e48 00000000 a01e5f38 a01e5eec a01e5e48
Mar 22 17:40:02 kmxbilr2 kernel: [  557.900746] 5e40: 8029a4b4 8029889c ad639b10 a03d8550 e6ad3f44 00000006 8c6f0038 80276ee4
Mar 22 17:40:02 kmxbilr2 kernel: [  557.912237] 5e60: ae990c10 ae7af660 883d6660 00000050 00000006 000003a4 00000000 00000000
Mar 22 17:40:02 kmxbilr2 kernel: [  557.923647] 5e80: 00000000 a01e5e88 80d0459c 8c6f0000 80d0459c 7ea7a81c 00000000 00000000
Mar 22 17:40:02 kmxbilr2 kernel: [  557.934994] 5ea0: a01e5edc a01e5eb0 8029a30c 805d1904 a01e5e88 8c6f4000 8c6f0000 00000000
Mar 22 17:40:02 kmxbilr2 kernel: [  557.946359] 5ec0: 00000026 00000002 ffffff9c ffffff9c 8c6f4000 7ea7981c a01e5f50 00000026
Mar 22 17:40:02 kmxbilr2 kernel: [  557.957715] 5ee0: a01e5f8c a01e5ef0 8029b984 8029a444 a01e5f50 a01e5f28 98785c08 a01e5ef0
Mar 22 17:40:02 kmxbilr2 kernel: [  557.969150] 5f00: 00000001 802ad51c 98785c00 00000800 00000000 00000000 7ea7a81c ffffff9c
Mar 22 17:40:02 kmxbilr2 kernel: [  557.980558] 5f20: 00000000 00000000 98785c08 00000000 ad639b10 962fd330 a01e4000 00000000
Mar 22 17:40:02 kmxbilr2 kernel: [  557.992053] 5f40: aa64a66c 0000000a 8c6f402a 80276ee4 00000000 00000000 a01e5f7c 98785c00
Mar 22 17:40:02 kmxbilr2 kernel: [  558.003698] 5f60: 98785c00 00000000 00e9a010 00e9a050 00000026 80108204 a01e4000 00000000
Mar 22 17:40:02 kmxbilr2 kernel: [  558.015188] 5f80: a01e5fa4 a01e5f90 8029bd9c 8029b8bc 00000000 00e9a050 00000000 a01e5fa8
Mar 22 17:40:02 kmxbilr2 kernel: [  558.026723] 5fa0: 80108060 8029bd74 00000000 00e9a010 7ea7981c 7ea7a81c ffffffff 00000000
Mar 22 17:40:02 kmxbilr2 kernel: [  558.038198] 5fc0: 00000000 00e9a010 00e9a050 00000026 004fd000 76f05ce8 7ea7981c 7ea7a81c
Mar 22 17:40:02 kmxbilr2 kernel: [  558.049811] 5fe0: 004fd198 7ea79814 0049cc9b 76cd7796 00010030 7ea7981c 00000000 00000000
Mar 22 17:40:02 kmxbilr2 kernel: [  558.061386] [<802a4da0>] (__d_lookup_rcu) from [<80295cc4>] (lookup_fast+0x4c/0x2c8)
Mar 22 17:40:02 kmxbilr2 kernel: [  558.072944] [<80295cc4>] (lookup_fast) from [<80298140>] (walk_component+0x3c/0x2dc)
Mar 22 17:40:02 kmxbilr2 kernel: [  558.084330] [<80298140>] (walk_component) from [<8029856c>] (link_path_walk+0x18c/0x4b0)
Mar 22 17:40:02 kmxbilr2 kernel: [  558.095582] [<8029856c>] (link_path_walk) from [<802988c4>] (path_parentat+0x34/0x6c)
Mar 22 17:40:02 kmxbilr2 kernel: [  558.106851] [<802988c4>] (path_parentat) from [<8029a4b4>] (filename_parentat+0x7c/0x104)
Mar 22 17:40:02 kmxbilr2 kernel: [  558.117923] [<8029a4b4>] (filename_parentat) from [<8029b984>] (SyS_renameat2+0xd4/0x48c)
Mar 22 17:40:02 kmxbilr2 kernel: [  558.129156] [<8029b984>] (SyS_renameat2) from [<8029bd9c>] (SyS_rename+0x34/0x3c)
Mar 22 17:40:02 kmxbilr2 kernel: [  558.140092] [<8029bd9c>] (SyS_rename) from [<80108060>] (ret_fast_syscall+0x0/0x28)
Mar 22 17:40:02 kmxbilr2 kernel: [  558.150966] Code: ea000002 e5944000 e3540000 0a000028 (e5141004) 
Mar 22 17:40:02 kmxbilr2 kernel: [  558.161756] ---[ end trace 124dc21b9499878a ]---
Mar 22 17:40:05 kmxbilr2 kernel: [  560.624738] Alignment trap: not handling instruction e1b04f9f at [<80566d48>]
Mar 22 17:40:05 kmxbilr2 kernel: [  560.635458] Unhandled fault: alignment exception (0x001) at 0x9ac47754
Mar 22 17:40:05 kmxbilr2 kernel: [  560.646194] pgd = acecc000
Mar 22 17:40:05 kmxbilr2 kernel: [  560.656802] [9ac47754] *pgd=1ac1141e(bad)
Mar 22 17:40:05 kmxbilr2 kernel: [  560.667391] Internal error: : 1 [#4] SMP ARM
Mar 22 17:40:05 kmxbilr2 kernel: [  560.677894] Modules linked in: dm_mod dax fuse loop hci_uart bluetooth ecdh_generic sg uio_pdrv_genirq uio lirc_rpi(C) lirc_dev fixed frandom ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables ipv6 snd_bcm2835(C) snd_pcm snd_timer snd brcmfmac cfg80211 rfkill brcmutil evdev rpcsec_gss_krb5 tun uinput
Mar 22 17:40:05 kmxbilr2 kernel: [  560.700191] CPU: 3 PID: 4194 Comm: PeripBusUSBUdev Tainted: G      D  C      4.14.29+ #1
Mar 22 17:40:05 kmxbilr2 kernel: [  560.711520] Hardware name: BCM2835
Mar 22 17:40:05 kmxbilr2 kernel: [  560.724080] task: 81945a00 task.stack: 9e712000
Mar 22 17:40:05 kmxbilr2 kernel: [  560.737866] PC is at lockref_put_return+0x3c/0x94
Mar 22 17:40:05 kmxbilr2 kernel: [  560.750567] LR is at dput+0x40/0x2d0
Mar 22 17:40:05 kmxbilr2 kernel: [  560.761780] pc : [<80566d4c>]    lr : [<802a1abc>]    psr: 20060013
Mar 22 17:40:05 kmxbilr2 kernel: [  560.773122] sp : 9e713de0  ip : 9e713e00  fp : 9e713dfc
Mar 22 17:40:05 kmxbilr2 kernel: [  560.784351] r10: 00000000  r9 : 9e712000  r8 : 9e713f60
Mar 22 17:40:05 kmxbilr2 kernel: [  560.795460] r7 : 00000001  r6 : 00010001  r5 : 00000001  r4 : 00010001
Mar 22 17:40:05 kmxbilr2 kernel: [  560.806753] r3 : 00000000  r2 : 00010001  r1 : 00000000  r0 : 9ac47754
Mar 22 17:40:05 kmxbilr2 kernel: [  560.817978] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Mar 22 17:40:05 kmxbilr2 kernel: [  560.829007] Control: 10c5383d  Table: 2cecc06a  DAC: 00000055
Mar 22 17:40:05 kmxbilr2 kernel: [  560.839867] Process PeripBusUSBUdev (pid: 4194, stack limit = 0x9e712210)
Mar 22 17:40:05 kmxbilr2 kernel: [  560.851294] Stack: (0x9e713de0 to 0x9e714000)
Mar 22 17:40:05 kmxbilr2 kernel: [  560.863891] 3de0: 9ac47704 00080040 9ac47754 00000041 9e713e24 9e713e00 802a1abc 80566d1c
Mar 22 17:40:05 kmxbilr2 kernel: [  560.876801] 3e00: 00000000 9e713e78 9e713f60 00000041 9e713f60 9e712000 9e713e44 9e713e28
Mar 22 17:40:05 kmxbilr2 kernel: [  560.887838] 3e20: 80294b9c 802a1a88 9e713e78 fffffffe 9e713f60 00000041 9e713e74 9e713e48
Mar 22 17:40:05 kmxbilr2 kernel: [  560.900010] 3e40: 802989bc 80294b5c 9e6683f8 af57c6bc 00000000 00000000 00000001 8c6f5000
Mar 22 17:40:05 kmxbilr2 kernel: [  560.915033] 3e60: 9e713e78 00000001 9e713f24 9e713e78 8029a5d8 80298908 ae990310 9ac47704
Mar 22 17:40:05 kmxbilr2 kernel: [  560.929842] 3e80: e73d77c1 00000006 8c6f5025 00000000 ae990c10 ae7af660 ae7db2b0 00000001
Mar 22 17:40:05 kmxbilr2 kernel: [  560.942629] 3ea0: 9e713dd0 000003be 00000000 00000000 00000000 9e713eb8 00000000 00000000
Mar 22 17:40:05 kmxbilr2 kernel: [  560.955273] 3ec0: 00001000 8c6f6000 00000000 00000001 80d0459c 8c6f5000 80d0459c 6b2fea18
Mar 22 17:40:05 kmxbilr2 kernel: [  560.966619] 3ee0: 00000000 00000001 8c6f5000 00000000 8029a30c 00000002 ffffff9c 00000001
Mar 22 17:40:05 kmxbilr2 kernel: [  560.978523] 3f00: ffffff9c 00000001 ffffff9c 9e713f60 ffffff9c 6b2fea18 9e713f4c 9e713f28
Mar 22 17:40:05 kmxbilr2 kernel: [  560.992103] 3f20: 8029a720 8029a548 00000000 00000000 9e713f4c 00000001 00000000 acb3e980
Mar 22 17:40:05 kmxbilr2 kernel: [  561.005850] 3f40: 9e713f94 9e713f50 80287344 8029a6d8 00000000 00000000 aaa67500 00000000
Mar 22 17:40:05 kmxbilr2 kernel: [  561.017377] 3f60: 00000000 00000000 00000005 00000000 5c21ee28 76f78ce8 00000021 80108204
Mar 22 17:40:05 kmxbilr2 kernel: [  561.029029] 3f80: 9e712000 00000000 9e713fa4 9e713f98 802874c4 802872b4 00000000 9e713fa8
Mar 22 17:40:05 kmxbilr2 kernel: [  561.040579] 3fa0: 80108060 802874ac 00000000 5c21ee28 6b2fea18 00000000 00000000 6b2fea33
Mar 22 17:40:05 kmxbilr2 kernel: [  561.052969] 3fc0: 00000000 5c21ee28 76f78ce8 00000021 6b2fea18 6b2fea48 716d7b30 ffffffea
Mar 22 17:40:05 kmxbilr2 kernel: [  561.068944] 3fe0: 75eebf10 6b2fea04 75ee4b7b 759c1b66 20060030 6b2fea18 aa98abba faaaaaea
Mar 22 17:40:05 kmxbilr2 kernel: [  561.083256] [<80566d4c>] (lockref_put_return) from [<802a1abc>] (dput+0x40/0x2d0)
Mar 22 17:40:05 kmxbilr2 kernel: [  561.094752] [<802a1abc>] (dput) from [<80294b9c>] (terminate_walk+0x4c/0xc0)
Mar 22 17:40:05 kmxbilr2 kernel: [  561.106786] [<80294b9c>] (terminate_walk) from [<802989bc>] (path_lookupat+0xc0/0x204)
Mar 22 17:40:05 kmxbilr2 kernel: [  561.120543] [<802989bc>] (path_lookupat) from [<8029a5d8>] (filename_lookup+0x9c/0xf8)
Mar 22 17:40:05 kmxbilr2 kernel: [  561.133721] [<8029a5d8>] (filename_lookup) from [<8029a720>] (user_path_at_empty+0x54/0x5c)
Mar 22 17:40:05 kmxbilr2 kernel: [  561.144836] [<8029a720>] (user_path_at_empty) from [<80287344>] (SyS_faccessat+0x9c/0x1f8)
Mar 22 17:40:05 kmxbilr2 kernel: [  561.155787] [<80287344>] (SyS_faccessat) from [<802874c4>] (SyS_access+0x24/0x28)
Mar 22 17:40:05 kmxbilr2 kernel: [  561.166755] [<802874c4>] (SyS_access) from [<80108060>] (ret_fast_syscall+0x0/0x28)
Mar 22 17:40:05 kmxbilr2 kernel: [  561.177608] Code: e1a07005 da000015 f590f000 e1b04f9f (e1340006) 
Mar 22 17:40:05 kmxbilr2 kernel: [  561.188475] ---[ end trace 124dc21b9499878b ]---

... and dmesg reports

[  555.453472] Unable to handle kernel NULL pointer dereference at virtual address 0000000d
[  555.453480] pgd = a12d8000
[  555.453484] [0000000d] *pgd=00000000
[  555.453493] Internal error: Oops: 5 [#1] SMP ARM
[  555.453578] Modules linked in: dm_mod dax fuse loop hci_uart bluetooth ecdh_generic sg uio_pdrv_genirq uio lirc_rpi(C) lirc_dev fixed frandom ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables ipv6 snd_bcm2835(C) snd_pcm snd_timer snd brcmfmac cfg80211 rfkill brcmutil evdev rpcsec_gss_krb5 tun uinput
[  555.454081] CPU: 1 PID: 10626 Comm: sudo Tainted: G         C      4.14.29+ #1
[  555.454170] Hardware name: BCM2835
[  555.454216] task: 91ab1e00 task.stack: 93ea0000
[  555.454287] PC is at locks_remove_posix+0x30/0x14c
[  555.454364] LR is at filp_close+0x68/0x8c
[  555.454447] pc : [<802e08b0>]    lr : [<80286ac4>]    psr: 20000013
[  555.454550] sp : 93ea1eb0  ip : 93ea1f58  fp : 93ea1f54
[  555.454616] r10: 00000000  r9 : 93ea0000  r8 : 80108204
[  555.454680] r7 : 00000006  r6 : acc63900  r5 : 9d640180  r4 : ae690440
[  555.454754] r3 : 00000001  r2 : ad1bb940  r1 : acc63900  r0 : 9d640180
[  555.454831] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  555.454914] Control: 10c5383d  Table: 212d806a  DAC: 00000055
[  555.454983] Process sudo (pid: 10626, stack limit = 0x93ea0210)
[  555.455058] Stack: (0x93ea1eb0 to 0x93ea2000)
[  555.455112] 1ea0:                                     802fe824 91ab1e00 00000044 802a8f5c
[  555.455210] 1ec0: 00000004 00000000 00000000 00000100 81f22e80 80d093bc 00000017 808ac294
[  555.455316] 1ee0: 76ebfe2c 93ea1fb0 7ed36760 808a8780 93ea1f14 93ea1f00 808a8780 806fbae8
[  555.455440] 1f00: add51c00 add51d74 93ea1f34 93ea1f18 806fbae8 808a7758 add52400 7f0220bc
[  555.455561] 1f20: add52400 add52444 93ea1f54 00000000 9d640180 acc63900 00000006 80108204
[  555.455679] 1f40: 93ea0000 00000000 93ea1f74 93ea1f58 80286ac4 802e088c 0000003c acc63900
[  555.455787] 1f60: 9d640180 00000006 93ea1f94 93ea1f78 802aad38 80286a68 76fbd218 0000000b
[  555.455892] 1f80: 00000000 00000006 93ea1fa4 93ea1f98 80286b18 802aac7c 00000000 93ea1fa8
[  555.455993] 1fa0: 80108060 80286af4 76fbd218 0000000b 0000003c 7ed36760 76fbcb5c 76fbcb5c
[  555.456102] 1fc0: 76fbd218 0000000b 00000000 00000006 01d185b8 7ed36760 76fface8 0000003c
[  555.456205] 1fe0: 7ed36738 7ed36728 76fbcb6c 76e94fc2 60000030 0000003c 00000000 00000000
[  555.456325] [<802e08b0>] (locks_remove_posix) from [<80286ac4>] (filp_close+0x68/0x8c)
[  555.456435] [<80286ac4>] (filp_close) from [<802aad38>] (__close_fd+0xc8/0xec)
[  555.456529] [<802aad38>] (__close_fd) from [<80286b18>] (SyS_close+0x30/0x58)
[  555.456626] [<80286b18>] (SyS_close) from [<80108060>] (ret_fast_syscall+0x0/0x28)
[  555.456728] Code: e59430e8 f57ff05b e3530000 0a000025 (e5b3200c) 
[  555.456857] ---[ end trace 124dc21b94998788 ]---
[  555.477670] Unable to handle kernel NULL pointer dereference at virtual address 0000000d
[  555.477836] pgd = 80004000
[  555.477896] [0000000d] *pgd=00000000
[  555.477964] Internal error: Oops: 5 [#2] SMP ARM
[  555.478032] Modules linked in: dm_mod dax fuse loop hci_uart bluetooth ecdh_generic sg uio_pdrv_genirq uio lirc_rpi(C) lirc_dev fixed frandom ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables ipv6 snd_bcm2835(C) snd_pcm snd_timer snd brcmfmac cfg80211 rfkill brcmutil evdev rpcsec_gss_krb5 tun uinput
[  555.478580] CPU: 3 PID: 10620 Comm: sudo Tainted: G      D  C      4.14.29+ #1
[  555.478678] Hardware name: BCM2835
[  555.478725] task: 82818f00 task.stack: 86ea8000
[  555.478800] PC is at locks_remove_posix+0x30/0x14c
[  555.478867] LR is at filp_close+0x68/0x8c
[  555.478933] pc : [<802e08b0>]    lr : [<80286ac4>]    psr: 20040013
[  555.479016] sp : 86ea9d08  ip : 86ea9db0  fp : 86ea9dac
[  555.479084] r10: 0000000b  r9 : ad41a980  r8 : 00000004
[  555.479152] r7 : ade91500  r6 : ade91500  r5 : 9d640180  r4 : ae690440
[  555.479234] r3 : 00000001  r2 : 9d640180  r1 : ade91500  r0 : 9d640180
[  555.479316] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  555.479407] Control: 10c5383d  Table: 109d806a  DAC: 00000055
[  555.479486] Process sudo (pid: 10620, stack limit = 0x86ea8210)
[  555.479561] Stack: (0x86ea9d08 to 0x86eaa000)
[  555.479621] 9d00:                   80d87980 809024bc 2e486000 808a6840 af11ad40 86ea9d48
[  555.479728] 9d20: 8c3da480 8025bc28 ae811a40 80e15480 60040013 60040013 0000008c 00033c4b
[  555.479832] 9d40: 00000000 808a8780 86ea9d6c 86ea9d58 808a8780 806fbae8 add51c00 add51d74
[  555.479956] 9d60: 86ea9d8c 86ea9d70 806fbae8 808a7758 add52400 7f0220bc add52400 add52444
[  555.480075] 9d80: 86ea9dac 00000000 9d640180 ade91500 ade91500 00000004 ad41a980 0000000b
[  555.480184] 9da0: 86ea9dcc 86ea9db0 80286ac4 802e088c 00000009 000000f0 00000000 ade91500
[  555.480287] 9dc0: 86ea9df4 86ea9dd0 802aa864 80286a68 82818f00 82819440 ade91500 ae811a40
[  555.480389] 9de0: 00000001 ae811a78 86ea9e14 86ea9df8 802aa974 802aa7bc 82818f00 00000000
[  555.480493] 9e00: 00000544 ae811a40 86ea9e54 86ea9e18 80121d6c 802aa928 86ea9e74 86ea9e28
[  555.480595] 9e20: 86ea9edc 86ea9fb0 00000000 0000000b ac9a02c0 86ea9edc 86ea8000 00106001
[  555.480698] 9e40: 86ea8000 0000000b 86ea9e74 86ea9e58 8012260c 801219e0 00000000 ad6aba88
[  555.480800] 9e60: 86ea9edc 86ea8000 86ea9ec4 86ea9e78 8012d9f4 801225cc 80d02040 418004fc
[  555.480903] 9e80: 80d03d68 86ea9ec8 ac9a02c0 ad6abec4 ad6ab9c0 000000a0 0000000b 76e27574
[  555.494505] 9ea0: 86ea9ec8 86ea9fb0 76e27576 00000000 86ea8000 00000000 86ea9f8c 86ea9ec8
[  555.506477] 9ec0: 8010b318 8012d6c8 86ea9ef4 86ea9ed8 8012d084 8012cf08 00000000 0000000b
[  555.518136] 9ee0: 00000000 00000000 0000297c 00000000 ad6ab9e8 00000001 82818f00 00000000
[  555.531395] 9f00: 00000000 0000297c 00000000 ad6ab9e8 00000001 82818f00 00000001 86ea9f68
[  555.545698] 9f20: 86ea9f64 86ea9f30 8012f17c 801edc50 82819444 00000002 7ed36aa8 00000000
[  555.558183] 9f40: 00000002 7ed36aa8 000000ae 80108204 86ea8000 00000000 86ea9fa4 86ea9f68
[  555.570528] 9f60: 8012f6f0 00000001 86ea8010 80108204 86ea9fb0 80108204 86ea8000 00000000
[  555.582517] 9f80: 86ea9fac 86ea9f90 8010b820 8010b260 7ed36aa8 0000000b 0000000b 00000025
[  555.594335] 9fa0: 00000000 86ea9fb0 80108094 8010b774 00000000 0000000b 3a4d6900 3a4d6900
[  555.607730] 9fc0: 7ed36aa8 0000000b 0000000b 00000025 01d15c58 0049016c 00000000 7ed36a18
[  555.621658] 9fe0: 004a1dd8 7ed369c4 0047dd5b 76e27576 20040030 0000297c a302000d 64690063
[  555.635304] [<802e08b0>] (locks_remove_posix) from [<80286ac4>] (filp_close+0x68/0x8c)
[  555.647889] [<80286ac4>] (filp_close) from [<802aa864>] (put_files_struct+0xb4/0x10c)
[  555.660265] [<802aa864>] (put_files_struct) from [<802aa974>] (exit_files+0x58/0x5c)
[  555.672651] [<802aa974>] (exit_files) from [<80121d6c>] (do_exit+0x398/0xba0)
[  555.684677] [<80121d6c>] (do_exit) from [<8012260c>] (do_group_exit+0x4c/0xe4)
[  555.696657] [<8012260c>] (do_group_exit) from [<8012d9f4>] (get_signal+0x338/0x6e0)
[  555.708835] [<8012d9f4>] (get_signal) from [<8010b318>] (do_signal+0xc4/0x3e4)
[  555.720791] [<8010b318>] (do_signal) from [<8010b820>] (do_work_pending+0xb8/0xd0)
[  555.732744] [<8010b820>] (do_work_pending) from [<80108094>] (slow_work_pending+0xc/0x20)
[  555.744414] Code: e59430e8 f57ff05b e3530000 0a000025 (e5b3200c) 
[  555.755952] ---[ end trace 124dc21b94998789 ]---
[  555.772247] Fixing recursive fault but reboot is needed!
[  557.572209] Unable to handle kernel paging request at virtual address 04bc7ffc
[  557.583366] pgd = a0380000
[  557.594624] [04bc7ffc] *pgd=00000000
[  557.605878] Internal error: Oops: 5 [#3] SMP ARM
[  557.616992] Modules linked in: dm_mod dax fuse loop hci_uart bluetooth ecdh_generic sg uio_pdrv_genirq uio lirc_rpi(C) lirc_dev fixed frandom ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables ipv6 snd_bcm2835(C) snd_pcm snd_timer snd brcmfmac cfg80211 rfkill brcmutil evdev rpcsec_gss_krb5 tun uinput
[  557.639993] CPU: 3 PID: 10496 Comm: btrfs Tainted: G      D  C      4.14.29+ #1
[  557.651518] Hardware name: BCM2835
[  557.662917] task: 9851bc00 task.stack: a01e4000
[  557.674480] PC is at __d_lookup_rcu+0x68/0x19c
[  557.685867] LR is at lookup_fast+0x4c/0x2c8
[  557.697094] pc : [<802a4da0>]    lr : [<80295cc4>]    psr: 20010013
[  557.708335] sp : a01e5d20  ip : 80d04590  fp : a01e5d5c
[  557.719619] r10: a01e5e50  r9 : 00000006  r8 : e6ad3f44
[  557.730983] r7 : a03d8550  r6 : a03d8550  r5 : 00000000  r4 : 04bc8000
[  557.741942] r3 : 0001cd5a  r2 : 00004000  r1 : 00000000  r0 : a03d8550
[  557.752961] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  557.763554] Control: 10c5383d  Table: 2038006a  DAC: 00000055
[  557.774189] Process btrfs (pid: 10496, stack limit = 0xa01e4210)
[  557.784760] Stack: (0xa01e5d20 to 0xa01e6000)
[  557.795359] 5d20: 8c6f2000 a01e5d6c 00000006 8c6f0038 a01e5d5c a01e5e48 00000000 a01e5da8
[  557.806204] 5d40: a03d8550 a01e5da0 ad639b10 a01e5da4 a01e5d9c a01e5d60 80295cc4 802a4d44
[  557.817041] 5d60: 8c6f2000 ae4085a0 00001455 00000006 00001542 a01e5e48 00000000 00000003
[  557.828472] 5d80: 8c6f003f a01e5e48 8cff9c59 a03d8550 a01e5dd4 a01e5da0 80298140 80295c84
[  557.843155] 5da0: a01e5dc4 a01e5db0 8029683c 8042ea20 c2d0f82e 47090a62 e6ad3f44 8c6f003f
[  557.855354] 5dc0: a01e5e48 8cff9c59 a01e5e24 a01e5dd8 8029856c 80298110 80295448 61c88647
[  557.866613] 5de0: 00000000 00000006 a01e5e48 8c6f0010 a01e5e24 a01e5e00 80294f74 8c6f0010
[  557.878046] 5e00: a01e5e48 a01e5f38 a01e5f38 a01e5f40 00000000 ffffff9c a01e5e44 a01e5e28
[  557.889410] 5e20: 802988c4 802983ec 8c6f0000 a01e5e48 00000000 a01e5f38 a01e5eec a01e5e48
[  557.900746] 5e40: 8029a4b4 8029889c ad639b10 a03d8550 e6ad3f44 00000006 8c6f0038 80276ee4
[  557.912237] 5e60: ae990c10 ae7af660 883d6660 00000050 00000006 000003a4 00000000 00000000
[  557.923647] 5e80: 00000000 a01e5e88 80d0459c 8c6f0000 80d0459c 7ea7a81c 00000000 00000000
[  557.934994] 5ea0: a01e5edc a01e5eb0 8029a30c 805d1904 a01e5e88 8c6f4000 8c6f0000 00000000
[  557.946359] 5ec0: 00000026 00000002 ffffff9c ffffff9c 8c6f4000 7ea7981c a01e5f50 00000026
[  557.957715] 5ee0: a01e5f8c a01e5ef0 8029b984 8029a444 a01e5f50 a01e5f28 98785c08 a01e5ef0
[  557.969150] 5f00: 00000001 802ad51c 98785c00 00000800 00000000 00000000 7ea7a81c ffffff9c
[  557.980558] 5f20: 00000000 00000000 98785c08 00000000 ad639b10 962fd330 a01e4000 00000000
[  557.992053] 5f40: aa64a66c 0000000a 8c6f402a 80276ee4 00000000 00000000 a01e5f7c 98785c00
[  558.003698] 5f60: 98785c00 00000000 00e9a010 00e9a050 00000026 80108204 a01e4000 00000000
[  558.015188] 5f80: a01e5fa4 a01e5f90 8029bd9c 8029b8bc 00000000 00e9a050 00000000 a01e5fa8
[  558.026723] 5fa0: 80108060 8029bd74 00000000 00e9a010 7ea7981c 7ea7a81c ffffffff 00000000
[  558.038198] 5fc0: 00000000 00e9a010 00e9a050 00000026 004fd000 76f05ce8 7ea7981c 7ea7a81c
[  558.049811] 5fe0: 004fd198 7ea79814 0049cc9b 76cd7796 00010030 7ea7981c 00000000 00000000
[  558.061386] [<802a4da0>] (__d_lookup_rcu) from [<80295cc4>] (lookup_fast+0x4c/0x2c8)
[  558.072944] [<80295cc4>] (lookup_fast) from [<80298140>] (walk_component+0x3c/0x2dc)
[  558.084330] [<80298140>] (walk_component) from [<8029856c>] (link_path_walk+0x18c/0x4b0)
[  558.095582] [<8029856c>] (link_path_walk) from [<802988c4>] (path_parentat+0x34/0x6c)
[  558.106851] [<802988c4>] (path_parentat) from [<8029a4b4>] (filename_parentat+0x7c/0x104)
[  558.117923] [<8029a4b4>] (filename_parentat) from [<8029b984>] (SyS_renameat2+0xd4/0x48c)
[  558.129156] [<8029b984>] (SyS_renameat2) from [<8029bd9c>] (SyS_rename+0x34/0x3c)
[  558.140092] [<8029bd9c>] (SyS_rename) from [<80108060>] (ret_fast_syscall+0x0/0x28)
[  558.150966] Code: ea000002 e5944000 e3540000 0a000028 (e5141004) 
[  558.161756] ---[ end trace 124dc21b9499878a ]---
[  560.624738] Alignment trap: not handling instruction e1b04f9f at [<80566d48>]
[  560.635458] Unhandled fault: alignment exception (0x001) at 0x9ac47754
[  560.646194] pgd = acecc000
[  560.656802] [9ac47754] *pgd=1ac1141e(bad)
[  560.667391] Internal error: : 1 [#4] SMP ARM
[  560.677894] Modules linked in: dm_mod dax fuse loop hci_uart bluetooth ecdh_generic sg uio_pdrv_genirq uio lirc_rpi(C) lirc_dev fixed frandom ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables ipv6 snd_bcm2835(C) snd_pcm snd_timer snd brcmfmac cfg80211 rfkill brcmutil evdev rpcsec_gss_krb5 tun uinput
[  560.700191] CPU: 3 PID: 4194 Comm: PeripBusUSBUdev Tainted: G      D  C      4.14.29+ #1
[  560.711520] Hardware name: BCM2835
[  560.724080] task: 81945a00 task.stack: 9e712000
[  560.737866] PC is at lockref_put_return+0x3c/0x94
[  560.750567] LR is at dput+0x40/0x2d0
[  560.761780] pc : [<80566d4c>]    lr : [<802a1abc>]    psr: 20060013
[  560.773122] sp : 9e713de0  ip : 9e713e00  fp : 9e713dfc
[  560.784351] r10: 00000000  r9 : 9e712000  r8 : 9e713f60
[  560.795460] r7 : 00000001  r6 : 00010001  r5 : 00000001  r4 : 00010001
[  560.806753] r3 : 00000000  r2 : 00010001  r1 : 00000000  r0 : 9ac47754
[  560.817978] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  560.829007] Control: 10c5383d  Table: 2cecc06a  DAC: 00000055
[  560.839867] Process PeripBusUSBUdev (pid: 4194, stack limit = 0x9e712210)
[  560.851294] Stack: (0x9e713de0 to 0x9e714000)
[  560.863891] 3de0: 9ac47704 00080040 9ac47754 00000041 9e713e24 9e713e00 802a1abc 80566d1c
[  560.876801] 3e00: 00000000 9e713e78 9e713f60 00000041 9e713f60 9e712000 9e713e44 9e713e28
[  560.887838] 3e20: 80294b9c 802a1a88 9e713e78 fffffffe 9e713f60 00000041 9e713e74 9e713e48
[  560.900010] 3e40: 802989bc 80294b5c 9e6683f8 af57c6bc 00000000 00000000 00000001 8c6f5000
[  560.915033] 3e60: 9e713e78 00000001 9e713f24 9e713e78 8029a5d8 80298908 ae990310 9ac47704
[  560.929842] 3e80: e73d77c1 00000006 8c6f5025 00000000 ae990c10 ae7af660 ae7db2b0 00000001
[  560.942629] 3ea0: 9e713dd0 000003be 00000000 00000000 00000000 9e713eb8 00000000 00000000
[  560.955273] 3ec0: 00001000 8c6f6000 00000000 00000001 80d0459c 8c6f5000 80d0459c 6b2fea18
[  560.966619] 3ee0: 00000000 00000001 8c6f5000 00000000 8029a30c 00000002 ffffff9c 00000001
[  560.978523] 3f00: ffffff9c 00000001 ffffff9c 9e713f60 ffffff9c 6b2fea18 9e713f4c 9e713f28
[  560.992103] 3f20: 8029a720 8029a548 00000000 00000000 9e713f4c 00000001 00000000 acb3e980
[  561.005850] 3f40: 9e713f94 9e713f50 80287344 8029a6d8 00000000 00000000 aaa67500 00000000
[  561.017377] 3f60: 00000000 00000000 00000005 00000000 5c21ee28 76f78ce8 00000021 80108204
[  561.029029] 3f80: 9e712000 00000000 9e713fa4 9e713f98 802874c4 802872b4 00000000 9e713fa8
[  561.040579] 3fa0: 80108060 802874ac 00000000 5c21ee28 6b2fea18 00000000 00000000 6b2fea33
[  561.052969] 3fc0: 00000000 5c21ee28 76f78ce8 00000021 6b2fea18 6b2fea48 716d7b30 ffffffea
[  561.068944] 3fe0: 75eebf10 6b2fea04 75ee4b7b 759c1b66 20060030 6b2fea18 aa98abba faaaaaea
[  561.083256] [<80566d4c>] (lockref_put_return) from [<802a1abc>] (dput+0x40/0x2d0)
[  561.094752] [<802a1abc>] (dput) from [<80294b9c>] (terminate_walk+0x4c/0xc0)
[  561.106786] [<80294b9c>] (terminate_walk) from [<802989bc>] (path_lookupat+0xc0/0x204)
[  561.120543] [<802989bc>] (path_lookupat) from [<8029a5d8>] (filename_lookup+0x9c/0xf8)
[  561.133721] [<8029a5d8>] (filename_lookup) from [<8029a720>] (user_path_at_empty+0x54/0x5c)
[  561.144836] [<8029a720>] (user_path_at_empty) from [<80287344>] (SyS_faccessat+0x9c/0x1f8)
[  561.155787] [<80287344>] (SyS_faccessat) from [<802874c4>] (SyS_access+0x24/0x28)
[  561.166755] [<802874c4>] (SyS_access) from [<80108060>] (ret_fast_syscall+0x0/0x28)
[  561.177608] Code: e1a07005 da000015 f590f000 e1b04f9f (e1340006) 
[  561.188475] ---[ end trace 124dc21b9499878b ]---
[  627.422503] Unable to handle kernel NULL pointer dereference at virtual address 0000000d
[  627.436018] pgd = 82af0000
[  627.449159] [0000000d] *pgd=00000000
[  627.460412] Internal error: Oops: 5 [#5] SMP ARM
[  627.470877] Modules linked in: dm_mod dax fuse loop hci_uart bluetooth ecdh_generic sg uio_pdrv_genirq uio lirc_rpi(C) lirc_dev fixed frandom ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables ipv6 snd_bcm2835(C) snd_pcm snd_timer snd brcmfmac cfg80211 rfkill brcmutil evdev rpcsec_gss_krb5 tun uinput
[  627.492654] CPU: 0 PID: 10887 Comm: sudo Tainted: G      D  C      4.14.29+ #1
[  627.503546] Hardware name: BCM2835
[  627.514368] task: 91ab3c00 task.stack: 8287a000
[  627.525158] PC is at locks_remove_posix+0x30/0x14c
[  627.535881] LR is at filp_close+0x68/0x8c
[  627.547086] pc : [<802e08b0>]    lr : [<80286ac4>]    psr: 20000013
[  627.557884] sp : 8287beb0  ip : 8287bf58  fp : 8287bf54
[  627.568690] r10: 00000000  r9 : 8287a000  r8 : 80108204
[  627.579439] r7 : 00000006  r6 : acc63300  r5 : 9d640180  r4 : ae690440
[  627.590265] r3 : 00000001  r2 : ace99fc0  r1 : acc63300  r0 : 9d640180
[  627.601025] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  627.611944] Control: 10c5383d  Table: 02af006a  DAC: 00000055
[  627.622831] Process sudo (pid: 10887, stack limit = 0x8287a210)
[  627.633779] Stack: (0x8287beb0 to 0x8287c000)
[  627.644595] bea0:                                     802fe824 91ab3c00 00000041 802a8f5c
[  627.655547] bec0: 00000004 00000000 00000000 00000100 81efc688 80d093bc 00000017 808ac294
[  627.666714] bee0: 76e3be2c 8287bfb0 7e94f750 808a8780 8287bf14 8287bf00 808a8780 806fbae8
[  627.677972] bf00: add51c00 add51d74 8287bf34 8287bf18 806fbae8 808a7758 add52400 7f0220bc
[  627.689092] bf20: add52400 add52444 8287bf54 00000000 9d640180 acc63300 00000006 80108204
[  627.699934] bf40: 8287a000 00000000 8287bf74 8287bf58 80286ac4 802e088c 0000003c acc63300
[  627.710825] bf60: 9d640180 00000006 8287bf94 8287bf78 802aad38 80286a68 76f39218 0000000b
[  627.721610] bf80: 00000000 00000006 8287bfa4 8287bf98 80286b18 802aac7c 00000000 8287bfa8
[  627.732532] bfa0: 80108060 80286af4 76f39218 0000000b 0000003c 7e94f750 76f38b5c 76f38b5c
[  627.743512] bfc0: 76f39218 0000000b 00000000 00000006 022b85d0 7e94f750 76f76ce8 0000003c
[  627.754665] bfe0: 7e94f728 7e94f718 76f38b6c 76e10fc2 60000030 0000003c 00000000 00000000
[  627.765898] [<802e08b0>] (locks_remove_posix) from [<80286ac4>] (filp_close+0x68/0x8c)
[  627.777075] [<80286ac4>] (filp_close) from [<802aad38>] (__close_fd+0xc8/0xec)
[  627.788388] [<802aad38>] (__close_fd) from [<80286b18>] (SyS_close+0x30/0x58)
[  627.799587] [<80286b18>] (SyS_close) from [<80108060>] (ret_fast_syscall+0x0/0x28)
[  627.810901] Code: e59430e8 f57ff05b e3530000 0a000025 (e5b3200c) 
[  627.823021] ---[ end trace 124dc21b9499878c ]---
[  627.851770] Unable to handle kernel NULL pointer dereference at virtual address 0000000d
[  627.863130] pgd = 80004000
[  627.874478] [0000000d] *pgd=00000000
[  627.885738] Internal error: Oops: 5 [#6] SMP ARM
[  627.896909] Modules linked in: dm_mod dax fuse loop hci_uart bluetooth ecdh_generic sg uio_pdrv_genirq uio lirc_rpi(C) lirc_dev fixed frandom ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables ipv6 snd_bcm2835(C) snd_pcm snd_timer snd brcmfmac cfg80211 rfkill brcmutil evdev rpcsec_gss_krb5 tun uinput
[  627.920757] CPU: 3 PID: 10879 Comm: sudo Tainted: G      D  C      4.14.29+ #1
[  627.932515] Hardware name: BCM2835
[  627.944207] task: 91ab6900 task.stack: aeb86000
[  627.955689] PC is at locks_remove_posix+0x30/0x14c
[  627.967555] LR is at filp_close+0x68/0x8c
[  627.979366] pc : [<802e08b0>]    lr : [<80286ac4>]    psr: 28070013
[  627.994526] sp : aeb87d08  ip : aeb87db0  fp : aeb87dac
[  628.006388] r10: 0000000b  r9 : 8d5bba00  r8 : 00000004
[  628.017356] r7 : acc63a00  r6 : acc63a00  r5 : 9d640180  r4 : ae690440
[  628.028340] r3 : 00000001  r2 : 9d640180  r1 : acc63a00  r0 : 9d640180
[  628.039180] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  628.050082] Control: 10c5383d  Table: 2cecc06a  DAC: 00000055
[  628.060845] Process sudo (pid: 10879, stack limit = 0xaeb86210)
[  628.071621] Stack: (0xaeb87d08 to 0xaeb88000)
[  628.082270] 7d00:                   802777a0 801edc50 00000000 80d87c00 ac80efc0 af2bfff0
[  628.093150] 7d20: 8012000e 00000001 ac80efc0 00000001 48070013 000b21eb aeb87d84 8012000e
[  628.104069] 7d40: 80224d2c 808a8780 aeb87d6c aeb87d58 808a8780 806fbae8 add51c00 add51d74
[  628.115133] 7d60: aeb87d8c aeb87d70 806fbae8 808a7758 add52400 7f0220bc add52400 add52444
[  628.126354] 7d80: aeb87dac 00000000 9d640180 acc63a00 acc63a00 00000004 8d5bba00 0000000b
[  628.137591] 7da0: aeb87dcc aeb87db0 80286ac4 802e088c 00000009 000000f0 00000000 acc63a00
[  628.149003] 7dc0: aeb87df4 aeb87dd0 802aa864 80286a68 91ab6900 91ab6e40 acc63a00 ac80efc0
[  628.160356] 7de0: 00000001 ac80eff8 aeb87e14 aeb87df8 802aa974 802aa7bc 91ab6900 00000000
[  628.171934] 7e00: 00000544 ac80efc0 aeb87e54 aeb87e18 80121d6c 802aa928 aeb87e74 aeb87e28
[  628.183447] 7e20: aeb87edc aeb87fb0 00000000 0000000b 82e42100 aeb87edc aeb86000 00106001
[  628.194987] 7e40: aeb86000 0000000b aeb87e74 aeb87e58 8012260c 801219e0 00000000 8bf6da08
[  628.206554] 7e60: aeb87edc aeb86000 aeb87ec4 aeb87e78 8012d9f4 801225cc 80d02040 418004fc
[  628.218157] 7e80: 80d03d68 aeb87ec8 82e42100 8bf6de44 8bf6d940 000000a0 0000000b 76da3574
[  628.230003] 7ea0: aeb87ec8 aeb87fb0 76da3576 00000000 aeb86000 00000000 aeb87f8c aeb87ec8
[  628.241510] 7ec0: 8010b318 8012d6c8 aeb87ef4 aeb87ed8 8012d084 8012cf08 00000000 0000000b
[  628.252971] 7ee0: 00000000 00000000 00002a7f 00000000 8bf6d968 00000001 91ab6900 00000000
[  628.264251] 7f00: 00000000 00002a7f 00000000 8bf6d968 00000001 91ab6900 00000001 aeb87f68
[  628.275572] 7f20: aeb87f64 aeb87f30 8012f17c 801edc50 91ab6e44 00000002 7e94fa98 00000000
[  628.287125] 7f40: 00000002 7e94fa98 000000ae 80108204 aeb86000 00000000 aeb87fa4 aeb87f68
[  628.298685] 7f60: 8012f6f0 00000001 aeb86010 80108204 aeb87fb0 80108204 aeb86000 00000000
[  628.310431] 7f80: aeb87fac aeb87f90 8010b820 8010b260 7e94fa98 0000000b 0000000b 00000025
[  628.322157] 7fa0: 00000000 aeb87fb0 80108094 8010b774 00000000 0000000b 453b0700 453b0700
[  628.334017] 7fc0: 7e94fa98 0000000b 0000000b 00000025 022b5c58 004e516c 00000000 7e94fa08
[  628.345896] 7fe0: 004f6dd8 7e94f9b4 004d2d5b 76da3576 20040030 00002a7f 000001b8 00000000
[  628.357879] [<802e08b0>] (locks_remove_posix) from [<80286ac4>] (filp_close+0x68/0x8c)
[  628.369957] [<80286ac4>] (filp_close) from [<802aa864>] (put_files_struct+0xb4/0x10c)
[  628.381868] [<802aa864>] (put_files_struct) from [<802aa974>] (exit_files+0x58/0x5c)
[  628.393828] [<802aa974>] (exit_files) from [<80121d6c>] (do_exit+0x398/0xba0)
[  628.405659] [<80121d6c>] (do_exit) from [<8012260c>] (do_group_exit+0x4c/0xe4)
[  628.417476] [<8012260c>] (do_group_exit) from [<8012d9f4>] (get_signal+0x338/0x6e0)
[  628.429281] [<8012d9f4>] (get_signal) from [<8010b318>] (do_signal+0xc4/0x3e4)
[  628.440899] [<8010b318>] (do_signal) from [<8010b820>] (do_work_pending+0xb8/0xd0)
[  628.452491] [<8010b820>] (do_work_pending) from [<80108094>] (slow_work_pending+0xc/0x20)
[  628.463887] Code: e59430e8 f57ff05b e3530000 0a000025 (e5b3200c) 
[  628.475205] ---[ end trace 124dc21b9499878d ]---
[  628.488855] Fixing recursive fault but reboot is needed!

If you have vcgencmd, please report the output of vcgencmd get_throttled.

Hmm, that's interesting...

All of the stack traces in those kernel oops messages implicate absolutely nothing from the networking stack; they only reference generic core kernel code and VFS layer functions.

The complaint about a kernel NULL pointer dereference at a very low virtual address is also somewhat suspicious. Stuff like that is usually indicative of either hardware failure, or a very poorly behaved kernel driver scribbling on memory locations it shouldn't be touching.

All of the stack traces in those kernel oops messages implicate absolutely nothing from the networking stack; they only reference generic core kernel code and VFS layer functions.

I know. Could it be that corrupted data read from the USB disk caused this?

I 'ported' the backup script to Raspbian and ran it there to back up one of my btrfs partitions. Similar result: it did not finish. btrfs send/receive gets stuck, with nothing in the logs. Tested with the original 4.14.27-v7+ kernel and with my 4.15.10+ kernel, and of course the backup ran successfully on the Pi3B.

@pelwell

vcgencmd get_throttled
throttled=0x0

Thanks - that rules out a power supply issue.
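For reference, throttled=0x0 means none of the firmware's flags are set. If it were non-zero, you could decode it roughly like this (bit meanings taken from the usual firmware documentation, so treat them as an assumption):

v=$(vcgencmd get_throttled | cut -d= -f2)
(( v & 0x1 ))     && echo "under-voltage detected now"
(( v & 0x4 ))     && echo "currently throttled"
(( v & 0x10000 )) && echo "under-voltage has occurred since boot"
(( v & 0x40000 )) && echo "throttling has occurred since boot"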

Thanks - that rules out a power supply issue.

I already tried different PSUs; I'm usually using a brand new 2.5A PSU for the Pi3B+ and an older one (2A) for the Pi3B.

Yes, so you said, but I'd rather hear it from the Pi.

@pelwell
Are there any tests I can run to make absolutely sure that the hardware is OK?

Btw, I have similar issues if the root fs is on an iSCSI target and no USB drive is connected or used.

@pelwell @Ferroin
Oh, and before I forget:
If I'm using the onboard WLAN instead of the onboard ethernet, the issues are still there.
Also, I've already changed the network cable and the port on the network switch.
And switching the root fs to the SD card does not make any difference either; I'm getting this on my final test for today:

[  515.046170] CIFS VFS: sends on sock a715d500 stuck for 15 seconds
[  515.046196] CIFS VFS: Error -11 sending data on socket to server
[  517.443871] systemd-logind[3010]: New session c2 of user root.
[  530.166192] CIFS VFS: sends on sock a715d500 stuck for 15 seconds
[  530.166217] CIFS VFS: Error -11 sending data on socket to server
[  540.648102] Status code returned 0xc0000008 NT_STATUS_INVALID_HANDLE
[  540.648134] CIFS VFS: Send error in read = -9
[  540.648161] print_req_error: I/O error, dev loop0, sector 231328
[  540.648195] BTRFS error (device dm-1): bdev /dev/mapper/loop0p2 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
[  735.206538] INFO: task kworker/u8:2:219 blocked for more than 120 seconds.
[  735.206553]       Tainted: G         C      4.14.29+ #1
[  735.206558] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  735.206567] kworker/u8:2    D    0   219      2 0x00000000
[  735.206596] Workqueue: writeback wb_workfn (flush-btrfs-3)
[  735.206633] [<808a5ecc>] (__schedule) from [<808a6544>] (schedule+0x50/0xa8)
[  735.206652] [<808a6544>] (schedule) from [<80476418>] (btrfs_tree_lock+0x15c/0x278)
[  735.206672] [<80476418>] (btrfs_tree_lock) from [<804578e0>] (lock_extent_buffer_for_io+0xec/0x270)
[  735.206694] [<804578e0>] (lock_extent_buffer_for_io) from [<8045b160>] (btree_write_cache_pages+0x248/0x344)
[  735.206716] [<8045b160>] (btree_write_cache_pages) from [<80422eb0>] (btree_writepages+0x84/0x8c)
[  735.206734] [<80422eb0>] (btree_writepages) from [<8022edb4>] (do_writepages+0x30/0x8c)
[  735.206750] [<8022edb4>] (do_writepages) from [<802bb208>] (__writeback_single_inode+0x44/0x434)
[  735.206766] [<802bb208>] (__writeback_single_inode) from [<802bbb08>] (writeback_sb_inodes+0x214/0x4c4)
[  735.206781] [<802bbb08>] (writeback_sb_inodes) from [<802bbe48>] (__writeback_inodes_wb+0x90/0xd0)
[  735.206797] [<802bbe48>] (__writeback_inodes_wb) from [<802bc120>] (wb_writeback+0x298/0x33c)
[  735.206811] [<802bc120>] (wb_writeback) from [<802bc9b8>] (wb_workfn+0xdc/0x4d8)
[  735.206831] [<802bc9b8>] (wb_workfn) from [<801374b8>] (process_one_work+0x158/0x454)
[  735.206850] [<801374b8>] (process_one_work) from [<80137810>] (worker_thread+0x5c/0x5b0)
[  735.206868] [<80137810>] (worker_thread) from [<8013d8a4>] (kthread+0x13c/0x16c)
[  735.206886] [<8013d8a4>] (kthread) from [<8010810c>] (ret_from_fork+0x14/0x28)
[  735.206920] INFO: task kworker/u8:3:399 blocked for more than 120 seconds.
[  735.206926]       Tainted: G         C      4.14.29+ #1
[  735.206931] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  735.206936] kworker/u8:3    D    0   399      2 0x00000000
[  735.206954] Workqueue: btrfs-endio-write btrfs_endio_write_helper
[  735.206972] [<808a5ecc>] (__schedule) from [<808a6544>] (schedule+0x50/0xa8)
[  735.206988] [<808a6544>] (schedule) from [<808aa12c>] (schedule_timeout+0x1d0/0x3e4)
[  735.207005] [<808aa12c>] (schedule_timeout) from [<808a71cc>] (wait_for_common+0xc0/0x184)
[  735.207021] [<808a71cc>] (wait_for_common) from [<808a72b0>] (wait_for_completion+0x20/0x24)
[  735.207036] [<808a72b0>] (wait_for_completion) from [<8040c478>] (btrfs_async_run_delayed_refs+0x12c/0x14c)
[  735.207053] [<8040c478>] (btrfs_async_run_delayed_refs) from [<8042d7ec>] (__btrfs_end_transaction+0x228/0x324)
[  735.207069] [<8042d7ec>] (__btrfs_end_transaction) from [<8042d904>] (btrfs_end_transaction+0x1c/0x20)
[  735.207085] [<8042d904>] (btrfs_end_transaction) from [<80438b40>] (btrfs_finish_ordered_io+0x230/0x848)
[  735.207102] [<80438b40>] (btrfs_finish_ordered_io) from [<8043955c>] (finish_ordered_fn+0x1c/0x20)
[  735.207118] [<8043955c>] (finish_ordered_fn) from [<8046a1d8>] (normal_work_helper+0xb0/0x3e0)
[  735.207135] [<8046a1d8>] (normal_work_helper) from [<8046a8f0>] (btrfs_endio_write_helper+0x1c/0x20)
[  735.207152] [<8046a8f0>] (btrfs_endio_write_helper) from [<801374b8>] (process_one_work+0x158/0x454)
[  735.207170] [<801374b8>] (process_one_work) from [<80137810>] (worker_thread+0x5c/0x5b0)
[  735.207187] [<80137810>] (worker_thread) from [<8013d8a4>] (kthread+0x13c/0x16c)
[  735.207203] [<8013d8a4>] (kthread) from [<8010810c>] (ret_from_fork+0x14/0x28)
[  735.207217] INFO: task kworker/u8:6:759 blocked for more than 120 seconds.
[  735.207224]       Tainted: G         C      4.14.29+ #1
[  735.207229] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  735.207233] kworker/u8:6    D    0   759      2 0x00000000
[  735.207250] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper
[  735.207269] [<808a5ecc>] (__schedule) from [<808a6544>] (schedule+0x50/0xa8)
[  735.207284] [<808a6544>] (schedule) from [<80475e2c>] (btrfs_tree_read_lock+0x120/0x198)
[  735.207303] [<80475e2c>] (btrfs_tree_read_lock) from [<803f9660>] (btrfs_read_lock_root_node+0x38/0x50)
[  735.207321] [<803f9660>] (btrfs_read_lock_root_node) from [<803ff188>] (btrfs_search_slot+0x8b0/0xb10)
[  735.207339] [<803ff188>] (btrfs_search_slot) from [<80400f10>] (btrfs_insert_empty_items+0x7c/0xd4)
[  735.207357] [<80400f10>] (btrfs_insert_empty_items) from [<8040f01c>] (__btrfs_run_delayed_refs+0xe90/0x16fc)
[  735.207374] [<8040f01c>] (__btrfs_run_delayed_refs) from [<80412d9c>] (btrfs_run_delayed_refs+0x9c/0x310)
[  735.207390] [<80412d9c>] (btrfs_run_delayed_refs) from [<804130c4>] (delayed_ref_async_start+0xb4/0xc0)
[  735.207405] [<804130c4>] (delayed_ref_async_start) from [<8046a1d8>] (normal_work_helper+0xb0/0x3e0)
[  735.207419] [<8046a1d8>] (normal_work_helper) from [<8046a990>] (btrfs_extent_refs_helper+0x1c/0x20)
[  735.207437] [<8046a990>] (btrfs_extent_refs_helper) from [<801374b8>] (process_one_work+0x158/0x454)
[  735.207455] [<801374b8>] (process_one_work) from [<80137810>] (worker_thread+0x5c/0x5b0)
[  735.207471] [<80137810>] (worker_thread) from [<8013d8a4>] (kthread+0x13c/0x16c)
[  735.207487] [<8013d8a4>] (kthread) from [<8010810c>] (ret_from_fork+0x14/0x28)
[  735.207625] INFO: task btrfs-transacti:8152 blocked for more than 120 seconds.
[  735.207632]       Tainted: G         C      4.14.29+ #1
[  735.207637] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  735.207641] btrfs-transacti D    0  8152      2 0x00000000
[  735.207663] [<808a5ecc>] (__schedule) from [<808a6544>] (schedule+0x50/0xa8)
[  735.207678] [<808a6544>] (schedule) from [<8014b128>] (io_schedule+0x20/0x40)
[  735.207693] [<8014b128>] (io_schedule) from [<8021cd60>] (wait_on_page_bit+0x120/0x140)
[  735.207711] [<8021cd60>] (wait_on_page_bit) from [<80459a00>] (read_extent_buffer_pages+0x258/0x33c)
[  735.207731] [<80459a00>] (read_extent_buffer_pages) from [<80422350>] (btree_read_extent_buffer_pages+0xb0/0x120)
[  735.207747] [<80422350>] (btree_read_extent_buffer_pages) from [<80423818>] (read_tree_block+0x38/0x54)
[  735.207763] [<80423818>] (read_tree_block) from [<803f7914>] (read_node_slot+0xc4/0x100)
[  735.207779] [<803f7914>] (read_node_slot) from [<803fd508>] (push_leaf_right+0xb8/0x1f0)
[  735.207796] [<803fd508>] (push_leaf_right) from [<803fe6c8>] (split_leaf+0x5f4/0x804)
[  735.207813] [<803fe6c8>] (split_leaf) from [<803ff270>] (btrfs_search_slot+0x998/0xb10)
[  735.207830] [<803ff270>] (btrfs_search_slot) from [<80400f10>] (btrfs_insert_empty_items+0x7c/0xd4)
[  735.207847] [<80400f10>] (btrfs_insert_empty_items) from [<8040f01c>] (__btrfs_run_delayed_refs+0xe90/0x16fc)
[  735.207863] [<8040f01c>] (__btrfs_run_delayed_refs) from [<80412d9c>] (btrfs_run_delayed_refs+0x9c/0x310)
[  735.207879] [<80412d9c>] (btrfs_run_delayed_refs) from [<8042c378>] (btrfs_commit_transaction+0x38/0xc2c)
[  735.207893] [<8042c378>] (btrfs_commit_transaction) from [<804274bc>] (transaction_kthread+0x1b4/0x1c8)
[  735.207908] [<804274bc>] (transaction_kthread) from [<8013d8a4>] (kthread+0x13c/0x16c)
[  735.207923] [<8013d8a4>] (kthread) from [<8010810c>] (ret_from_fork+0x14/0x28)
[  735.207947] INFO: task btrfs:9851 blocked for more than 120 seconds.
[  735.207953]       Tainted: G         C      4.14.29+ #1
[  735.207958] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  735.207963] btrfs           D    0  9851   9850 0x00000000
[  735.207984] [<808a5ecc>] (__schedule) from [<808a6544>] (schedule+0x50/0xa8)
[  735.207999] [<808a6544>] (schedule) from [<8044fdec>] (btrfs_start_ordered_extent+0x128/0x158)
[  735.208016] [<8044fdec>] (btrfs_start_ordered_extent) from [<80450394>] (btrfs_wait_ordered_range+0x140/0x18c)
[  735.208032] [<80450394>] (btrfs_wait_ordered_range) from [<8043b4f0>] (btrfs_truncate+0x54/0x2c8)
[  735.208048] [<8043b4f0>] (btrfs_truncate) from [<8043c074>] (btrfs_setattr+0x2f4/0x488)
[  735.208067] [<8043c074>] (btrfs_setattr) from [<802a9318>] (notify_change+0x1cc/0x408)
[  735.208084] [<802a9318>] (notify_change) from [<80286c98>] (do_truncate+0x90/0xc0)
[  735.208099] [<80286c98>] (do_truncate) from [<80286eac>] (vfs_truncate+0x1e4/0x25c)
[  735.208113] [<80286eac>] (vfs_truncate) from [<80286fa0>] (do_sys_truncate+0x7c/0xac)
[  735.208127] [<80286fa0>] (do_sys_truncate) from [<80287200>] (SyS_truncate64+0x18/0x1c)
[  735.208143] [<80287200>] (SyS_truncate64) from [<80108060>] (ret_fast_syscall+0x0/0x28)
[  858.087442] INFO: task kworker/u8:2:219 blocked for more than 120 seconds.
[  858.087457]       Tainted: G         C      4.14.29+ #1
[  858.087462] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  858.087469] kworker/u8:2    D    0   219      2 0x00000000
[  858.087503] Workqueue: writeback wb_workfn (flush-btrfs-3)
[  858.087538] [<808a5ecc>] (__schedule) from [<808a6544>] (schedule+0x50/0xa8)
[  858.087557] [<808a6544>] (schedule) from [<80476418>] (btrfs_tree_lock+0x15c/0x278)
[  858.087579] [<80476418>] (btrfs_tree_lock) from [<804578e0>] (lock_extent_buffer_for_io+0xec/0x270)
[  858.087601] [<804578e0>] (lock_extent_buffer_for_io) from [<8045b160>] (btree_write_cache_pages+0x248/0x344)
[  858.087624] [<8045b160>] (btree_write_cache_pages) from [<80422eb0>] (btree_writepages+0x84/0x8c)
[  858.087642] [<80422eb0>] (btree_writepages) from [<8022edb4>] (do_writepages+0x30/0x8c)
[  858.087659] [<8022edb4>] (do_writepages) from [<802bb208>] (__writeback_single_inode+0x44/0x434)
[  858.087675] [<802bb208>] (__writeback_single_inode) from [<802bbb08>] (writeback_sb_inodes+0x214/0x4c4)
[  858.087690] [<802bbb08>] (writeback_sb_inodes) from [<802bbe48>] (__writeback_inodes_wb+0x90/0xd0)
[  858.087705] [<802bbe48>] (__writeback_inodes_wb) from [<802bc120>] (wb_writeback+0x298/0x33c)
[  858.087719] [<802bc120>] (wb_writeback) from [<802bc9b8>] (wb_workfn+0xdc/0x4d8)
[  858.087739] [<802bc9b8>] (wb_workfn) from [<801374b8>] (process_one_work+0x158/0x454)
[  858.087758] [<801374b8>] (process_one_work) from [<80137810>] (worker_thread+0x5c/0x5b0)
[  858.087775] [<80137810>] (worker_thread) from [<8013d8a4>] (kthread+0x13c/0x16c)
[  858.087793] [<8013d8a4>] (kthread) from [<8010810c>] (ret_from_fork+0x14/0x28)
[  858.087827] INFO: task kworker/u8:3:399 blocked for more than 120 seconds.
[  858.087833]       Tainted: G         C      4.14.29+ #1
[  858.087838] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  858.087843] kworker/u8:3    D    0   399      2 0x00000000
[  858.087861] Workqueue: btrfs-endio-write btrfs_endio_write_helper
[  858.087879] [<808a5ecc>] (__schedule) from [<808a6544>] (schedule+0x50/0xa8)
[  858.087895] [<808a6544>] (schedule) from [<808aa12c>] (schedule_timeout+0x1d0/0x3e4)
[  858.087912] [<808aa12c>] (schedule_timeout) from [<808a71cc>] (wait_for_common+0xc0/0x184)
[  858.087929] [<808a71cc>] (wait_for_common) from [<808a72b0>] (wait_for_completion+0x20/0x24)
[  858.087944] [<808a72b0>] (wait_for_completion) from [<8040c478>] (btrfs_async_run_delayed_refs+0x12c/0x14c)
[  858.087961] [<8040c478>] (btrfs_async_run_delayed_refs) from [<8042d7ec>] (__btrfs_end_transaction+0x228/0x324)
[  858.087976] [<8042d7ec>] (__btrfs_end_transaction) from [<8042d904>] (btrfs_end_transaction+0x1c/0x20)
[  858.087994] [<8042d904>] (btrfs_end_transaction) from [<80438b40>] (btrfs_finish_ordered_io+0x230/0x848)
[  858.088011] [<80438b40>] (btrfs_finish_ordered_io) from [<8043955c>] (finish_ordered_fn+0x1c/0x20)
[  858.088027] [<8043955c>] (finish_ordered_fn) from [<8046a1d8>] (normal_work_helper+0xb0/0x3e0)
[  858.088042] [<8046a1d8>] (normal_work_helper) from [<8046a8f0>] (btrfs_endio_write_helper+0x1c/0x20)
[  858.088060] [<8046a8f0>] (btrfs_endio_write_helper) from [<801374b8>] (process_one_work+0x158/0x454)
[  858.088077] [<801374b8>] (process_one_work) from [<80137810>] (worker_thread+0x5c/0x5b0)
[  858.088094] [<80137810>] (worker_thread) from [<8013d8a4>] (kthread+0x13c/0x16c)
[  858.088109] [<8013d8a4>] (kthread) from [<8010810c>] (ret_from_fork+0x14/0x28)
[  858.088124] INFO: task kworker/u8:6:759 blocked for more than 120 seconds.
[  858.088130]       Tainted: G         C      4.14.29+ #1
[  858.088135] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  858.088140] kworker/u8:6    D    0   759      2 0x00000000
[  858.088157] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper
[  858.088176] [<808a5ecc>] (__schedule) from [<808a6544>] (schedule+0x50/0xa8)
[  858.088191] [<808a6544>] (schedule) from [<80475e2c>] (btrfs_tree_read_lock+0x120/0x198)
[  858.088210] [<80475e2c>] (btrfs_tree_read_lock) from [<803f9660>] (btrfs_read_lock_root_node+0x38/0x50)
[  858.088228] [<803f9660>] (btrfs_read_lock_root_node) from [<803ff188>] (btrfs_search_slot+0x8b0/0xb10)
[  858.088246] [<803ff188>] (btrfs_search_slot) from [<80400f10>] (btrfs_insert_empty_items+0x7c/0xd4)
[  858.088264] [<80400f10>] (btrfs_insert_empty_items) from [<8040f01c>] (__btrfs_run_delayed_refs+0xe90/0x16fc)
[  858.088281] [<8040f01c>] (__btrfs_run_delayed_refs) from [<80412d9c>] (btrfs_run_delayed_refs+0x9c/0x310)
[  858.088297] [<80412d9c>] (btrfs_run_delayed_refs) from [<804130c4>] (delayed_ref_async_start+0xb4/0xc0)
[  858.088312] [<804130c4>] (delayed_ref_async_start) from [<8046a1d8>] (normal_work_helper+0xb0/0x3e0)
[  858.088326] [<8046a1d8>] (normal_work_helper) from [<8046a990>] (btrfs_extent_refs_helper+0x1c/0x20)
[  858.088344] [<8046a990>] (btrfs_extent_refs_helper) from [<801374b8>] (process_one_work+0x158/0x454)
[  858.088362] [<801374b8>] (process_one_work) from [<80137810>] (worker_thread+0x5c/0x5b0)
[  858.088378] [<80137810>] (worker_thread) from [<8013d8a4>] (kthread+0x13c/0x16c)
[  858.088394] [<8013d8a4>] (kthread) from [<8010810c>] (ret_from_fork+0x14/0x28)
[  858.088532] INFO: task btrfs-transacti:8152 blocked for more than 120 seconds.
[  858.088539]       Tainted: G         C      4.14.29+ #1
[  858.088544] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  858.088548] btrfs-transacti D    0  8152      2 0x00000000
[  858.088570] [<808a5ecc>] (__schedule) from [<808a6544>] (schedule+0x50/0xa8)
[  858.088585] [<808a6544>] (schedule) from [<8014b128>] (io_schedule+0x20/0x40)
[  858.088600] [<8014b128>] (io_schedule) from [<8021cd60>] (wait_on_page_bit+0x120/0x140)
[  858.088617] [<8021cd60>] (wait_on_page_bit) from [<80459a00>] (read_extent_buffer_pages+0x258/0x33c)
[  858.088637] [<80459a00>] (read_extent_buffer_pages) from [<80422350>] (btree_read_extent_buffer_pages+0xb0/0x120)
[  858.088653] [<80422350>] (btree_read_extent_buffer_pages) from [<80423818>] (read_tree_block+0x38/0x54)
[  858.088669] [<80423818>] (read_tree_block) from [<803f7914>] (read_node_slot+0xc4/0x100)
[  858.088686] [<803f7914>] (read_node_slot) from [<803fd508>] (push_leaf_right+0xb8/0x1f0)
[  858.088703] [<803fd508>] (push_leaf_right) from [<803fe6c8>] (split_leaf+0x5f4/0x804)
[  858.088720] [<803fe6c8>] (split_leaf) from [<803ff270>] (btrfs_search_slot+0x998/0xb10)
[  858.088737] [<803ff270>] (btrfs_search_slot) from [<80400f10>] (btrfs_insert_empty_items+0x7c/0xd4)
[  858.088754] [<80400f10>] (btrfs_insert_empty_items) from [<8040f01c>] (__btrfs_run_delayed_refs+0xe90/0x16fc)
[  858.088770] [<8040f01c>] (__btrfs_run_delayed_refs) from [<80412d9c>] (btrfs_run_delayed_refs+0x9c/0x310)
[  858.088786] [<80412d9c>] (btrfs_run_delayed_refs) from [<8042c378>] (btrfs_commit_transaction+0x38/0xc2c)
[  858.088800] [<8042c378>] (btrfs_commit_transaction) from [<804274bc>] (transaction_kthread+0x1b4/0x1c8)
[  858.088814] [<804274bc>] (transaction_kthread) from [<8013d8a4>] (kthread+0x13c/0x16c)
[  858.088830] [<8013d8a4>] (kthread) from [<8010810c>] (ret_from_fork+0x14/0x28)
[  858.088854] INFO: task btrfs:9851 blocked for more than 120 seconds.
[  858.088860]       Tainted: G         C      4.14.29+ #1
[  858.088865] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  858.088869] btrfs           D    0  9851   9850 0x00000000
[  858.088891] [<808a5ecc>] (__schedule) from [<808a6544>] (schedule+0x50/0xa8)
[  858.088906] [<808a6544>] (schedule) from [<8044fdec>] (btrfs_start_ordered_extent+0x128/0x158)
[  858.088924] [<8044fdec>] (btrfs_start_ordered_extent) from [<80450394>] (btrfs_wait_ordered_range+0x140/0x18c)
[  858.088941] [<80450394>] (btrfs_wait_ordered_range) from [<8043b4f0>] (btrfs_truncate+0x54/0x2c8)
[  858.088957] [<8043b4f0>] (btrfs_truncate) from [<8043c074>] (btrfs_setattr+0x2f4/0x488)
[  858.088975] [<8043c074>] (btrfs_setattr) from [<802a9318>] (notify_change+0x1cc/0x408)
[  858.088993] [<802a9318>] (notify_change) from [<80286c98>] (do_truncate+0x90/0xc0)
[  858.089008] [<80286c98>] (do_truncate) from [<80286eac>] (vfs_truncate+0x1e4/0x25c)
[  858.089022] [<80286eac>] (vfs_truncate) from [<80286fa0>] (do_sys_truncate+0x7c/0xac)
[  858.089036] [<80286fa0>] (do_sys_truncate) from [<80287200>] (SyS_truncate64+0x18/0x1c)
[  858.089052] [<80287200>] (SyS_truncate64) from [<80108060>] (ret_fast_syscall+0x0/0x28)

Btw, the system (root partition) was cloned from the USB partition to the sd-card using the same backup procedure, and that ran without any issues.

Have you got another Pi3B+ you can try this on? It really is very bizarre, and bizarre always makes me think HW fault.

Have you got another Pi3B+ you can try this on? It really is very bizarre, and bizarre always makes me think HW fault.

No, unfortunately not

It doesn't feel like a hardware fault to me - if so, you would expect to see it on a stock Raspbian kernel as well, and so far I don't think we have.

if so, you would expect to see it on a stock Raspbian kernel as well

Already posted. It happens on Raspbian using standard kernel (4.14.27+) as well

Yes, reading your earlier comment again I can see how it means that.

Can you try limiting the ARM cores to 1.2GHz by adding the following to config.txt?:

arm_freq=1200

@pelwell
I'll try this. Currently running another test to a different machine, with very strange result

I'm periodically getting messages like this:

[ 1416.176886] nfs: server kmxbmc not responding, still trying
[ 1417.217789] nfs: server kmxbmc not responding, still trying
[ 1417.575681] nfs: server kmxbmc OK
[ 1417.588839] nfs: server kmxbmc OK

The interval is exactly 239s. Any idea where this value comes from? The NFS share is mounted as follows:

kmxbmc:/srv on /mnt type nfs (rw,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.1.5,mountvers=3,mountport=20048,mountproto=udp,local_lock=none,addr=192.168.1.5)

@pelwell
Doesn't help :disappointed:

... and it does not depend on the kernel version; 4.9.xx produces the same issues.

There is an interesting thread on the OSMC forum about a 3B+ network issue. When the user connected his RPi directly to the switch, the issue disappeared.

The RPi 3B (without +) can reach 100 Mbps max, so could this issue be related to the higher speed, which requires network adjustments, i.e. a bigger buffer? Did changing the MTU size, or Ethernet cables, or using different 100Mbps/1Gbps switches make any difference?

It would be helpful if @mkreisl could explain how exactly his RPi is connected. When it is working, what speeds do the RPi 3B and 3B+ achieve?

@fiery-
Nothing special. Connected directly to my router (4-port GBit switch) via 5m CAT5 ethernet cable
Sending data from /dev/zero to server 33MB/s
Receiving data from server 29MB/s
You haven't read this thread completely. The issue also appears on a WLAN connection.

@mkreisl
Any suspicious output in dmesg?
What is the MTU of your onboard ethernet interface?
Does it help to reduce the MTU to 1496?
What is the output of ethtool -S eth0 after the issue appeared?

@lategoodbye

Any suspicious output in dmesg?

Already posted dmesg (see above). But the output differs extremely. Sometimes there is nothing and the copy process just gets stuck, sometimes the system freezes completely, sometimes I get the messages below, sometimes a kernel Oops ...

[  347.207072] CIFS VFS: sends on sock 81c33340 stuck for 15 seconds
[  347.207099] CIFS VFS: Error -11 sending data on socket to server
[  362.328036] CIFS VFS: sends on sock 81c33340 stuck for 15 seconds
[  362.328066] CIFS VFS: Error -11 sending data on socket to server
[  362.418887] CIFS VFS: Free previous auth_key.response = ad8a3540
[  368.135439] CIFS VFS: Free previous auth_key.response = 9cd2a840
[  372.970214] Status code returned 0xc0000128 STATUS_FILE_CLOSED
[  372.970259] CIFS VFS: Send error in read = -9
[  372.970297] print_req_error: I/O error, dev loop0, sector 296512
[  372.970334] BTRFS error (device dm-1): bdev /dev/mapper/loop0p2 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
[  389.918475] print_req_error: I/O error, dev loop0, sector 0
[  389.918755] BTRFS error (device dm-1): bdev /dev/mapper/loop0p2 errs: wr 0, rd 1, flush 1, corrupt 0, gen 0
[  389.918775] BTRFS warning (device dm-1): chunk 1048576 missing 1 devices, max tolerance is 0 for writeable mount
[  389.918808] BTRFS: error (device dm-1) in write_all_supers:3670: errno=-5 IO failure (errors while submitting device barriers.)
[  389.918825] BTRFS info (device dm-1): forced readonly
[  389.918841] BTRFS warning (device dm-1): Skipping commit of aborted transaction.
[  389.918851] BTRFS: error (device dm-1) in cleanup_transaction:1873: errno=-5 IO failure
[  389.918863] BTRFS info (device dm-1): delayed_refs has NO entry

What is the MTU of your onboard ethernet interface?

Default (1500)

Does it help to reduce the MTU to 1496?

No

What is the output of ethtool -S eth0 after the issue appeared?

NIC statistics:
     RX FCS Errors: 0
     RX Alignment Errors: 0
     Rx Fragment Errors: 0
     RX Jabber Errors: 0
     RX Undersize Frame Errors: 0
     RX Oversize Frame Errors: 0
     RX Dropped Frames: 0
     RX Unicast Byte Count: 82919099
     RX Broadcast Byte Count: 32537
     RX Multicast Byte Count: 199380
     RX Unicast Frames: 460932
     RX Broadcast Frames: 394
     RX Multicast Frames: 554
     RX Pause Frames: 708426
     RX 64 Byte Frames: 329
     RX 65 - 127 Byte Frames: 428916
     RX 128 - 255 Byte Frames: 2601
     RX 256 - 511 Bytes Frames: 592
     RX 512 - 1023 Byte Frames: 307
     RX 1024 - 1518 Byte Frames: 29135
     RX Greater 1518 Byte Frames: 0
     EEE RX LPI Transitions: 0
     EEE RX LPI Time: 0
     TX FCS Errors: 0
     TX Excess Deferral Errors: 0
     TX Carrier Errors: 0
     TX Bad Byte Count: 0
     TX Single Collisions: 0
     TX Multiple Collisions: 0
     TX Excessive Collision: 0
     TX Late Collisions: 0
     TX Unicast Byte Count: 1673525473
     TX Broadcast Byte Count: 4804
     TX Multicast Byte Count: 26394
     TX Unicast Frames: 1126605
     TX Broadcast Frames: 44
     TX Multicast Frames: 118
     TX Pause Frames: 8
     TX 64 Byte Frames: 37
     TX 65 - 127 Byte Frames: 13575
     TX 128 - 255 Byte Frames: 8217
     TX 256 - 511 Bytes Frames: 1747
     TX 512 - 1023 Byte Frames: 144
     TX 1024 - 1518 Byte Frames: 1103047
     TX Greater 1518 Byte Frames: 0
     EEE TX LPI Transitions: 0
     EEE TX LPI Time: 0

Now I did a final test and put a 100 MBit switch in between, to force a 100 MBit link instead of 1 GBit.

Makes no difference; now I'm sending that crap back :angry:

The same here with samba and shares (NFS / ext4 ...), whether from an external HDD or the internal SD card. After a reboot it works perfectly for 30/40 min, then it keeps dropping the share and reconnecting again and again.
The network itself works perfectly (no dropped pings ...). It feels like the queue is overloaded or something like that.

Same problem here. I'm using Samba and external USB HDD. It's quite random, but it's just a matter of time until it happens, no dmesg problems indicated, throttled=0x0. RPi is connected directly to 8 port GB switch via cat5e cable.

I have exactly the same problem and I'm glad to see I'm not alone in this, after rebooting I can transfer files through sftp (using filezilla on windows) without issue but after half an hour or so after booting up the transfers don't work anymore: It hangs after a second or two of high speed transfer (18MBps ), then times out after and tries to reconnect after 20 seconds only to fail again after a second or two of transfer. I also tried the same with transferring the files from the sd card (without hdd connected) and it has the same issues, when I put the sd card into a 3B it has no issues whatsoever tranferring these files even from the hard drive.
I'm using Raspbian and the official Raspberry Pi charger if that helps.
If it is hardware related I'd love to know so I can get it exchanged by the seller.

Seems that the Raspberry Pi team hasn't tested anything before releasing the Pi3B+ :angry:

Yes, that's right, because we just knocked up some hardware and never checked anything worked before releasing it.....

Or, on the other hand, perhaps we spent a year making this product and did, actually, test it, but just never saw this error?

Might also be worth considering that we have sold >200k B+ already, and the number of reports of this error is small in comparison, so it's perhaps some weird interaction between hubs, drivers, firmware and hardware. Whatever it is, we are looking in to it. My suspicion is that it's a driver issue that we will figure out, although sometimes these things can take a while. On the 3B, there was a bug in the smsc95xx driver, matched with a bug in the brcmfmac software, that we didn't get to the bottom of for about a year - but it was a difficult bug to reproduce. Neither piece of software was written by us, incidentally, in much the same way that the lan78xx driver and the chip firmware, which is presumably at fault here, weren't written by us either.


I'd just like to say that the team at Raspberry Pi are awesome! There's always niggles with new hardware and software, it's par for the course, and impossible to test every permutation out there. I have full confidence that they'll look into whatever issues there are and get a fix out when they can!

@JamesH65
You haven't read this thread completely

This issue happens regardless of whether the lan78xx driver is used or not. I built lan78xx as a module and unloaded it before starting the test,
and the issue remains

I did say I think it's a driver problem (lan78xx might be a culprit, but apparently not), and having skimmed the thread that still seems the most likely explanation. Something somewhere is dumping on memory it shouldn't (which, incidentally, was the cause of the bug I referred to above: one driver was dumping on another driver's skb allocation). As always with these things, if we can replicate the issue at our end, then we have a much better chance of figuring it out.

The effort to find and fix this problem would be helped enormously if someone could write down the steps to get from a known OS image (ideally Raspbian, but it doesn't have to be) to an observed failure.

@pelwell

The effort to find and fix this problem would be helped enormously if someone could write down the steps to get from a known OS image (ideally Raspbian, but it doesn't have to be) to an observed failure.

1) Install XBian
2) mount network share
3) run backup inside of xbian-config, menu item 6 (xbian copier), use file:/mount-to network-share/test.img as destination

Maybe some more data has to be written to the fs before running test (in my case I have about 2.5GB data to copy)

That's, in short, what I'm doing

PS:
Actually I have successfully written 5 images after

1) building lan78xx driver as module
2) using wlan and not ethernet, unloading lan78xx module (this makes things a bit better but issue still remains and Kodi crashes randomly)
3) powering off the usb part by running script (after this everything seems to be stable and Kodi does not crash)

#!/bin/bash
# Find the USB bus power control file for this board and switch bus power off.

BUSPOWER="/sys/devices/platform/bcm2708_usb/buspower"
for USB in $(find /sys/devices/platform/soc/ -name "*.usb" 2>/dev/null); do
    [ -e "$USB" ] && { BUSPOWER=$USB/buspower; break; }
done

echo 0x0 > $BUSPOWER

To me it sounds like the USB hub/ethernet part (that's one of the major differences from the Pi3B) does not get stable enough power for reliable operation (that chip has a higher power consumption than the chip used on the Pi3B).

Thank you - your instructions look mercifully short and simple. As you say, the lan78xx and its driver seems the most likely culprit, but I don't want to leap to any conclusions.

Thank you - your instructions look mercifully short and simple. As you say, the lan78xx and its driver seems the most likely culprit, but I don't want to leap to any conclusions.

... and for running the test in a non-interactive loop, a script like this:

#!/bin/bash

for i in $(seq 1 10); do
    logger running test $i
    /usr/sbin/btrfs-auto-snapshot xbiancopy --helper --img /dev/root /srv/backup/test.img
done

That's what I'm running at the moment

@pelwell @JamesH65

it's very simple to reproduce!!! Mount a network share, copy a video onto it and play it back from the share. The video will systematically drop out after several minutes of playback...
It also happens with a copy of a very large file.

Here are some questions for those who can reproduce the issue:
Are you able to reproduce it with Raspbian?
Do you have an external HDD connected via USB during the issue?
Which filesystem is the base on the server?
Are there any modifications to the cmdline.txt or config.txt?

@lategoodbye

Yes, I am on Raspbian with the stock firmware (4.9).

Again, it's the same thing on an HDD or internal storage.

My cmdline and config:

 $ cat /boot/cmdline.txt
dwc_otg.lpm_enable=0 console=serial0,115200 console=tty1 root=PARTUUID=8d333b2b-02 rootfstype=ext4 elevator=deadline fsck.repair=yes rootwait rootdelay=3
pi@MyBilly:~ $ cat /boot/config.txt
# For more options and information see
# http://rpf.io/configtxt
# Some settings may impact device functionality. See link above for details

# uncomment if you get no picture on HDMI for a default "safe" mode
#hdmi_safe=1

# uncomment this if your display has a black border of unused pixels visible
# and your display can output without overscan
#disable_overscan=1

# uncomment the following to adjust overscan. Use positive numbers if console
# goes off screen, and negative if there is too much border
#overscan_left=16
#overscan_right=16
#overscan_top=16
#overscan_bottom=16

# uncomment to force a console size. By default it will be display's size minus
# overscan.
#framebuffer_width=1280
#framebuffer_height=720

# uncomment if hdmi display is not detected and composite is being output
#hdmi_force_hotplug=1

# uncomment to force a specific HDMI mode (this will force VGA)
#hdmi_group=1
#hdmi_mode=1

# uncomment to force a HDMI mode rather than DVI. This can make audio work in
# DMT (computer monitor) modes
#hdmi_drive=2

# uncomment to increase signal to HDMI, if you have interference, blanking, or
# no display
#config_hdmi_boost=4

# uncomment for composite PAL
#sdtv_mode=2

#uncomment to overclock the arm. 700 MHz is the default.
#arm_freq=800

# Uncomment some or all of these to enable the optional hardware interfaces
#dtparam=i2c_arm=on
#dtparam=i2s=on
#dtparam=spi=on

# Uncomment this to enable the lirc-rpi module
#dtoverlay=lirc-rpi

dtoverlay=pi3-disable-bt
dtoverlay=pi3-disable-wifi

# Additional overlays and parameters are documented /boot/overlays/README

# Enable audio (loads snd_bcm2835)
dtparam=audio=off

Here are some questions for those who can reproduce the issue:
Are you able to reproduce it with Raspbian?

Yes, already posted

Do you have an external HDD connected via USB during the issue?

No/Yes, does not matter

Which filesystem is the base on the server?

Doesn't matter, cifs, nfs4 or sshfs share mounted on server

Are there any modifications to the cmdline.txt or config.txt?

No

Here are some questions for those who can reproduce the issue:
Are you able to reproduce it with Raspbian?

Yes... Using the latest Stretch.

Do you have an external HDD connected via USB during the issue?

Yes, both direct and through a hub.

Which filesystem is the base on the server?

Filesystem on HD = NTFS. File System on Connecting PC = Windows 10 + Android.

Are there any modifications to the cmdline.txt or config.txt?

Nope, neither... Tried a few different MTUs with no effect.

Sorry, I'm not able to reproduce the issue with Raspbian (kernel 4.14.27) and my RPi 3B+. Yesterday I transferred the uncompressed Raspbian image (~4 GB) via sftp and today via NFS. No issues.

And that is the sticking point. But tomorrow we'll set our team of highly trained engineers on the problem and hopefully make some progress.

For reference, I'm using Samba... The exact same card works a treat in a Pi 3 (not B+)... Move it to a B+ and it craps out after a GB / 1 minute or so.

Yes, Samba or cifs does often seem to be a common element.

@mkreisl (nachteule?) Am I right in thinking the current Pi 2/3 XBian release from the Downloads page should run on the 3B+? Version numbers and dates are conspicuously absent.

@mkreisl (nachteule?) Am I right in thinking the current Pi 2/3 XBian release from the Downloads page should run on the 3B+? Version numbers and dates are conspicuously absent.

Yes, the latest release supports the Pi3B+. That version comes with kernel 4.9.87+; if kernel 4.14.29+ is desired, the staging repo has to be enabled.

How can I patch that while staying on the official firmware?
Thanks for the reply.

The latest rpi-update kernel has some potential fixes for kernel panics in networking. Can you test?
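
For anyone testing, the steps are roughly as follows (a sketch, assuming the stock rpi-update tool that ships with Raspbian):

sudo rpi-update      # pull the latest testing firmware/kernel
sudo reboot
# after the reboot, confirm which kernel is running:
uname -r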

I've stumbled into this.
Will duplicate here what I posted on the forums.

Equipment:
pi3b+
ubuntu server, i5 skylake, intel gigabit NIC
managed L2 switch DLINK DGS-1210-10

Software:
own GNU/Linux homemade OS
kernel rpi-4.14.y @ https://github.com/raspberrypi/linux/commit/9d2ad143e40c38d34be86578840499a976c0a5b0
latest firmware at the moment of writing

Symptoms:
RX on pi3 very poor
TX good
iperf3 -c rpi3plus -t 60 reports hundreds of drops, very poor speed (sorry I don't have bad numbers handy)

What I have tried:

  • dtparam=eee=off : no change as eee was already disabled on all switch ports

  • force link to 100Mbit FD on the switch: all good; at 100 Mbit there are no drops, no retransmissions in iperf3, and NFS file copy speed is around 95 Mbit/s

  • remove forced link speed, enable flow control on the 3B+'s port on the switch - GOOD:

iperf3 -c rpi3plus -t 60
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-60.00  sec  1.69 GBytes   242 Mbits/sec    8             sender
[  4]   0.00-60.00  sec  1.69 GBytes   242 Mbits/sec                  receiver
dd if=/path/to/large/file/on/nfs4/mount/file.iso of=/dev/null iflag=direct bs=4M status=progress
5028970496 bytes (5.0 GB, 4.7 GiB) copied, 172 s, 29.2 MB/s
1200+1 records in
1200+1 records out
5035036672 bytes (5.0 GB, 4.7 GiB) copied, 172.307 s, 29.2 MB/s

ethtool -a eth0
Pause parameters for eth0:
Autonegotiate:  on
RX:             on
TX:             on
RX negotiated:  on
TX negotiated:  on

So flow control on the switch definitively fixes the situation for me.
Hope this provides some clues.

The corruption fix 1ad1d52e6cb6a9fcee5d3fb08258b417ffda37fd looks very interesting. Isn't this necessary for rpi-4.14, too?

@asavah

ethtool -a eth0
Pause parameters for eth0:
Autonegotiate:  on
RX:             on
TX:             on
RX negotiated:  on
TX negotiated:  on

So flow control on the switch definitively fixes the situation for me.
Hope this provides some clues.

If you don't enable flow control on your switch, does the output of ethtool change?
On my baby unmanaged Netgear switch the pause parameters report auto-negotiate and on by default, so I'm just trying to understand whether it's something we'd normally expect to be on.

@6by9
flow control turned off on the switch:

root@rpi3plus:~# ethtool -a eth0
Pause parameters for eth0:
Autonegotiate:  on
RX:     on
TX:     on
RX negotiated:  off
TX negotiated:  off

EDIT: a little more testing with FC off on the switch:

ethtool -A eth0 tx off rx off

BAD: while copying a file with dd to /dev/null (see example above) the speed is bad, and iftop reports peaks of 2Mb/s.
dd itself is unresponsive to ctrl+c, and dmesg is spammed with

[39643.040487] nfs: server nas2 not responding, still trying
[39643.040495] nfs: server nas2 not responding, still trying

Enabled FC on the switch and on the pi3b+ NIC and all is good again.

Edit2: for completeness' sake: nothing is plugged into the USB ports of the Pi, and wifi is disabled via overlay.
PSU is original RPF one.

@asavah Thanks, that helps a lot.
The baby unmanaged switches we do most of our testing through allow flow control by default. Managed switches appear not to. When we switch to sending the data over the main managed switches, we see the issue.

So the question is why disabling 802.3x flow control is the default on managed switches.
I'm not an expert in networking, but it sounds like without it the Pi will have issues with no real way around it (I'm assuming that if the Pi sends pause frames when the switch has disabled 802.3x then they'll be ignored. @JamesH65 is looking to see if we can send them if not negotiated).

No answers yet on whether we can override settings in the driver, but reading around the subject https://www.smallnetbuilder.com/lanwan/lanwan-features/30212-when-flow-control-is-not-a-good-thing presents one case for not wanting flow control. It's the same case as described in https://en.wikipedia.org/wiki/IEEE_802.3x#Issues; both are discussing the switch sending pause frames due to filling internal FIFOs, and neither is discussing the consumer wishing to pause transmissions.

@lategoodbye I've cherry picked the corruption fix commit to newer branches.
I'm guessing that commit won't be related to flow control issues (but could avoid other issues).

A further discovery in comparing a Realtek r8152 against the lan78xx: jumbo frames appear not to be enabled on the lan78xx, so all frames are 1514 bytes.
Trying to decipher the driver for what needs to be poked to enable it...

So the question is why disabling 802.3x flow control is the default on managed switches.

Depends on brand.
E.g. Juniper (popular in data centers) does have it on by default.

No answers yet on whether we can override settings in the driver

Try:

sudo ethtool --pause eth0 autoneg off
sudo ethtool -r eth0

But I wouldn't count on your switch accepting the pause frames if flow control is not enabled.

@popcornmix I tried doing the same transfers after updating with rpi-update, but I still have the same issue, except that now after failing it lasts longer before failing again. In the past it would fail again a second or so after failing the first time. This is what FileZilla said now:

File transfer failed after transferring 4.342.120.448 bytes in 272 seconds

File transfer failed after transferring 3.275.915.264 bytes in 196 seconds

The problem however is that now after failing that second time filezilla can't resume the file transfer and gets disconnected from the server, which I haven't seen before.

The same -_-

Looking at the only available page about the LAN7515 on microchip.com, I wonder if the Temp. Range Max. of 70 °C is not an issue here. Can someone measure the temperature of the LAN chip while this problem occurs?

Compared to LAN9514, the new LAN7515 has:

Supports 802.3az Energy Efficient Ethernet
Supports jumbo frames up to 9KB
Provides multiple automatic power saving modes

How can we use jumbo frames?

In a similar chip, the LAN7850, I found the following note:

In order to avoid frame drops in the RX FIFO when using jumbo frames with flow control the
maximum frame size should be restricted to 4 KB or less.

We have requested information on enabling jumbo frames.

FYI my production 3B+ has compiled its own kernel 6 times overnight into its NFS-mounted filing system without a hitch.

Jumbo frames do work for me.
They aren't that useful for the average user though, given that they in practice only work in your own network, your switch needs to have them enabled, and you need to set the MTU higher (ifconfig eth0 mtu <new mtu>) on all computers you communicate with, not just on the Pi.
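
For anyone who wants to try it, a minimal sketch of raising the MTU on the Pi side (the interface name eth0 and the 4000-byte value are just examples; the switch and the peer machines have to be configured to match, and the change does not persist across reboots):

sudo ip link set dev eth0 mtu 4000        # or: sudo ifconfig eth0 mtu 4000
ip link show eth0 | grep -o 'mtu [0-9]*'  # confirm the interface accepted it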

Performance gain is pretty minimal.
Standard MTU 1500:

pi@raspberrypi:~ $ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 192.168.178.26 port 5001 connected with 192.168.178.221 port 53742
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.1 sec   263 MBytes   219 Mbits/sec
[  5] local 192.168.178.26 port 5001 connected with 192.168.178.221 port 53744
[  5]  0.0-10.1 sec   263 MBytes   219 Mbits/sec

MTU 4000:

pi@raspberrypi:~ $ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 192.168.178.26 port 5001 connected with 192.168.178.221 port 53758
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec   271 MBytes   227 Mbits/sec
[  5] local 192.168.178.26 port 5001 connected with 192.168.178.221 port 53760
[  5]  0.0-10.0 sec   272 MBytes   227 Mbits/sec

Can verify by looking at ethtool -S eth0 that jumbo frames are received:

RX Greater 1518 Byte Frames: 279374

==

Two other notes:

  • Whether there are performance problems when flow control is disabled seems to depend on other factors as well.
    I do see the problem on an HP 1810G, but on a TP-Link TL-SG108E V3 it performs fine even with flow control disabled (and yes, I verified it was actually off with ethtool).

  • I do not see the connection stall problem described in this thread very often, but when I do there are other problems with the system as well.
    E.g. I once had the entire system hang when transferring a lot.
    And another time it seems there was either memory corruption or the system clock got screwed up.
    I told my normal computer to transfer data to the Pi for an hour, and looking at the low transfer speed I think it stalled halfway:

max@lynx:~$ iperf -c raspberrypi.local -t 3600
------------------------------------------------------------
Client connecting to raspberrypi.local, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.178.221 port 38764 connected with 192.168.178.26 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-3600.1 sec  49.4 GBytes   118 Mbits/sec

The Pi however thought it had been receiving data for several days, looking at the number of seconds in "interval".

pi@raspberrypi:~$ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------

[  4] local 192.168.178.26 port 5001 connected with 192.168.178.221 port 38764
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-241884.4 sec  49.4 GBytes  1.76 Mbits/sec

That looks like a driver/software problem rather than something that can be blamed on the LAN hardware to me.

It's some funny interaction between the LAN chip, driver, and other LAN infrastructure. Lots of variables that need to be narrowed down.
The most-likely looking conclusion is that flow control is going to be mandatory for reliable gigabit operation, and there doesn't appear to be a guaranteed way for the Pi to force it.

Jumbo frames appear to be a red herring. The Realtek RTL8153 based adapter we tested with (uses the r8152 driver) seems to have some level of network offload that combines incoming TCP frames together before delivering them up the stack. Wireshark running on the Pi therefore sees large packets, but running on a mirrored port of the switch confirms that at a wire level they are the normal 1516 bytes in length.
Testing with an AX88179 based adapter gives very similar results to the onboard LAN chip.

As it happens I had a TP-Link TL-SG108E V3 at home, so that is what we are currently testing with - it is showing a significant difference with flow control on or off. We'll confirm results again later on the corporate Netgears.

As it happens I had a TP-Link TL-SG108E V3 at home, so that is what we are currently testing with - it is showing a significant difference with flow control on or off.

Hmm, you are right.
I made the mistake of only testing inside the network.
And internally the speed is ok with that switch, even without flow control. Presumably because there is no lag and the other computer is able to resend the dropped packets fast.

pi@raspberrypi:~$ ethtool -a eth0
Pause parameters for eth0:
Autonegotiate:  on
RX:     on
TX:     on
RX negotiated:  off
TX negotiated:  off
pi@raspberrypi:~$ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 192.168.178.26 port 5001 connected with 192.168.178.221 port 55952
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec   267 MBytes   224 Mbits/sec

But when testing against a remote server download speed is indeed bad when flow control is off:

pi@raspberrypi:~$ speedtest-cli
Retrieving speedtest.net configuration...
Testing from Ziggo (*.*.*.*)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by Qweb | Full-Service Hosting (Alblasserdam) [34.63 km]: 20.12 ms
Testing download speed................................................................................
Download: 62.73 Mbit/s
Testing upload speed....................................................................................................
Upload: 25.09 Mbit/s

vs with flow control:

pi@raspberrypi:~$ speedtest-cli
Retrieving speedtest.net configuration...
Testing from Ziggo (*.*.*.*)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by KPN B.V. (Den Haag) [2.03 km]: 16.444 ms
Testing download speed................................................................................
Download: 198.96 Mbit/s
Testing upload speed....................................................................................................
Upload: 25.79 Mbit/s

Forgive my ignorance. I have the problem described here. I don't know about programming, coding or scripts; I set up my RPi3B+ by following guides. Can I somehow fix the error, or do I have to wait for a fix? The RPi3B worked great; the newer version, unfortunately, does not.

There still appears to be no fix ATM. The workaround for me is to transfer some large file (>1GB) over SFTP - it fails a few times but then Samba transfers work again for some time, so no need to reboot for me anymore. When I need to update my Arch Linux from my local repo on the Pi over HTTP, I also have to run a parallel SFTP transfer, otherwise it would not work; it still fails a few times, but a lot less.
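
To illustrate that workaround, a rough sketch (the host name, user and file size are placeholders, not anything tested in this thread):

# create a >1GB throwaway file on the Pi and push it out over SCP/SFTP a few times;
# per the report above, Samba transfers tend to recover for a while afterwards
dd if=/dev/urandom of=/tmp/kick.bin bs=1M count=1200
for i in 1 2 3; do scp /tmp/kick.bin user@desktop:/tmp/ && break; done
rm -f /tmp/kick.bin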

I have to say https://github.com/raspberrypi/linux/commit/354898f0a0f4a62ab950e826c8c598fc78378b06 does improve my Kodi experience. Before, it would stall playback one or more times during every video.

I also get a similar issue on the 3B+ when I use Plex Media Server and when I use SCP. It stalls randomly, and I see a lot of stuck packets in Send-Q when I type netstat -an.

However, when I replaced the Ethernet cable from CAT5e with CAT6, it dramatically improved in my environment. I never encounter stalls with SCP or Plex Media Server any more, the same as on my old 3B.

Oh nice!
I'm in exactly the same situation with Plex! My cable is CAT5e.
I will test later today.

Just for the record I was facing the same problem with OSMC on 3B+. Originally I thought it was related to NTFS on the external SSD, but in my last test I was also able to reproduce it on ext4 partition.

The original story is here: https://discourse.osmc.tv/t/sftp-stalls-when-transferring-files-from-ntfs-drive/72526

TL;DR version: When transferring files over SFTP from the SSD, which is connected to the RPi via a USB/SATA bridge, the transfer stalls after a while.

What was not mentioned in this thread, but what I observed with FileZilla, which automatically tries to reconnect after a timeout, was that after the reconnection the transfer continued again for a fraction of a second before stalling again. As if killing the connection and creating a new one released some resources before it stalled again.

Hi guys,

Do you have any solution/workaround for this? I think many people are facing this issue, but because this issue is very hard to identify correctly, people are blaming all sorts of things (samba, nfs, NTFS, sftp, etc). Do you think I should return the product to the seller or wait for a solution?

Thank you.

Have you checked to see if you have flow control on in your network?

Have you done an rpi-update to ensure you have the latest kernel and patches?

Flow control is not present on my network, checked several times, and also I tested with different routers.
I ran rpi-update yesterday and at first glance the error is no longer there. I'll run some heavier tests today and get back to you.
I did run rpi-update some weeks ago and it didn't solve the issue, so I assume a new version was released in the meantime.

Thank you again.

Maybe this is related to the connection stalls:
https://marc.info/?l=linux-arm-kernel&m=152828525609642&w=2

I've been looking in to this again, transferring large files from a USB device on the Pi to Windows laptop via Samba. No issues seen so far with the latest kernel (4.14.44) after 10GB. I'll keep running the tests.

@lategoodbye That's an interesting link. Not quite sure whether its relevant, but certainly seems possible.

It's better in 4.14.44... but after 1 day I have seen the problem again, with Plex and a Samba share.
I just updated to 4.14.48; I am waiting to see..

I'm curious if disabling swap has any influence on this issue:
sudo swapoff -a
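
If anyone wants swap to stay off across reboots on Raspbian, the usual route (an assumption on my part, not something verified in this thread) is via dphys-swapfile:

sudo dphys-swapfile swapoff            # turn the swap file off now
sudo systemctl disable dphys-swapfile  # keep it off after the next reboot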

I'm curious if disabling swap has any influence on this issue

Also have seen the issue without swap.
(Me and other Berryboot users do not swap, as the dphys-swapfile stuff doesn't work on layered aufs file systems)

I'm also having this problem on my Pi3b+ I picked up 2 weeks ago from CanaKit.

I updated to the latest kernel (4.14.48) today and am still having this issue. Large transfers from an HDD attached to the Pi3B+ stall with an error after a few minutes. Also, when I'm watching video files stored on an HDD attached to the Pi3B+ from my laptop or another Pi over samba, the video locks up, which is how I first discovered the issue.

Disabling swap doesn't fix the transfer issue. Nor does today's rpi-update 4.14.48. (Tested over HTTP and Samba.)

@joickle Bit unclear, are you seeing the issue without using the ethernet? Just playing back from an attached HD on the Pi itself? Are there any error messages in syslog? Or is it only via Samba?

As above, I've been trying this and not been able to replicate, so at a bit of a loss how to proceed.

After a few more tests, keeping the PI on for a good period of time, I noticed that the file transfers are more stable, but the problem is definitely there. File transfer fails, connection is lost. In some cases I cannot even start a new transfer, the process keeps trying to copy but there is no network activity, like a deadlock situation. The ping command never fails. After rebooting, file transfers work again, and then fail after a while.

@JamesH65

As above, I've been trying this and not been able to replicate, so at a bit of a loss how to proceed.

As long as you are just running simple copy procedures to reproduce the issue you will never be able to reproduce it; you have to work a little harder.

You can find good samples here

How do you know? Any thoughts on WHY simple copies work but 'something else' doesn't? Simply saying never in bold is about as much use as a chocolate teapot without some sort of reasoning behind it.

I've also tried streaming, and MIGHT have reproduced something once, but only once. No log messages, machine was pingable, other samba shares were still working. Only evidence was the link appeared to have been dropped.

@JamesH65 I am running the pi3b+ as a NAS, and I'm using samba over ethernet. The issue appears when transferring very large video files off of the pi3b+ onto my laptop, or when I'm streaming video files from the pi3b+.

If you're just turning on the pi and testing it will probably succeed, like others have mentioned it's fine for an hour or 2 after a reboot, then it starts happening again.

@JamesH65 Exactly :-) No log/dmesg messages, no ethtool -S errors, ping/the machine works, but the connection stalls and eventually gets terminated. It usually cannot be resumed immediately; it takes a few retries but stays unstable, or a reboot or the "SFTP workaround described earlier", after which it works again for some time. Simple copying might be so fast that this error doesn't occur. It always happens when streaming, but it might work an hour or more before it gets stuck, and it's random. (I'm running the Pi 24/7 as a fileserver and such...)

Am I right that the issue requires a managed switch to reproduce?

I'm connected via ethernet to an Asus RT-N66U router, pretty sure the issue is not the router as it worked perfectly on a pi2 as well as an ubuntu server.

@lategoodbye FWIW, I did the A/B testing using RPi 3b and RPi 3b+ in the exactly same environment (using TP-LINK TP-WR1043NDv2 running OpenWRT/LEDE in the middle, RPi on one side, desktop with Win 7 on the other) and running the same OS (latest OSMC using the same SD card). RPi 3b did not exhibit the problem (but was of course running at the half speed of RPi 3b+), RPi 3b+ did.

@JamesH65

How do you know? Any thoughts on WHY simple copies work but 'something else' doesn't? Simply saying never in bold is about as much use as a chocolate teapot without some sort of reasoning behind it.

Because I know it. When I was starting with the Pi3B+, such simple, stupid file transfers always worked; the more data that goes bidirectionally over USB, the higher the probability that the problem will occur.

I spent a long time on this f...... network issue, so I know what I'm talking about. I offered my help some weeks ago and sent you a PM on the forum; on the other hand, I was asking for hardware support for running more tests, but all I got from your side was an answer not related to my question. That's why the topic was done for me. And it seems you and all the other guys @ RPF still have absolutely no idea what's going on there and how to solve it.

I've also tried streaming, and MIGHT have reproduced something once, but only once. No log messages, machine was pingable, other samba shares were still working. Only evidence was the link appeared to have been dropped.

Yep, as already said, you have to build a more complex scenario.

@mkreisl @JamesH65 I guess we might need to be more specific. I stumbled upon this issue when doing SFTP transfers from an NTFS partition on an external disk connected to the RPi 3B+. I would call those transfers "simple" in my book, but they stalled anyway. I did have much more success with an ext4 partition on the same drive, but eventually I managed to stall an SFTP transfer from that one too (as described here: https://discourse.osmc.tv/t/sftp-stalls-when-transferring-files-from-ntfs-drive/72526).

I did try to debug it by turning on debugging on NTFS and on SFTP, but neither shows anything extraordinary, nor do dmesg or journalctl. At the end the connection simply stalls. As I also noted above, restarting the connection at the SFTP level (automatic reconnection in FileZilla) managed to get a few more bytes through before stalling again.

If you are looking into ways to reproduce this problem, I guess my setup is as simple as it gets. Just attach an external drive (I am using a Samsung EVO 850), format it to NTFS and run SFTP transfers from the RPi. Use large files of several GB for the test. After several transfers and some time after boot (it seems to be related to uptime as well) you should hit it.

@joickle @risa2000 You didn't understand my question; I never said that your switch/router is broken. Replacing the RPi 3B+ with an RPi 3B doesn't provide new information because they use different network drivers, and the chance is very high that the issue is in the driver (or at least connected to it). As long as we don't have a defined test setup, fixing this issue is impossible.

@mkreisl You may say that it needs a more complex scenario, but others don't.
The original poster is solely transmitting from the Pi, and others say they are just streaming from the Pi.

You've also contradicted yourself in initially reporting that all was good on WLAN, and then saying it isn't. Muddies the waters somewhat :-/ Are we looking at a Samba issue (except some people say NFS too), a LAN78xx issue, or a dodgy Pi?
Getting a cohesive picture is not easy, which makes getting to the bottom of the issue significantly harder.

A couple of questions for anyone having issues, mainly to try and narrow down the conditions required. (Random "me toos" without any details aren't terribly helpful).

  • does switching down to 100baseT work? ethtool -s eth0 autoneg off and then ethtool -s eth0 speed 100 (asavah reported that this fixed the issues for him, mkreisl reports that it doesn't)
  • is flow control enabled if running at 1000baseT? ethtool -a eth0 should report "on" in all fields.
  • can people please confirm that they are NOT running VLANs.
  • IPv4 or IPv6? It affects the adapter offload (yes mkreisl, I have seen that you've reported your case seems to work on IPv6 and not IPv4. It would be useful for others to confirm that on their setups. IPv4 is our normal setup and we're not reproducing the problem)
  • is it fixed if offloading is disabled? Does it need to be for both tx and rx? sudo ethtool -K eth0 rx off and sudo ethtool -K eth0 tx off (sudo ethtool -k eth0 to view the status).
  • where individuals have had issues on one 3B+ board, if possible please test on another 3B+. There is the potential for a particular board to be faulty/intermittent.
  • where people are transferring data from the Pi, what is the source device? SDcard, USB HDD, USB SSD, or something else?

This isn't being ignored, but not being able to reproduce it hinders progress.
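For anyone working through that list, a minimal collection sketch (assuming eth0 is the wired interface and ethtool is installed; this is just a convenience, not an official diagnostic tool) gathers the relevant settings in one go:

#!/bin/sh
# Gather the settings asked about above in one go.
ethtool eth0          # negotiated speed/duplex/autoneg
ethtool -a eth0       # pause (flow control) parameters
ethtool -k eth0       # offload feature state
ip -d link show | grep -i vlan    # any output here suggests VLANs are configured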

If it helps, my pi3b+ is running the latest raspbian stretch lite, with 2 USB HDDs connected via external enclosures; each has its own power supply and NTFS partitions. I am using samba to make the drives available over the network. The only time I have this issue is when I transfer multiple very large video files (1GB+ each) from the pi on to my windows pc. It also happens when I'm streaming the video files from the pi3b+'s attached USB hard disks; the video randomly locks up.

Also it's fine after a reboot and takes at least an hour or so before it appears again.

@6by9 I haven't had time to run deeper tests,
the current situation for me (with the kernel built yesterday from the rpi-4.14.y branch) is:

  • enabling flow control on the managed switch no longer saves the situation (nfs timeouts etc.); I need to retest this properly though (will do at the weekend if possible), that's why I switched to 100 Mbit
  • forcing the link to 100Mbit at the switch side - all good, just as on the 3b (no plus); main _heavy_ lan usage is nfs v4, udp, the other side is a skylake 1gbit server

On OSMC, we've noticed that these issues are most prevalent when using Samba. The issue is particularly noticeable when using FUSE based filesystems. It looks like Windows clients make two requests: one to get an ETA for the transfer and one to perform the operation requested. This causes FUSE based filesystems like exFAT and NTFS to fall over. It seems to be exacerbated above 100Mbps.

@risa2000 I haven't made the latest changes available yet but will do that in the next couple of days.

Found this today; I will be trying it tomorrow to see if it fixes this problem as a workaround.
Run this:

ethtool --offload eth0 rx off tx off

Actual changes:
rx-checksumming: off
tx-checksumming: off
tx-checksum-ip-generic: off
tcp-segmentation-offload: off
tx-tcp-segmentation: off [requested on]
tx-tcp6-segmentation: off [requested on]

I found this here and it seems to fix this problem
https://www.raspberrypi.org/forums/viewtopic.php?t=209258
I had to revert back to a pi3 because of this for my media server

@Bignumbas That's identical to the ethtool -K eth0 rx off and ethtool -K eth0 tx off I'd asked people to try in https://github.com/raspberrypi/linux/issues/2449#issuecomment-395819097. However I'd also requested information on whether it needs to be both directions - any input there?

Checksum offload is one potential candidate for issues, but disabling it without an understanding of where the issue lies is not a sensible overall solution. For a start it disables multiple different offload functions; it's like driving your car only in first gear because you know there's an issue in the higher gears but haven't found out which of them is the problem.

@6by9 I tested that, but I wanted to make sure it's working for a bit before I reported back. I also used ethtool --offload eth0 rx off tx off and so far it's been working great with no lock ups or failed transfers. However when I use this command it returns a partial error.

The ethtool -K eth0 rx off returns an operation not supported error:

Cannot get device udp-fragmentation-offload settings: Operation not supported
Cannot get device udp-fragmentation-offload settings: Operation not supported

and ethtool -K eth0 tx off returns:

Cannot get device udp-fragmentation-offload settings: Operation not supported
Cannot get device udp-fragmentation-offload settings: Operation not supported
Actual changes:
tx-checksumming: off
        tx-checksum-ip-generic: off
tcp-segmentation-offload: off
        tx-tcp-segmentation: off [requested on]
        tx-tcp6-segmentation: off [requested on]

So far this is working great, but would love to see a real fix soon.

@joickle Please try ONE OF ethtool -K eth0 tx off OR ethtool -K eth0 rx off. Does one of them solve the problem? If so which one?
If possible please then drill down as to which sub-option it is that is causing issues.

Please also confirm if you are running IPv4 or IPv6 - as you can see from the output there are different offloading options between the two.

I'm just wondering if Microchip haven't fixed the issue they had on SMSC95xx - https://github.com/raspberrypi/linux/commit/fe0cd8ca1b82983db24b173bb8518ea646c02d25#diff-c1b6c5470b2ae917eb5e620b19c1aa63. That would be IPv6 only though, which appears to not be the case here.

@6by9 Ok, I will try each one individually and report back. However, it appears that the rx one doesn't make any changes, as it only returns the operation not supported errors.

Also I'm running IPv4, IPv6 is disabled on my router.

Edit: I should also mention that I reverted back to the official 4.14.34 release as well.

Thank you.
I don't trust the results from setting things in ethtool - it reports the errors from sub-changes, but not always the changes. The driver does support rx checksum offload.

4.14.34 includes all the changes we've made around VLAN filtering and checksumming, so there shouldn't be any changes missing from that.

I suspect we'll have to be writing a test app that sends out incrementing UDP packets to exercise all values of the checksum and validating that they are all received at the far end.
I don't suppose anyone has a wireshark capture of the network when this goes wrong?

@6by9 it appears that sudo ethtool -K eth0 tx off alone is the golden egg that allows transfers and streams from the pi to complete.

I've never heard of wireshark until now, how may I go about trying to get a capture for you?

Many thanks.
We still have the issue of not being able to replicate, so if you can isolate it down to which of the tx offload features is causing the problem then that would be brilliant.
Enable everything again (tx on), and then try each of the following in turn, re-enabling them before moving on:
sudo ethtool -K eth0 tx-checksum-ipv4 off
sudo ethtool -K eth0 tx-checksum-ip-generic off
sudo ethtool -K eth0 tx-tcp-segmentation off
(apologies if I've got these wrong - I've only got a B available)

Hold fire on running wireshark for now, as I'm not sure which end really needs capturing.
For information though: sudo apt install wireshark. Generally I don't worry about non-root users running wireshark. Use sudo wireshark to run it (it is a GUI app). Select the relevant network interface, and it'll start capturing. The issue is likely to be that if you're just generally streaming then it'll consume a large amount of memory/disk, so it may not be practical to do.

dumpcap (part of the tshark package) can be configured to use a ring of fixed-size buffers:

sudo dumpcap -i eth0 -w <file.cap> -b filesize:32768 -b files:128

This will give it 128 files of 32MB each, for a total of 4GB of recording. With a bit of luck the network stall will effectively pause the capture and leave a usable log.
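Once a stall has been caught, the ring files can be sanity-checked before uploading them anywhere (capinfos ships alongside dumpcap/tshark; the file name below is a placeholder for whichever ring file was current when things stalled):

capinfos <one-of-the-ring-files>.cap            # packet count and time span
tshark -r <one-of-the-ring-files>.cap | tail    # quick look at the last packets captured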

Ah, thanks @pelwell. I knew there must be an option in tcpdump or similar, but wasn't going to hunt now.

Thinking about it, seeing as it's transmit offloading it'll have to be on the receiving device, or other device on the network (issues of switches and needing monitor modes).
That said, retries might show up on particular outgoing packets (immediately before the session gets closed).

It looks like 2 out of the 3 have a positive result, tested twice to make sure.
sudo ethtool -K eth0 tx-checksum-ip-generic off
sudo ethtool -K eth0 tx-tcp-segmentation off

My pi is running headless raspbian lite so I guess I can't do the wireshark capture anyway.

Both dumpcap and tshark are command line utilities, so easily run over SSH.

From @joickle 's report of 2 commands having a positive result (which I'm reading "positive" as "not seeing issues") I think we possibly have a smouldering gun in tx-tcp-segmentation (it's not definitive enough to be smoking!).

Testing the commands myself on a 3B+

pi@raspberrypi:~ $ sudo ethtool -K eth0 tx-checksum-ip-generic off
Cannot get device udp-fragmentation-offload settings: Operation not supported
Cannot get device udp-fragmentation-offload settings: Operation not supported
Actual changes:
tx-checksumming: off
    tx-checksum-ip-generic: off
tcp-segmentation-offload: off
    tx-tcp-segmentation: off [requested on]
    tx-tcp6-segmentation: off [requested on]

ie disabling the tx-checksum-ip-generic is also disabling segmentation.
Disabling just tx-tcp-segmentation only tweaks the one setting and leaves tx-checksum-ip-generic enabled, which would imply the checksums are OK.

I'll have a readup on exactly what the segmentation offload is doing (theory can be different from practice).

@6by9 I noticed that as well. I left tx-tcp-segmentation off and so far no issues at all since then :)

Just for the record, I've done some fairly simple and not particularly finely grained tests, and I could see no performance degradation in either transfer speed or CPU usage when turning these settings on and off, when using the Pi to serve up a Samba share. I used a fast Win10 laptop to connect to the share and copy a 4GB file from the Pi to a local folder. The data on the Pi was stored on an exFAT SD card in a USB adaptor, so the system was exercising the USB bus for both Ethernet and the USB device. The approximate average transfer speed was 16.5MB/s (132Mbits/s). Given the USB bus is being used twice (ethernet and USB adapter) we can double that to 264 or so, just over half the theoretical max of USB (480), which given the usual USB overheads seems to be about right.

That's useful information - thanks for the update.

For info tcp segmentation allows the OS to send a big TCP packet to the LAN adapter, and the LAN adapter then breaks it down into the appropriate MTU/MSS size chunks, including filling in the TCP header checksum and sequence number fields of the header. It explains why running a packet capture on the transmit side will often result in seeing impossibly large packets being sent on ethernet interfaces.
The logic also follows that if the adapter isn't doing the checksum calculations then it can't do segmentation, therefore disabling checksumming has to disable segmentation as well.

Trying to identify what is going wrong with tcp segmentation is the next step, but that leaves a large number of potential causes.
Really daft thought: TCP segmentation offload is recomputing the TCP sequence number of the packets (how many bytes have been sent so far over the TCP session). That's a 32 bit value, so it will wrap around to 0 at 2^32-1, or 4GB-1. Does the offload engine correctly handle that wrap around? If not then things would die around the 4GB mark.
If the transfer is running at 250Mbit/s (possibly a little optimistic), then sending 4GB of raw data would take 137 seconds, and people are saying failures after a few minutes.
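(For anyone wanting to sanity-check that figure, it's just shell arithmetic, assuming the 250Mbit/s estimate above:

echo $(( 4 * 1024 * 1024 * 1024 * 8 / 250000000 ))    # prints 137, i.e. roughly 137 seconds
)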
Tests coming up....

@JamesH65
Pi3B+ (Stretch), external HDD (exFAT), Samba, all with the latest updates. 1GB LAN, Win10 PC.

Copy from Pi to PC runs ok (mostly) with 1GB a minute (file size about 1.5GB).
Copy from PC to Pi works only with 100MB - mostly crashing.
Copy from Pi to PC crashes when starting a PC-to-Pi copy at the same time.

Copy from HDD to the internal SD (without LAN) runs with 1GB - SD to HDD again slow with 100MB.
Internal copy in both directions - always crashing.

@NorbertM21 Did you do the following on the Pi command line prior to running tests?

sudo ethtool -K eth0 tx-tcp-segmentation off

@NorbertM21 Also, what do you mean by an internal copy?

Also looks like you might have flow control turned off on your network

System SD to USB HDD, without transferring through Ethernet to a PC or Mac (which causes the same problems).

@NorbertM21 So you get crashes even when not copying over the network, just to a USB attached HDD? If so, I think you are seeing a different issue. The issue being described here is when using the network.

I have those problems also when copying Pi to PC and PC to Pi.
The internal copy was only for testing.

On my setup, I find no option to set flow control - Fritz!Box 7270 100Mbit LAN and normal passive GB switches. The PC's network adapter has flow control enabled.

Does the offload engine correctly handle that wrap around? If not then things would die around the 4GB mark.

Partial vindication of the stack. The TCP initial sequence number is chosen at random, so statistically it could be chosen as (2^32)-1 and would immediately fall over. As that isn't generally seen then it can't be totally fatally flawed. Investigations continue.

@6by9 for me when I was testing those settings yesterday, once it started to fail it would immediately fail on retries thereafter until turning off tcp segmentation. Turning off tcp segmentation allows the transfers to go through without the need for a reboot.

@joickle Curious as I was expecting any segmentation issue to be on a per TCP session basis, so once that session was cleared then all would be good.

On the one oddness @JamesH65 and I caught, the Samba TCP session had been terminated according to the Pi (not present in netstat -na), but Windows was still trying to use it. Switch to another device and make a Samba, SSH, or other connection in and the Pi was still functioning. Basically Windows needed to be kicked to drop the dead Samba session and all was right with the world again.
I wonder if yours is a similar problem in that Windows needs to be kicked appropriately.

I'm now curious as to whether disabling tcp segmentation offloading works for @asavah at 1000baseT. Can you also confirm whether you're running NFS over TCP or UDP? If UDP then that's just flummoxed me, as segmentation is only going to affect TCP.

@6by9
Sorry for misleading you in an earlier post, https://github.com/raspberrypi/linux/issues/2449#issuecomment-395820912
I was indeed using NFSv4 _tcp_ when I noticed the issue even with flowcontrol enabled.
I was running tests and got interrupted, so I forgot what I was doing and in what order,
and I completely forgot that I was playing with nfs options in fstab too.

Rebooted the system between every test
current nfs options on client as reported by mount

nas2:/media/nas on /media/nas type nfs4 (rw,noatime,nodiratime,vers=4.2,rsize=32768,wsize=32768,namlen=255,hard,nocto,proto=tcp,timeo=14,retrans=2,sec=sys,clientaddr=192.168.36.111,local_lock=none,addr=192.168.36.5)

GOOD
1Gbit
Flowcontrol "on" on the switch
tcp segmentation offload off
nfs v4 _tcp_

 uname -a
Linux rpi3 4.14.48-v7 #1 SMP Sun Jun 10 17:15:22 EEST 2018 armv7l GNU/Linux

root@rpi3:~# ethtool -K eth0 tx-tcp-segmentation off
root@rpi3:~# ethtool -a eth0
Pause parameters for eth0:
Autonegotiate:  on
RX:             on
TX:             on
RX negotiated:  on
TX negotiated:  on

root@rpi3:/media/nas/public/soft/os# dd if=some_pesky_os.iso of=/dev/null iflag=direct bs=1M status=progress
4336910336 bytes (4.3 GB, 4.0 GiB) copied, 154 s, 28.2 MB/s
4159+1 records in
4159+1 records out
4362014720 bytes (4.4 GB, 4.1 GiB) copied, 154.879 s, 28.2 MB/s

BAD

1Gbit
Flowcontrol "off" on the switch
tcp segmentation offload off
nfs v4 _tcp_

root@rpi3:~# ethtool -K eth0 tx-tcp-segmentation off
root@rpi3:~# ethtool -a eth0
Pause parameters for eth0:
Autonegotiate:  on
RX:             on
TX:             on
RX negotiated:  off
TX negotiated:  off

root@rpi3:/media/nas/public/soft/os# dd if=some_pesky_os.iso of=/dev/null iflag=direct bs=1M status=progress
22020096 bytes (22 MB, 21 MiB) copied, 58 s, 378 kB/s^C^C
22+0 records in
21+0 records out
22020096 bytes (22 MB, 21 MiB) copied, 72.8616 s, 302 kB/s


I will try to use the pi as I normally do with gbit/fc on/tcp-segmentation off and report if something changes after some time.

@asavah Thanks for the confirmation - NFS over TCP failing I can cope with, and it means all the observations are consistent.

Flow control off will always be bad. At some point we ought to look at driver changes to back off to 100Mbit if there is no flow control, but that's a slightly lower priority.
Thanks for trying out gbit/fc on/tcp-segmentation off - people being prepared to try things is the only way we're likely to get to the bottom of this.

At some point we ought to look at driver changes to back off to 100Mbit if there is no flow control, but that's a slightly lower priority.

I had a quick go one Friday afternoon with no success, but it is on the list to be revisited.

What about the @hidetosaito report on May 12, that changing from a CAT 5e to a CAT 6 cable resolved stalled scp? I can find CAT 6 cable in local shops.

However, when I replaced the Ethernet cable from CAT5e to CAT6, it dramatically resolved the issue in my environment. I never encounter stalls with SCP or Plex Media Server, same as with my old 3B.

Also, I tested iperf3 downloading to the Pi3B+. Currently, with CAT5e from the Pi to a Mac (no router), downloading to the Pi is 100Mbps slower than uploading (Retr shows 0). In addition, I noticed that the download speed decreased by about 8 Mbps in the latest Raspbian (4.14 - 2018-04-18) compared to the March release (4.9.80-v7+).

I tested with a category 6 cable just after the @hidetosaito solution, but that did not solve anything :/. It's crazy: the problem has been there for 3 months and there is no solution! I tested everything and I remain convinced that it is in the driver's queue management. At every network crash there is a lot of Send-Q (netstat -n).
I went back to my raspberry 2b ...

@JamesH65
I had a deeper look at my network switches (Zyxel GS-108S v2) and my Realtek network adapter: flow control is ON

@fiery TBH I don't believe that switching from CAT5e to CAT6 will have any effect on the issue being discussed here (file transfers stalling). Why would a cable be able to tell that a big transfer is in progress versus small transfers? If you have a dodgy CAT5e cable then I can believe that swapping it out for a good CAT6 cable (or a good CAT5e cable) would fix the issues being observed.

@Knoppix1 If you're that convinced that it's a queue management issue then feel free to dig into the kernel code and fix it! It's obviously a trivial issue for you to resolve in a couple of hours.

In the real world we seem to be narrowing down the issue despite still not being able to reproduce it here, let alone having a guaranteed failure test case. I don't know whether that means it is hardware dependent or requires a specific set of config options.

A polite request though - please keep to the issue at hand of file transfers stalling, not mini rants, nor throwing in other random observations.

@6by9 when you wrote:

In the real world we seem to be narrowing down the issue despite still not being able to reproduce it here, let alone having a guaranteed failure test case.

Did you mean you were not able to reproduce on your particular setup, or you were not able to reproduce it on any setup (e.g. mine mentioned in my post above)?

I guess my setup is as simple as it gets. Just attach an external drive (I am using a Samsung EVO 850), format it to NTFS and run SFTP transfers from the RPi. Use large files of several GB for the test. After several transfers and some time after boot (it seems to be also related to the uptime) you should hit it.

Did you mean you were not able to reproduce on your particular setup, or you were not able to reproduce it on any setup (e.g. mine mentioned in my post above)?

There are at least 3 of us who have been doing investigations on this at Pi Towers. None of the setups that any of us have used has exhibited the problem, even when doing exactly the same as various people describe. We have a mixture of devices on stupidly short network cables to local unmanaged switches, and 40m+ runs back to the corporate switches. Tests have been run to Windows boxes, Ubuntu boxes, Ubuntu in VMs, and to other Pis. There are so many permutations that it is the proverbial needle in the haystack.

I guess my setup is as simple as it gets. Just attach an external drive (I am using a Samsung EVO 850), format it to NTFS and run SFTP transfers from the RPi. Use large files of several GB for the test. After several transfers and some time after boot (it seems to be also related to the uptime) you should hit it.

James tried that. Worked fine :-/
If it is uptime related then debugging it is going to be even slower, but will be investigated.

@6by9 Thanks for the explanation. I was asking also because I noticed there has been a lot of focus in the thread lately on the network configuration. My (albeit anecdotal) experience suggested there might also be some dependency on other system resources (or the lack of them). I did not do anything like an NFS or Samba share from the RPi, only SFTP using the native SFTP server, but I noticed when trying different file systems that:

1) I did not hit the issue when using SD card as the SFTP transfer source (it could be because I did not give it enough stress, but in my several attempts it worked fine).

2) I managed to hit the issue when the transfer source was an ext4 partition on an external SSD connected via an ASMedia ASMT1051 USB<>SATA bridge (Samsung EVO 850). While I hit the problem there, it took many more attempts/transfers, and I probably hit it just by luck.

3) I was fairly consistently hitting the problem when the external SSD used an NTFS partition.
I wrote fairly, because sometimes right after a reboot it seemed to be working fine, but after a while (10-30 minutes) it started happening consistently. Sometimes I was able to hit it right away. It must have been related to some services I was running.

I was tempted to conclude that it is not only the network (or maybe not the network at all), but that some other resource in the OS gets starved, at either the hardware or a very low software level. As I was running the transfers I noticed that transfer speeds were ~20 MiB/s, so pretty much maxing the native interface speeds on the RPi 3b+. The other thought was that, because NTFS was mounted using FUSE (compared to either ext4 or the native SD card file system using a native mount), the additional context switches (if there are any) needed for NTFS might have added an additional strain on the system.

As most of the folks hitting this problem are also reporting running some kind of external drive setup, with different ways of exposing those drives (NFS, samba), I wonder if the issue can really be isolated to a network stack problem alone, or whether we need to consider the other circumstances as well, in particular USB traffic from/to the external drive. My experience with the SD card, ext4 and NTFS suggests so.

@6by9 So I attempted to run dumpcap last night as suggested by @pelwell to try to get a log for you guys, however it doesn't seem to be working...
sudo dumpcap -i eth0 -w <file.cap> -b filesize:32768 -b files:128
At first the command would return a "no such file or directory" error for the outfile, but once I got past that it just spat out an empty file and seemed to exit right away. Not sure what's going on there, maybe I'm missing something? This app is completely new to me lol.

I can't test it myself right now, but I suggest you try running it interactively to see if you get any errors:

sudo dumpcap -i eth0

@pelwell No errors with that command, it created the log /tmp/wireshark_eth0_20180612165321_nsfQXg.pcapng and is currently counting the packets.

I have a log file, however it is very large (96MB); it was running on the pi when an smb transfer from the pi to windows failed. There do appear to be some errors in there, but it would need a more experienced eye than mine to make any sense out of it.

I was working with a 7GB capture the other day - we'll cope with 96MB. Can you upload it somewhere - DropBox, Google Drive etc. - and post a link?

Thanks @joickle. I'll pick through the dump.

It may be nothing, but can I suggest you do some checking for malware on your network. In the dump there is an odd TCP session to/from 192.168.1.2 and 192.168.1.1 on port 420. netstat -npa | less on your Pi should list the owning process of each connection. I'd expect to see a line in the first section of the output

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
...
tcp        0    200 192.168.1.2:420        192.168.1.1:xxx       ESTABLISHED <PID>/<program>

Port 420 is nominally assigned for SMPTE timecode data (pretty uncommon), but is also linked to various malware programs https://www.speedguide.net/port.php?port=420. It's also possible that it is an app of your own doing perfectly innocent things, but I thought I'd flag it.

Something appears to be wrong in TCP retrying. Unfortunately I could really do with the other end of the link.

To document my analysis(*), things go wrong around packet 17724. I'm filtering the dump in Wireshark for tcp.port == 445 so I'm only looking at Samba traffic - I'll check anything else later.

  • 17733 is the ACK to seq num 924391058, which was the last segment of buffer 17718. All fine.
  • 17736 is the ACK to seq num 924430478, which was the last segment of buffer 17724.
  • We get a duplicate of that ACK microseconds later in 17737, probably triggered by the other end receiving an out of order packet later in the stream (so what happened to the next packet? Do we need intermittent network errors/dropped packets to reproduce this?).
  • Normally you'd expect TCP to retry buffer 17725 (the next block of data) after a timeout period, but that never happens.
  • 30 seconds later you get 18016 with 192.168.1.132 sending a keep-alive across to ensure the session doesn't time out. That is immediately acknowledged at 18017, so the TCP connection is still alive.
  • Same again at 18351 and 18352 a further 30 seconds later.
  • 18384 is 192.168.1.132 giving up and closing the connection.

So that sort of points the blame at the TCP retry mechanism, which would be common to many situations. Why does segmentation offload appear to trash that? At the moment I have no idea.

(*) My networking knowledge is a tad rusty, so there may be more recent updates to TCP that I'm not aware of. W Richard Stevens' book "TCP/IP Illustrated Volume 1: The Protocols" is my friend, copyright 1994. It includes lots of lovely details about SLIP links and the like!
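For anyone wanting to repeat this kind of analysis on their own capture from the command line, something along these lines should work (a sketch using tshark display filters; the field names are to the best of my knowledge, and capture.pcapng is a placeholder for your own file):

tshark -r capture.pcapng -Y 'tcp.port == 445 && (tcp.analysis.duplicate_ack || tcp.analysis.retransmission)'

That lists the duplicate ACKs and retransmissions on the Samba stream, which is where the interesting packets above show up.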

@6by9 I'm actually using port 420 for ssh, perhaps I should change it back to the default 22?

If it helps, my pi is ip 192.168.1.2 and my windows laptop is 192.168.1.132. I can try to get another log from the receiving end (windows) if it might help you guys track down the issue.

New bits of TCP - Selective ACK (aka SACK). RFC 2018 in 1996. There's a nice description at http://packetlife.net/blog/2010/jun/17/tcp-selective-acknowledgments-sack/
The receiver can add TCP options to denote that it is missing a section in the middle of a stream, so the transmitter only has to send the missing bit, not all sections after the duplicated ACK.

Buffer 17736 is ACKing up to 924430478, but has already also received 924468438-924490338, therefore there are 37960 bytes missing (buffer 17725 hasn't gone out at all?)
17737 updates that status to say it is still ACKing up to 924430478, but has also received 924468438-924494718 (all the other data).

So why does the Pi not respond to the SACK in the expected manner and retry 17725? I don't know.
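Two small things that may help anyone poking at SACK behaviour on their own setup (suggestions only; the sysctl and tshark field names are as I believe them to be, capture.pcapng is a placeholder):

sysctl net.ipv4.tcp_sack    # 1 means the stack will advertise/use SACK (the Linux default)
tshark -r capture.pcapng -Y 'tcp.options.sack_le' -T fields -e frame.number -e tcp.options.sack_le -e tcp.options.sack_re

The second line prints the SACK left/right edges per packet, which is essentially what I'm reading off the Wireshark GUI above.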

Interesting comment from David Miller (one of the main Linux network developers) for RHEL. https://bugzilla.redhat.com/show_bug.cgi?id=485292#c43

It's a little odd that the entire unsegmented buffer goes missing. If it were an ethernet issue then you'd expect to lose ~1448 TCP bytes from a single full ethernet frame. So it _may_ (big leap) point to a USB issue where the transfer gets trashed, which might then follow from the need to be running the transfer also from USB. Or it may still be that the LAN adapter receives the request and discards it for reasons unknown.
Either way the TCP stack should retry the lost buffer and work around the issue - it's meant to provide the guarantee of delivery over the unreliable transport of IP, so why isn't it?

@joickle I'm glad it's innocuous :-)
Being SSH I couldn't see inside it, and the port number appeared to have less benign uses. I'd worked out the important IP addresses for what I needed.

If you could get a grab from Windows then that might be useful but not essential.
As I've just posted the dump had more information to give, so now I need to understand how Linux handles SACK a little better.
James has managed to get some failures now, but not that reliably. If it comes to it then I'll be hacking the kernel to deliberately drop 1 in N packets in the hope of artificially recreating the situation.

Looks like you guys might be gaining some headway :)

Here's a grab from windows: https://www.dropbox.com/s/x3azsl6bc2ibh1u/windows_log.pcapng?dl=0

Thanks. That confirms it's SACK not recovering.
In that capture everything is OK up to packet 283335.

  • As Wireshark notes, for buffer 283336 a previous segment wasn't captured. 17520 bytes are missing based on the sequence numbers (Expected seq num was 188123878, but was 188141398).
  • Buffer 283347 ACKs to 188123878 (buffer 283335), with SACK set for 188141398-188157458 (buffers 283336-283346).
  • We then get duplicate ACKs for 188123878 (buffer 283335) with the SACK range extending further and further as more data is received.
  • Buffer 285250 again has dropped some more data (33580 bytes), so we end up with SACK ranges of 188141398-190788378 and 190821958-190848238, with the second one then extending.
  • At buffer 286510 we're on duplicate ACK number 392, and at that point the Pi sends no further data on that session. SACK is still saying we have 188141398-190788378 and 190821958-1992235238.
  • There are regular keep-alives sent by .132 which the Pi ACKs.
  • The missing data appears to never be retransmitted.

Internal discussions have concluded we'll disable all TCP segmentation offload for now. The performance overhead isn't measurable using standard tools, so it's not that big a loss.
Investigations will continue to try and understand why SACK is going wrong for us, but it'll work around the issue for now.
A big thank you to particularly @joickle for running so many tests for us.

@6by9 No problem at all :-)

As I'm still running the latest stable 4.14.34, is there a way to make sudo ethtool -K eth0 tx-tcp-segmentation off persist across reboots until a permanent fix is released? Or would it be better to run rpi-update?

I've not tried it as yet, but https://forum.ivorde.com/linux-tso-tcp-segmentation-offload-what-it-means-and-how-to-enable-disable-it-t19721.html implies you can do it via /etc/network/interfaces. Stretch has moved most stuff from there to dhcpcd though, so I'm not sure if that works.
/etc/network/if-up.d is more likely - I'll have a play whilst waiting for the kernel to build and report back.

Checking with our Raspbian packaging expert, the situation with /etc/network/interfaces is that you can put all the normal boilerplate in there and it'll override dhcpcd, but then you're on a non-standard setup. He's not aware of a way via dhcpcd.
Adding scripts to /etc/network/if-up.d only works if the interface is defined in /etc/network/interfaces.

Temporarily if you want to do it, then I seem to get the desired result with

# interfaces(5) file used by ifup(8) and ifdown(8)

# Please note that this file is written to be used with dhcpcd
# For static IP, consult /etc/dhcpcd.conf and 'man dhcpcd.conf'

auto lo
iface lo inet loopback

auto eth0
iface eth0 inet dhcp
        pre-up ethtool -K eth0 tx-tcp-segmentation off tx-tcp6-segmentation off
# Include files from /etc/network/interfaces.d:
source-directory /etc/network/interfaces.d

in /etc/network/interfaces, but I provide no guarantees on that!

@joickle
That's what I used

sudo nano /etc/systemd/system/ethtool-offload@.service

Paste this:

[Unit]
Description=Disable offload on NICs
DefaultDependencies=no

Before=network-pre.target
Wants=network-pre.target

Wants=systemd-modules-load.service local-fs.target
After=systemd-modules-load.service local-fs.target

Conflicts=shutdown.target
Before=shutdown.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/ethtool -K %i tso off

[Install]
WantedBy=multi-user.target

Save the file.

Run

sudo systemctl daemon-reload
sudo systemctl enable ethtool-offload@eth0
sudo systemctl start ethtool-offload@eth0

Edit: fixed ethtool path, my OS doesn't have split /usr, sorry.

Thanks guys, I'll try these :)

I have the suspicion that this is not just a workaround, it's the solution. It wouldn't be the first time that the lan78xx pretends to have a feature which it doesn't support.

In 2013 there was a very similar patch for the smsc75xx, so maybe we need to drop NETIF_F_SG, too.

@lategoodbye Hmm, if skb_linearize is doing what it appears (ie converting from a SG to linear), then I think I agree with you that SG is not supported, although in the normal manner of an engineer I'd like to confirm my understanding is correct.
I think I also recall having read that TSO _requires_ SG (can't see the reference quickly now), and therefore if no SG then TSO is invalid. It seems a shame seeing as it does appear to work (ignoring the stalls), but there is no gain if it's copying somewhere else.
Interesting to note that r8152.c does advertise SG | TSO | TSO6 and doesn't call skb_linearize, so I'm not clear if it is a fundamental limitation of USB LAN adapters or just a quirk of some devices.

Daft thought, I wonder if the lost packets are due to the skb_linearize failing and therefore sending gibberish to the adapter.
...
OK, it does handle the return value from skb_linearize correctly, and should increment tx_errors and tx_dropped, but doesn't tell the layers above that it failed.
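If anyone wants to check whether that failure path is being hit during a stall, the standard per-interface counters are exposed in sysfs, so something like the following (just a convenience, assuming the watch utility is installed) will show them ticking up while a transfer is running:

watch -n 1 'cat /sys/class/net/eth0/statistics/tx_errors /sys/class/net/eth0/statistics/tx_dropped'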

Just a quick note for anyone trying to get the workaround to persist across reboots.

Adding ethtool -K eth0 tx-tcp-segmentation off tx-tcp6-segmentation off in /etc/rc.local before the exit line also seems to do the trick.

The /etc/network/interfaces method above didn't work for me, it reported dhcpcd failed to start on boot :(

@6by9 Today I was able to reproduce the stall once (sending a 4 GB file stored on an external HDD (NTFS) from the RPi 3 B+ to a Linux notebook). I'm not sure it is related, but I had disabled GSO and left TSO enabled via ethtool beforehand. After the stall, the statistics from /sys/class/net/eth0/statistics showed a 2 for tx_errors and tx_dropped.

I have those problems mostly when sending files TO the Pi, not FROM the Pi.

Latest rpi-update kernel includes @6by9's workaround

@lategoodbye I think you may have prodded me in the right direction there on skb_linearize.
Adding a simple module parameter for a drop level, and artificially failing in lan78xx_linearize when that count is hit, I see a guaranteed stall every time. I'm actually using scp to do the copy rather than nfs or samba, but this appears to affect anything over TCP.
Wireshark dumps show the same symptom of duplicate ACKs repeating the same segment as having been missed, but it never gets resent.

Disable tx-tcp-segmentation and I still get 178 duplicate ACKs with SACK info, but then the Pi retransmits the missing packets and things resume as normal.
(Slightly curious that I appear to get 24616 (17*1448) bytes missed in the stream even though I only fail one skb_linearize call. I was expecting not to get passed unsegmented packets with that disabled. generic-segmentation-offload is still enabled, so perhaps it is just redirecting part of TCP).

This is getting to the stage that we can post a sensible report on net-dev :-)

Found a reference that TSO requires SG - https://wiki.linuxfoundation.org/networking/tso

Comparing to the r8152 driver I think there's nothing stopping lan78xx doing SG.
Both require a linear buffer, so are copying data from an skb to an urb. Whilst lan78xx uses skb_linearize and passes the resulting buffer to usb_fill_bulk_urb, r8152 appears to copy all fragments into a struct tx_agg buffer that it has allocated and then sends that.

Is WiFi affected? Is turning off "tx-tcp-segmentation" required for this interface as well?

No, and no.

sudo ethtool -k wlan0 would have told you that no segmentation offload mechanisms are supported on the wifi adapter, so no way to be affected.

I haven't eliminated the possibility that there is a generic bug in TCP with tso active in the event of packet loss, however there does appear to be an issue in the LAN78xx driver that can result in packet loss in the driver so it may be easier to trigger.

Having the exact same problem, here, with CentOS 7.5 (freshly YUM-updated) with kernel 4.14.43.

Context:
SSH stalled and time out, on huge file transfer from RPI 3B+ to any Linux-box (bare metal or virtual machines).
This happens after a few 10Mb, generally less than 200Mb.
Huge file is basically a modified .img, containing my own customized image (> 3Gb).
Tried a few different SD, from different size and brand with same effect.
Reproduced 100%
Same SD inserted on a 3B works perfectly with same file transfer procedure to same servers.

I've read some other posts/threads/sites, here and there. Nothing has worked out so far.
I forgot to mention that I do not observe any kernel OOPS.

Read this thread and tried this:
dd if=/dev/zero bs=1M status=progress | ssh user@target "cat >/dev/null"
Transfer is fine for at least 1.5 Gb (had to cut the command), which is far better than what I had previously, but I need to copy a real file... /dev/zero is not so interesting :-)

Then, I tried this:
sudo ethtool -K eth0 tx-tcp-segmentation off tx-tcp6-segmentation off
So far, this looks to do the trick for a few attempts.

Hope this helps.

@stephan57160 The disabling of TSO by default went in with 4.14.49, so running on an unpatched 4.14.43 I would expect to see issues.
We don't maintain CentOS, therefore there's not much we can do to get the fix deployed.

At least the confirmation that sudo ethtool -K eth0 tx-tcp-segmentation off tx-tcp6-segmentation off solves the issue confirms it is the same problem and we don't have to go off hunting for something else.

No issue for CentOS. I was expecting this answer, yes :-)

At least, now I can point out this to the maintainers and wait for an update on their side.
Thx anyway for the job.

A systemd-networkd workaround until a kernel fix is available. I for one use upstream kernels with Arch Linux ARM. Paste the following into /etc/systemd/network/00-lan78xx.link, then reboot:

[Match]
Driver=lan78xx

[Link]
TCPSegmentationOffload=no
TCP6SegmentationOffload=no

The kernel fix is already in our current Raspbian release; apt-get update/upgrade should get it.
We also have a recent fix for another lan78xx issue, which fixes large transfers causing kernel oopses, often involving attached USB devices. This is in rpi-update.
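A quick way to confirm you've picked up the workaround after updating (just a check, nothing specific to the fix itself): uname -r should report 4.14.49 or later, and the segmentation offload should show as off, e.g.

uname -r
ethtool -k eth0 | grep tcp-segmentation    # tx-tcp-segmentation should now report off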

@JamesH65 Do you know if these fixes are being upstreamed? Would be nice to have them in 4.18.

I believe https://github.com/raspberrypi/linux/commit/db81c14ce9fbd705c2d3936edecbc6036ace6c05 is being upstreamed (or at least sent there!) next week. I cannot see it causing too much of a ruckus.

Any chance those lan78xx fixes will be backported to 4.9.y?

I'm not sure whether you are talking about the Foundation tree or Mainline?
The fix for the race condition has been ported to Linux stable 4.9.114, 4.14.57 and 4.17.9

If you are talking about Raspbian/our kernel, then no, I doubt we will be backporting it. We recommend moving to 4.14.

Yes, I was talking about rpi-4.9.y. 4.14 and newer is currently not an option for me because of this unsolved issue.

It would be fairly easy to build your own kernel with this patch in whilst we try and get to the bottom of the other issue.

It would be fairly easy to build your own kernel with this patch in whilst we try and get to the bottom of the other issue.

I'm not using LE, but I don't believe that it is fairly easy to build your own kernel for LE

Not sure what you mean by LE, but here are the instructions for building your own kernel. https://www.raspberrypi.org/documentation/linux/kernel/building.md

Clone the correct branch (rpi-4.9.y), apply this patch (it should be a clean apply), and rebuild.

@JamesH65
OMG, if you would read the issue @smp79 referred to, you would know what LE means: LE = abbreviation of LibreELEC

And yes, I do know how to build my own kernel. I'm doing this every day for XBian

So why not type LibreElec and save everyone a load of sodding time? It's only 7 extra characters.

No idea of the complexities of a LibreELEC (see what I did there?) build, but still unlikely we would be backporting so I'd suggest approaching LibreELEC to fix their own build.

The Pi3B+ is at the moment absolutely unusable, the crappiest Pi I've ever had. Extremely frustrating

@mkreisl
Was your issue resolved? I'm on the fence about upgrading my Pi3B to Pi3B+. I wonder if those LAN issues are fully fixed by now.

@smp79
I haven't run stress tests on the Pi3B+ for a long time, but in a standard environment this device seems to be in a usable state.

@smp79 As far as we are concerned the queue walk fix from a couple of weeks ago was the big one, and we expect it to fix the huge majority of issues people were seeing with the lan78xx. Combine that with a few other fixes that have gone in, both for the lan78xx and the clock speed changes, and we expect the Pi3B+ to be at least as robust as the 3B. Of course, there may still be unreported/unexpected issues, but they should now be few and far between. There are still some outstanding issues in the wireless system, but they are edge cases, and are being worked on by Cypress.

Hi, I've read the majority of comments here and seen all these commits, but I'm still confused. So a fix is being worked on, right? Can I help by providing some information?
This bug makes my Pi3 B+ almost useless, so I'd like to ask if there is any ETA.

@headlesscyborg As far as we are concerned the issue has been worked around and people should not be observing networking issues. (You have reminded me that I haven't asked the questions on net-dev).

You've given no details whatsoever about your setup and what issues you are seeing, therefore we can't help you.
If uname -a doesn't report that you are running 4.14.49 or later then update your system. Beyond that you need to provide as many details as possible to give us even a vague chance of reproducing your problem.

Are you using the latest Raspbian? We're not doing anything on this at the moment, because we believe it to be working well. If you are still having problems with the latest Raspbian/kernel, then we will need to take another look.

Model: Raspberry Pi 3 B+
Kernel: 4.14.71-v7+ #1145 SMP Fri Sep 21 15:38:35 BST 2018 armv7l GNU/Linux
Distribution:

PRETTY_NAME="Raspbian GNU/Linux 9 (stretch)"
NAME="Raspbian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"

System updates: fully updated

I have my Pi connected to the network using wired RJ45 ethernet, running a fully updated Raspbian 9 from an external SSD connected via USB. Everything works as expected except transferring files over the network using FTP or SFTP - exactly as described in the original post: it transfers a random amount of data (~30 MB for example) at ~6-10MB/s, then it stops, and after 10-30 seconds the transfer continues at only ~50-150KB/s with frequent drops.

This only happens in the following situations:

  • the file is transferred from the Pi to another device from an external USB device (external SSD, external USB flash drive etc.) connected to the Pi

It does not happen in the following situations:

  • the file is transferred from the Pi to another device from the internal SD card reader included in the Pi
  • the file is transferred from another device to an external USB device (external SSD etc., I haven't tried transferring to a USB flash drive yet) connected to the Pi

Additional info:

  • my Wi-Fi router / AP (to which the Pi is connected using an RJ45 wire) is an Asus RT N11P
  • there is no difference in behaviour between wired and wireless connection

My other devices that I use to transfer files from/to the Pi are:

  • a laptop with Arch Linux, no difference between Wi-Fi and wired RJ45 connections; the laptop has an M.2 NVMe SSD so there should not be any issues with hard drive speed (I tried Nautilus' built-in FTP/SFTP support as well as Filezilla and Firefox FTP)
  • an Android smartphone - HTC U11 connected using Wi-Fi (ES File Explorer SFTP/FTP client)
  • exactly the same behaviour on both devices across all SFTP/FTP clients I tried

EDIT: I made some changes in the way I described when it happens and when it doesn't to make it sound less confusing.

Please provide the output from sudo ethtool -k eth0, sudo ethtool -a eth0, and sudo ethtool -S eth0.

Ideally please provide a capture using tcpdump or wireshark when you attempt one of these transfers.

Anything odd logged in dmesg?
With 4.14.71 you may see hw csum failure messages - that was a bug that was introduced from the mainline kernel and has just been resolved but not propagated into the Raspbian repos quite yet.
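e.g. something as simple as (my suggestion only):

dmesg | grep -i 'hw csum'

should show whether those messages are present.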

Outputs:
https://gist.github.com/headlesscyborg/a2fa690f70079403a4e5edb4309f1704
https://gist.github.com/headlesscyborg/2e64e375d6938afabf2547c3e997718e
https://gist.github.com/headlesscyborg/007159ab6b8fb02285807d668484e757
https://gist.github.com/headlesscyborg/1689f544ce6314f5c8535c78e98c07d9

Unfortunately I can't provide Wireshark outputs because it completely froze the Pi (or the VNC/SSH sessions - I have no monitor attached to it) after a minute.

However, I was testing the SFTP file transfer today and for some reason it works now. Strange, because I was experiencing the issue over the last 2 months whenever I tried it, and now that I've finally reported it here it has just disappeared. I will keep testing it over the coming days.

@pelwell @JamesH65 Are we happy to close this one now? TSO was the main cause, so any new reports really need to go through a full triage rather than getting tagged on to this.

I'm happy to close the issue, but we should support @headlesscyborg here since we've started already.

So leave it open for a few days for @headlesscyborg to come back, and then close if resolved?

I think that's reasonable.

@pelwell 18 days and no response from @headlesscyborg. Happy to close?

Yes. @headlesscyborg - please open a new issue if the problem recurs, copying any relevant information from this issue.
