Stlink: st-util should exit on LIBUSB_ERROR_NO_DEVICE

Created on 17 Mar 2019  ·  12Comments  ·  Source: stlink-org/stlink

When I first unplug the USB debugger and then try to issue a command via GDB, st-util gives the following errors very rapidly:

  core status: unknown
[!] send_recv send request failed: LIBUSB_ERROR_NO_DEVICE                      
[!] send_recv STLINK_DEBUG_GETSTATUS
  core status: unknown
[!] send_recv send request failed: LIBUSB_ERROR_NO_DEVICE                      
[!] send_recv STLINK_DEBUG_GETSTATUS
  core status: unknown
[!] send_recv send request failed: LIBUSB_ERROR_NO_DEVICE                      
[!] send_recv STLINK_DEBUG_GETSTATUS
  core status: unknown
[!] send_recv send request failed: LIBUSB_ERROR_NO_DEVICE                      
[!] send_recv STLINK_DEBUG_GETSTATUS
  core status: unknown
2019-03-17T14:26:42 ERROR src/flash_loader.c: flash loader run error           
2019-03-17T14:26:42 ERROR src/common.c: stlink_flash_loader_run(0x8000000) failed! == -1

thus occupying the CPU nearly 100%. This results in serious waste of time (restarting the virtual machine, etc.).

The kind of error message is quite normal. However, stutil should exit immediately on this error so that I could handle the rest of the issue by restarting stutil or taking some other measure.

bufixed componenst-util componenstlink-lib dependenclibusb errogdb-server staturesolved

All 12 comments

I think stlink should terminate at this point:

https://github.com/texane/stlink/blob/df3c2b02867db03fb82f6faaad71300398965e85/src/usb.c#L54

Yeah I think you are right, an call to libusb should be checked appropriate. If you want you can patch and send a PR. It should also be done for the other calls to libusb in the usb.c file. Thanks for reporting!

If you want you can patch and send a PR.

I would love to. However, I'm not sure that I understand the project structure so a plain exit(1) suffice here or not.

It should also be done for the other calls to libusb in the usb.c file.

I might not understand this very well. Do you mean "send_recv make the application terminate wherever it returns -1"?

I have placed a comment in the PR, there are more places where it can go wrong when calls are made to libusb.

The project is organised in the following way:

  • libstlink (usb.c is part of it and should never use exit in a library) files of the library are located in the src folder, most important is usb.c and common.c (and of course flash_loader.c)
  • st-util is located in src/gdbserver
  • st-info is located in src/tools/info.c
  • st-flash is located in src/tools/flash.c

I think this could be related to #888 and #445.
@ceremcem and @rewolff: What about the idea of discussing this together?

@chenguokai: Do you have an idea on how to proceed here?

Cannot reproduce this case.
I followed the procedure on my macOS machine. The error message appears, but no sign of entering any dead loops.
image
image

Edit: I issued command n after setting up a breakpoint on main function and unplugging st-link

There does exist one issue that the program ought to exit rather than unlimitedly retry sending or receiving packets.

https://github.com/texane/stlink/blob/0082e488579077762627dba2cee54ffca64dad17/src/gdbserver/gdb-server.c#L1121

This function call does not return a proper value to end the while(1) loop. Problems may lie inside the function or the while(1) mechanism.

After checked the error handling process with lldb, I probably figured out the fault code.

https://github.com/texane/stlink/blob/bf840a1ae82ffc3ef9929f243bb8050d6856f698/src/gdbserver/gdb-server.c#L1448

When _stlink_usb_step encounters an error initially triggered by the failure of a libusb call, it returns a non-zero return value (-1 in this case). However this return value is not handled by the recv-handle-send loop in gdb-server:

https://github.com/texane/stlink/blob/bf840a1ae82ffc3ef9929f243bb8050d6856f698/src/gdbserver/gdb-server.c#L1119

I come up with two possible deal strategies:

Common part: Check the return value of stlink_step in the switch case.

  1. If any error is presented, try to notify GDB and exit.
  2. Redo this function call for restricted times, if failure remains, notify GDB and exit, otherwise proceed.

Analysis:

Option 1 is acceptable since we cannot predict what will happen after a communication error. The draw back is that st-util may fail more frequently on some unstable usb ports or something alike.
Option 2 will give unstable devices another chance but may cause packet duplication, which makes debugging slightly more unpredictable.

Thx for the detailed analysis. I'd favour option 1 here. If people run into problems related to specific local hardware instabilities, there is nothing we can do to it anyway and this is also nothing what the stlink tools should considerate. It seems more important that we do any compromise on debugging. Are you able to fix this straight away?

I will check the documentation of gdb to send packet(s) properly. Not too difficult I guess.
After I am done, I will raise a PR.

I didn't say it, but I was going for option 2. But after your explanation: You're right. You've got me convinced: Unless there is a valid reason to believe an error is temporary an error should be considered fatal immediately.

Retrying without informing the user will lead to sudden surprises. Suppose 1/100th or 1/1000th of the transferred bits gets corrupted (at whatever level). If that results in a command being "invalid" in 99% of the time, the user in that situation would on average get plenty warnings (i.e. stlink suddenly quitting) that his hardware is unstable before he gets the silent data corruption that causes problems....

The "easy way out" is to just "exit ()". That will suddenly close the connection to gdb and it will handle that sufficiently gracefully.

Basic functions run well under the PR with a stm32f401 board.
The dead loop is resolved during my local test. Now st-util will exit. gdb will receive a failure reply and disconnect.

Was this page helpful?
0 / 5 - 0 ratings