When I first unplug the USB debugger and then try to issue a command via GDB, st-util gives the following errors very rapidly:
core status: unknown
[!] send_recv send request failed: LIBUSB_ERROR_NO_DEVICE
[!] send_recv STLINK_DEBUG_GETSTATUS
core status: unknown
[!] send_recv send request failed: LIBUSB_ERROR_NO_DEVICE
[!] send_recv STLINK_DEBUG_GETSTATUS
core status: unknown
[!] send_recv send request failed: LIBUSB_ERROR_NO_DEVICE
[!] send_recv STLINK_DEBUG_GETSTATUS
core status: unknown
[!] send_recv send request failed: LIBUSB_ERROR_NO_DEVICE
[!] send_recv STLINK_DEBUG_GETSTATUS
core status: unknown
2019-03-17T14:26:42 ERROR src/flash_loader.c: flash loader run error
2019-03-17T14:26:42 ERROR src/common.c: stlink_flash_loader_run(0x8000000) failed! == -1
This loop keeps the CPU at nearly 100% and wastes a lot of time (restarting the virtual machine, etc.).
The error message itself is to be expected. However, st-util should exit immediately on this error so that I can deal with the rest of the problem by restarting st-util or taking some other measure.
I think stlink should terminate at this point: https://github.com/texane/stlink/blob/df3c2b02867db03fb82f6faaad71300398965e85/src/usb.c#L54
Yeah, I think you are right: a call to libusb should be checked appropriately. If you want, you can patch this and send a PR. The same should be done for the other calls to libusb in the usb.c file. Thanks for reporting!
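As a rough illustration of what "checking the call" could mean, here is a minimal sketch in C. It does not use the real libusb or stlink code: `xfer_fn`, `checked_send_recv`, the two fake transfer functions and the `XFER_*` codes are all hypothetical stand-ins, chosen only to show the failure being returned to the caller instead of just being logged.

```c
#include <stdio.h>

/* Hypothetical result codes, standing in for the values a USB library returns. */
enum xfer_status { XFER_OK = 0, XFER_NO_DEVICE = -4 };

/* Hypothetical transfer callback, standing in for a real USB transfer call. */
typedef int (*xfer_fn)(void *ctx);

/* Fake transfer that behaves like an unplugged debugger. */
static int fake_unplugged_transfer(void *ctx) {
    (void)ctx;
    return XFER_NO_DEVICE;
}

/* Fake transfer that always succeeds. */
static int fake_ok_transfer(void *ctx) {
    (void)ctx;
    return XFER_OK;
}

/* Run one transfer. On failure, log it and return -1 so that the caller
 * can stop instead of retrying forever. Returns 0 on success. */
static int checked_send_recv(xfer_fn xfer, void *ctx) {
    int rc = xfer(ctx);
    if (rc != XFER_OK) {
        fprintf(stderr, "[!] transfer failed with code %d\n", rc);
        return -1;
    }
    return 0;
}
```

The point of the sketch is only that every caller sees the -1 and has to decide what to do with it; whether the right reaction is to exit or to report the error further up is a separate question.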
If you want you can patch and send a PR.
I would love to. However, I'm not sure I understand the project structure well enough to know whether a plain exit(1) would suffice here.
It should also be done for the other calls to libusb in the usb.c file.
I might not understand this very well. Do you mean "make the application terminate wherever send_recv returns -1"?
I have placed a comment in the PR; there are more places where things can go wrong when calls are made to libusb.
The project is organised in the following way:
I think this could be related to #888 and #445.
@ceremcem and @rewolff: What about the idea of discussing this together?
@chenguokai: Do you have an idea on how to proceed here?
I cannot reproduce this case.
I followed the procedure on my macOS machine. The error message appears, but there is no sign of an infinite loop.
Edit: I issued the command n after setting a breakpoint on the main function and unplugging the st-link.
There is indeed an issue: the program ought to exit rather than retry sending or receiving packets indefinitely.
This function call does not return a proper value to end the while(1) loop. Problems may lie inside the function or the while(1) mechanism.
After checking the error handling with lldb, I think I have found the faulty code.
When _stlink_usb_step encounters an error initially triggered by the failure of a libusb call, it returns a non-zero value (-1 in this case). However, this return value is not handled by the recv-handle-send loop in gdb-server:
I came up with two possible strategies:
Common part: Check the return value of stlink_step in the switch case.
Analysis:
Option 1 is acceptable since we cannot predict what will happen after a communication error. The drawback is that st-util may fail more frequently on unstable USB ports and the like.
Option 2 will give unstable devices another chance but may cause packet duplication, which makes debugging slightly more unpredictable.
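To make the analysis more concrete, option 1 could look roughly like the sketch below. This is not the gdb-server code: `step_fn`, `serve_fail_fast`, `fake_dev` and `fake_step` are hypothetical names, and the loop is reduced to "call the step function, check the result", assuming a step function that returns 0 on success and non-zero on failure as described above.

```c
#include <stdio.h>

/* Hypothetical step callback: returns 0 on success, non-zero on failure. */
typedef int (*step_fn)(void *dev);

enum loop_result { LOOP_DONE = 0, LOOP_DEVICE_ERROR = 1 };

/* Option 1 (fail fast): handle up to max_packets requests and leave the
 * loop on the first failed step. *handled receives the number of steps
 * that succeeded before the loop ended. */
static enum loop_result serve_fail_fast(step_fn step, void *dev,
                                        int max_packets, int *handled) {
    *handled = 0;
    for (int i = 0; i < max_packets; i++) {
        if (step(dev) != 0) {
            fprintf(stderr, "step failed, leaving the loop\n");
            return LOOP_DEVICE_ERROR;
        }
        (*handled)++;
    }
    return LOOP_DONE;
}

/* Fake device that "gets unplugged" after a given number of good steps. */
struct fake_dev { int steps_before_unplug; };

static int fake_step(void *dev) {
    struct fake_dev *d = dev;
    if (d->steps_before_unplug <= 0) {
        return -1; /* same kind of value the failing step returns */
    }
    d->steps_before_unplug--;
    return 0;
}
```

With a fake device that allows two steps, the loop handles two requests and then reports a device error instead of spinning. Option 2 would replace the early return with some retry logic, which is where the risk of duplicated packets comes from.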
Thanks for the detailed analysis. I'd favour option 1 here. If people run into problems related to specific local hardware instabilities, there is nothing we can do about it anyway, and it is not something the stlink tools should have to take into account. It seems more important that we don't compromise on debugging. Are you able to fix this straight away?
I will check the gdb documentation to see how to send the packet(s) properly. Not too difficult, I guess.
Once I am done, I will open a PR.
I didn't say it, but I was going for option 2. After your explanation, though, you've got me convinced: unless there is a valid reason to believe an error is temporary, it should be considered fatal immediately.
Retrying without informing the user will lead to sudden surprises. Suppose 1/100th or 1/1000th of the transferred bits gets corrupted (at whatever level). If that results in a command being "invalid" 99% of the time, the user in that situation would on average get plenty of warnings (i.e. stlink suddenly quitting) that their hardware is unstable before they get the silent data corruption that causes problems.
The "easy way out" is to just "exit ()". That will suddenly close the connection to gdb and it will handle that sufficiently gracefully.
Basic functions run well with the PR on an stm32f401 board.
The infinite loop is resolved in my local test. st-util now exits, and gdb receives a failure reply and disconnects.