Aws-iot-device-sdk-python-v2: AWS_IO_TLS_ERROR_NEGOTIATION_FAILURE on 1.0.6 (awscrt 0.5.13)

Created on 15 Apr 2020  ·  17Comments  ·  Source: aws/aws-iot-device-sdk-python-v2

Describe the bug
After upgrading my installation to awsiotsdk-1.0.6 it stopped working with awscrt failing to authorize with aws' backend. Might be connected with awscrt upgrade to 0.5.13.

SDK version number
awsiotsdk-1.0.6

Platform/OS/Device
rpi3b+

To Reproduce (observed behavior)
Upgrade awsiotsdk to version 1.0.6.

Expected behavior
Should work as expected.

Logs/output

Apr 15 06:55:01 raspberrypi python3[581]: Connecting to something-east-1.amazonaws.com with client ID 'foo-bar-baz'...
Apr 15 06:55:01 raspberrypi python3[581]: Traceback (most recent call last):
Apr 15 06:55:01 raspberrypi python3[581]:   File "app.py", line 86, in <module>
Apr 15 06:55:01 raspberrypi python3[581]:     connect_future.result()
Apr 15 06:55:01 raspberrypi python3[581]:   File "/usr/lib/python3.7/concurrent/futures/_base.py", line 432, in result
Apr 15 06:55:01 raspberrypi python3[581]:     return self.__get_result()
Apr 15 06:55:01 raspberrypi python3[581]:   File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
Apr 15 06:55:01 raspberrypi python3[581]:     raise self._exception
Apr 15 06:55:01 raspberrypi python3[581]: awscrt.exceptions.AwsCrtError: AwsCrtError(name='AWS_IO_TLS_ERROR_NEGOTIATION_FAILURE', message='TLS (SSL) negotiation failed', code=1029)
Apr 15 06:55:01 raspberrypi systemd[1]: app.service: Main process exited, code=exited, status=1/FAILURE
guidance

Most helpful comment

@JonathanHenson I was using the old endpoint (<endpoint-id>.iot.us-east-1.amazonaws.com), I switched to the ATS one (<endpoint-id>-ats.iot.us-east-1.amazonaws.com) and now it works. Thanks for your help!

All 17 comments

I experienced the same issue.

Ran across your post while searching for a similar issue with CPPv2 SDK that I believe is based on same underlying functions. I figured out that the CPPv2 SDK was trying to validate the full-chain of trust, and not just my supplied CA cert -- but the full chain-of-trust didn't exist on my device. By copying /etc/ssl/certs/ca-certificates.crt from my Ubuntu machine onto my device and passing it to the Aws::Iot::MqttClientConnectionConfigBuilder::WithCertificateAuthority() method I was able to provide a complete trust store that worked. Not sure what the Python equivalent for this method is, but it must be there -- hoping this hint helps you!

Update: on the last version of the C++ SDK, I also see that S2N (the AWS TLS stack) will pick ECDSA for signing, which is not supported on the release version of the code-base.

@jsakwa's advice seems sound.
Are you able to run samples/pubsub.py with --root-ca set? Usually you'd pass the Amazon Root CA 1 file available here

@graebm, got no bandwidth to test atm. Does it work for you? If it does work, that’d feel unnatural because the docs clearly say that the switch should be used if the root of trust is not installed on your machine already and it is on mine (it works after downgrading).

@graebm I just tested again. Version 1.0.5 runs fine. The problem occurs when using version 1.0.6. I'm using a certificate signed with that CA you mention (not the deprecated one).

This test was run using pubsub.py with --root-ca in a Docker container.

This is an excerpt of the debug output:

(...)

[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [task-scheduler] - id=0xfc006320: Scheduling attempt_connection task for immediate execution
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [task-scheduler] - id=0xfc006320: Running attempt_connection task with <Running> status
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [socket] - id=0xfd502a50 fd=6: initializing with domain 1 and type 0
Unsupported setsockopt level=1 optname=16384
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [socket] - id=0xfd502a50 fd=6: setting socket options to: keep-alive 0, keep idle 0, keep-alive interval 0, keep-alive probe count 0.
[WARN ] [2020-04-25T22:14:14Z] [fdeff460] [socket] - id=0xfd502a50 fd=6: setsockopt() for NO_SIGNAL failed with errno 92. If you are having SIGPIPE signals thrown, you may want to install a signal trap in your application layer.
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [socket] - id=0xfd502a50 fd=6: beginning connect.
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [socket] - id=0xfd502a50 fd=6: connecting to endpoint [redacted]:443.
[ERROR] [2020-04-25T22:14:14Z] [fdeff460] [socket] - id=0xfd502a50 fd=6: connect failed with error code 99.
[INFO ] [2020-04-25T22:14:14Z] [fdeff460] [dns] - id=0xff2df650: recording failure for record [redacted] for [redacted].iot.us-east-1.amazonaws.com, moving to bad list
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [dns] - static: purging address [redacted] for host [redacted].iot.us-east-1.amazonaws.com from the cache due to cache eviction or shutdown
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [socket] - id=0xfd502a50 fd=6: is still open, closing...
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [socket] - id=0xfd502a50 fd=6: closing
[ERROR] [2020-04-25T22:14:14Z] [fdeff460] [channel-bootstrap] - id=0xff3109a0: failed to create socket with error 1055
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [task-scheduler] - id=0xfc006450: Scheduling attempt_connection task for immediate execution
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [task-scheduler] - id=0xfc006450: Running attempt_connection task with <Running> status
Unsupported setsockopt level=1 optname=16384
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [socket] - id=0xfd502a50 fd=6: initializing with domain 0 and type 0
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [socket] - id=0xfd502a50 fd=6: setting socket options to: keep-alive 0, keep idle 0, keep-alive interval 0, keep-alive probe count 0.
[WARN ] [2020-04-25T22:14:14Z] [fdeff460] [socket] - id=0xfd502a50 fd=6: setsockopt() for NO_SIGNAL failed with errno 92. If you are having SIGPIPE signals thrown, you may want to install a signal trap in your application layer.
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [socket] - id=0xfd502a50 fd=6: beginning connect.
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [socket] - id=0xfd502a50 fd=6: connecting to endpoint [redacted]:443.
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [task-scheduler] - id=0xfd503440: Scheduling (null) task for future execution at time 3166071860135
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [task-scheduler] - id=0xfd503488: Scheduling epoll_event_loop_unsubscribe_cleanup task for immediate execution
[INFO ] [2020-04-25T22:14:14Z] [fdeff460] [socket] - id=0xfd502a50 fd=6: connection success
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [socket] - id=0xfd502a50 fd=6: local endpoint 172.17.0.2:58438
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [socket] - id=0xfd502a50 fd=6: assigning to event loop 0xff281f60
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [channel-bootstrap] - id=0xff3109a0: client connection on socket 0xfd502a50 completed with error 0.
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [channel] - id=0xfd504088: Beginning creation and setup of new channel.
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [task-scheduler] - id=0xfd503c48: Scheduling on_channel_setup_complete task for immediate execution
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [task-scheduler] - id=0xfd503488: Running epoll_event_loop_unsubscribe_cleanup task with <Running> status
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [task-scheduler] - id=0xfd503c48: Running on_channel_setup_complete task with <Running> status
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [channel] - id=0xfd504088: setup complete, notifying caller.
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [channel] - id=0xfd504088: no message pool is currently stored in the event-loop local storage, adding 0xfd504740 with max message size 16384, message count 4, with 4 small blocks of 128 bytes.
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [channel-bootstrap] - id=0xff3109a0: channel 0xfd504088 setup succeeded: bootstrapping.
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [socket-handler] - id=0xfd514e68: Socket handler created with max_read_size of 16384
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [task-scheduler] - id=0xfd515038: Scheduling tls_timeout task for future execution at time 3165183175311
[WARN ] [2020-04-25T22:14:14Z] [fdeff460] [tls-handler] - id=0xfd515020: negotiation failed with error Invalid signature algorithm (Error encountered in /tmp/pip-wheel-tiv_gdev/awscrt/aws-common-runtime/s2n/tls/s2n_client_cert_verify.c line 96)
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [channel-bootstrap] - id=0xff3109a0: tls negotiation result 1029 on channel 0xfd504088
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [task-scheduler] - id=0xfd5041c8: Scheduling channel_shutdown task for immediate execution
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [channel] - id=0xfd504088: Channel shutdown is already pending, not scheduling another.
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [task-scheduler] - id=0xfd5041c8: Running channel_shutdown task with <Running> status
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [channel] - id=0xfd504088: beginning shutdown process
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [channel] - id=0xfd504088: handler 0xfd514e68 shutdown in read dir completed.
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [tls-handler] - id=0xfd515020: Shutting down read direction with error code 1029
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [channel] - id=0xfd504088: handler 0xfd515020 shutdown in read dir completed.
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [task-scheduler] - id=0xfd5040a0: Scheduling (null) task for immediate execution
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [task-scheduler] - id=0xfd5040a0: Running (null) task with <Running> status
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [tls-handler] - id=0xfd515020: Shutting down write direction
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [channel] - id=0xfd504088: handler 0xfd515020 shutdown in write dir completed.
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [socket] - id=0xfd502a50 fd=6: closing
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [task-scheduler] - id=0xfd503c00: Scheduling epoll_event_loop_unsubscribe_cleanup task for immediate execution
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [task-scheduler] - id=0xfd514ec8: Scheduling socket_handler_close task for immediate execution
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [task-scheduler] - id=0xfd503c00: Running epoll_event_loop_unsubscribe_cleanup task with <Running> status
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [task-scheduler] - id=0xfd514ec8: Running socket_handler_close task with <Running> status
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [channel] - id=0xfd504088: handler 0xfd514e68 shutdown in write dir completed.
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [task-scheduler] - id=0xfd5040a0: Scheduling (null) task for immediate execution
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [task-scheduler] - id=0xfd5040a0: Running (null) task with <Running> status
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [channel] - id=0xfd504088: during shutdown, canceling task 0xfd515038
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [task-scheduler] - id=0xfd515038: Running tls_timeout task with <Canceled> status
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [channel-bootstrap] - id=0xff3109a0: channel 0xfd504088 shutdown with error 1029.
[DEBUG] [2020-04-25T22:14:14Z] [fdeff460] [channel] - id=0xfd504088: destroying channel.
Traceback (most recent call last):
  File "/opt/controller/pubsub.py", line 141, in <module>
    connect_future.result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 435, in result
    return self.__get_result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
awscrt.exceptions.AwsCrtError: AwsCrtError(name='AWS_IO_TLS_ERROR_NEGOTIATION_FAILURE', message='TLS (SSL) negotiation failed', code=1029)
[DEBUG] [2020-04-25T22:14:14Z] [ff37c010] [mqtt-client] - id=0xff3408e0: user called disconnect.
[ERROR] [2020-04-25T22:14:14Z] [ff37c010] [mqtt-client] - id=0xff3408e0: Connection is not open, and may not be closed
[DEBUG] [2020-04-25T22:14:14Z] [ff37c010] [mqtt-client] - id=0xff3408e0: Destroying connection
[DEBUG] [2020-04-25T22:14:14Z] [ff37c010] [mqtt-topic-tree] - tree=0xff340960: Cleaning up topic tree
[DEBUG] [2020-04-25T22:14:14Z] [ff37c010] [mqtt-client] - client=0xfe8bff28: Cleaning up MQTT client
[DEBUG] [2020-04-25T22:14:14Z] [ff37c010] [channel-bootstrap] - id=0xff3109a0: releasing bootstrap reference
[DEBUG] [2020-04-25T22:14:14Z] [ff37c010] [channel-bootstrap] - id=0xff3109a0: destroying

Hmm.... we did update our cipher preferences recently. Any chance you could get a wire shark capture of the TLS handshake and post the client and server hello messages?

Hey @JonathanHenson @graebm if this is using the same S2N stack as the CPPv2 SDK, I observed that in the latest SDK version v1.5.5 on my system, this also fails because the device advertises support for ECDSA, but the S2N stack only supports RSA, and during negotiation ECDSA is chosen. adolfogc's log looks the same as mine during that particular failure mode.

I tried to pull in the version of S2N on master, which is supposed to support ECDSA, but at least on my system this also failed, because it would somehow try to use the ECDSA key from the RSA code. I was planning to fix that and submit a pull-request to S2N, but haven't gotten around to it.

I'm currently using v1.5.1 of the CPPv2 SDK which negotiates to always use RSA during the challenge step

Okay, I know where to look now, but any chance you could get me a handshake capture so I can go look into this s2n‘s side?

Hi @JonathanHenson, would the following suffice?

Client hello:
image

Server hello:
image

Handshake:

image

Was the alert sent from client or server?
Also looks like SNI is empty? That would also cause the negotiation to fail. What are you setting for server name?

@JonathanHenson I was using the old endpoint (<endpoint-id>.iot.us-east-1.amazonaws.com), I switched to the ATS one (<endpoint-id>-ats.iot.us-east-1.amazonaws.com) and now it works. Thanks for your help!

According to https://aws.amazon.com/blogs/iot/aws-iot-core-ats-endpoints/ we should switch to -ats asap, @adolfogc, @JonathanHenson?

@mkozjak I did and the problem was resolved

Thank you Mario for sharing the link to that post here. Again: https://aws.amazon.com/blogs/iot/aws-iot-core-ats-endpoints/

The default configuration for V2 device SDKs will no longer work with the old non-ATS endpoints. Run aws --region <region> iot describe-endpoint --endpoint-type "iot:Data-ATS" to get the ATS endpoint.

I was experiencing a persistent TLS negotiation failure, when using the aws iot python sdk v2, despite using the .ats endpoint, but as suggested in the aws blog, was able to solve the problem by switching the root certificate to an AmazonRootCA1 certificate from a G2-RootCA1 certificate.

Hi there,
I was stuck with this error for 2 hours when migrate from old sdk to v2(v1.5.0) even I tried on multiple environment both Windows10, Ubuntu18 or MacOS.

old sdk example basicPubSub.py just works with my key, certificate & rootCA1
But new sdk example pubsub.py just does not work with same keys & policy

[ERROR] [2020-09-25T11:11:31Z] [00004ed8] [socket] - id=000001A7825DF580 handle=0000000000000320: connect completion triggered with error -1073741305
[ERROR] [2020-09-25T11:11:31Z] [00004ed8] [socket] - id=000001A7825DF580 handle=0000000000000320: connection error with code 1055
[INFO] [2020-09-25T11:11:31Z] [00004ed8] [dns] - id=000001A7802A2780: recording failure for record xxx for xxx-ats.iot.ap-southeast-1.amazonaws.com, moving to bad list
[ERROR] [2020-09-25T11:11:31Z] [00004ed8] [tls-handler] - id=000001A782BE3580: Error during negotiation. SECURITY_STATUS is -2146892953
[ERROR] [2020-09-25T11:11:31Z] [00004ed8] [socket] - id=000001A7825E0C00 handle=000000000000032C: WriteFile() failed with error 64

I have to input the port so that it can work, hope it can also help you.

mqtt_connection_builder.mtls_from_path(
            ...
            port=8883)

Regards,
Justin

Sorry that you are having trouble Justin. Have you tried the example policy for pubsub given in the samples readme?

{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "iot:Publish", "iot:Receive" ], "Resource": [ "arn:aws:iot:region:account:topic/test/topic" ] }, { "Effect": "Allow", "Action": [ "iot:Subscribe" ], "Resource": [ "arn:aws:iot:region:account:topicfilter/test/topic" ] }, { "Effect": "Allow", "Action": [ "iot:Connect" ], "Resource": [ "arn:aws:iot:region:account:client/test-*" ] } ] }

If that doesn't work please feel free to open a new issue if you are still having difficulties. We can't guarantee that messages on closed issues will be seen.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

shravan097 picture shravan097  ·  6Comments

supertick picture supertick  ·  7Comments

banuprathap picture banuprathap  ·  10Comments

qcabrol picture qcabrol  ·  8Comments

mkozjak picture mkozjak  ·  8Comments