Terraform-aws-github-runner: dev-usw2-scale-up failure: "Failed handling SQS event" "PEM routines:get_name:no start line at Sign.sign"

Created on 28 Jul 2020  ·  7 Comments  ·  Source: philips-labs/terraform-aws-github-runner

Summary

dev-usw2-scale-up Execution result: failed

Steps to reproduce

Push a commit to a repository watched by the GitHub App, with everything configured as described in the docs/README in this repo and in https://040code.github.io/2020/05/25/scaling-selfhosted-action-runners

What is the current bug behavior?

ERROR Error: error:0909006C:PEM routines:get_name:no start line at Sign.sign ERROR Invoke Error
(see full error trace/out below)

What is the expected correct behavior?

A commit to the configured repo triggers the Lambda function, which scales up by launching an AWS EC2 spot instance.

Relevant logs and/or screenshots

The most recent failure, triggered by a commit to the configured GitHub repo that the GitHub App is set up to watch, is in CloudWatch (CloudWatch Logs > Log groups > /aws/lambda/dev-usw2-scale-up) and is available in this GitHub gist:

gist-file-aws-lambda-dev-usw2-scale-up-error

Possible fixes

At first glance, this appears to be related to a certificate/key error.

Who can address the issue

Requesting validation and suggestions for a resolution.

Other links/references

Thank you

All 7 comments

I saw this exact same issue when I base64-encoded the PEM key WITHOUT the
-----BEGIN RSA PRIVATE KEY-----
-----END RSA PRIVATE KEY-----
lines.

It appears you need to base64-encode the complete content of the .pem file, including those lines.
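A minimal Python sketch of the difference, assuming the scale-up Lambda simply base64-decodes `github_app_key_base64` and hands the result to a PEM parser (which is what the `no start line` error suggests: the parser found no `-----BEGIN` line). The helper names `encode_pem_file_contents` and `decoded_has_pem_markers` are hypothetical, and the key body is a placeholder, not a real key.

```python
import base64

# Placeholder key material for illustration only; not a real RSA key.
FULL_PEM = (
    "-----BEGIN RSA PRIVATE KEY-----\n"
    "MIIEowIBAAKCAQEAplaceholderbody\n"
    "-----END RSA PRIVATE KEY-----\n"
)


def encode_pem_file_contents(pem_text: str) -> str:
    """Base64-encode the complete PEM text, header and footer lines included."""
    return base64.b64encode(pem_text.encode("ascii")).decode("ascii")


def decoded_has_pem_markers(encoded: str) -> bool:
    """Return True if the decoded value still carries the BEGIN/END lines
    that a PEM parser looks for."""
    decoded = base64.b64decode(encoded).decode("ascii", errors="replace")
    return decoded.lstrip().startswith("-----BEGIN") and "-----END" in decoded


# Correct: encode the whole file.
good = encode_pem_file_contents(FULL_PEM)

# Incorrect: encode only the body between the BEGIN/END lines.
body_only = "\n".join(
    line for line in FULL_PEM.splitlines() if not line.startswith("-----")
)
bad = encode_pem_file_contents(body_only)

print(decoded_has_pem_markers(good))  # True
print(decoded_has_pem_markers(bad))   # False
```

Decoding the stored value and checking for the markers before deploying is a cheap way to catch this before the Lambda does.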

Thanks @compiaffe! That was definitely an issue.

It would definitely be helpful to have something like this in the README/docs:

  • GITHUB_KEY=$(base64 ./myapp-aws-github-runner.2020-07-29.private-key.pem) ; echo "$GITHUB_KEY" > encoded_key.out, then use the entire contents of encoded_key.out as the github_app_key_base64 value
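If line wrapping is a concern (GNU coreutils `base64` wraps its output at 76 columns unless run with `-w 0`), a small Python sketch can write a single-line value instead. The file names are the ones from the suggestion above; the function name `write_encoded_key` is hypothetical, and the BEGIN-line check is only a sanity guard, not something the module itself is known to require.

```python
import base64
from pathlib import Path


def write_encoded_key(pem_path: str, out_path: str = "encoded_key.out") -> str:
    """Base64-encode the entire .pem file (BEGIN/END lines included) as a
    single line, write it to out_path, and return the encoded string."""
    raw = Path(pem_path).read_bytes()
    # Refuse input that has already lost its header line.
    if not raw.lstrip().startswith(b"-----BEGIN"):
        raise ValueError(f"{pem_path} does not start with a PEM BEGIN line")
    # b64encode never inserts line breaks, unlike GNU base64's default output.
    encoded = base64.b64encode(raw).decode("ascii")
    Path(out_path).write_text(encoded)
    return encoded
```

Usage: call `write_encoded_key("./myapp-aws-github-runner.2020-07-29.private-key.pem")`, then use the contents of `encoded_key.out` as the `github_app_key_base64` value.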

I have redeployed and am now seeing the following SQS error in the scale-up Lambda function. Curious if you've seen this one before?

2020-07-29T12:38:43.476-07:00 | 2020-07-29T19:38:43.476Z    e67a6201-6e39-50c8-94f7-359abc4954cd    ERROR   Invoke Error
{
    "errorType": "Error",
    "errorMessage": "Failed handling SQS event",
    "stack": [
        "Error: Failed handling SQS event",
        "    at _homogeneousError (/var/runtime/CallbackContext.js:12:12)",
        "    at postError (/var/runtime/CallbackContext.js:29:54)",
        "    at callback (/var/runtime/CallbackContext.js:41:7)",
        "    at /var/runtime/CallbackContext.js:104:16",
        "    at /var/task/index.js:17333:16",
        "    at Generator.throw (<anonymous>)",
        "    at rejected (/var/task/index.js:17315:65)",
        "    at processTicksAndRejections (internal/process/task_queues.js:97:5)"
    ]
}

Thank you.

@cmcconnell1 Yes, I've seen that one. Check the preceding log entry; it will likely show an API call to a GitHub endpoint that was answered with a 403.
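A minimal sketch of that check, assuming the scale-up function's CloudWatch log lines have been exported as plain text in chronological order. The function name and the sample lines are hypothetical (they are not real output from the module); the function walks backwards from the first `Failed handling SQS event` line to the nearest earlier line mentioning 403.

```python
from typing import List, Optional

FAILURE_MARKER = "Failed handling SQS event"


def preceding_403(lines: List[str]) -> Optional[str]:
    """Return the closest line before the first SQS failure that mentions
    403, or None if there is no failure or no such earlier line."""
    for i, line in enumerate(lines):
        if FAILURE_MARKER in line:
            # Scan backwards, starting just before the failure line.
            for earlier in reversed(lines[:i]):
                if "403" in earlier:
                    return earlier
            return None
    return None


# Hypothetical log excerpt, for illustration only.
sample = [
    "INFO  hypothetical: received event for example-org/example-repo",
    "ERROR hypothetical: GitHub API request failed with status 403",
    "ERROR Invoke Error {\"errorMessage\": \"Failed handling SQS event\"}",
]
print(preceding_403(sample))
```

The plain substring match on "403" is deliberately naive; it could also hit request IDs or timestamps containing those digits, so treat a match as a lead rather than proof.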

I had to give the GitHub App permissions for Actions, Checks, and Self-hosted runners, but it doesn't seem to need access to Administration.

Don't forget to go into the installation on the org or repo and allow the request for more/different permissions.
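The permission requirement above boils down to a set difference between what the installation has actually been granted and what the runner setup needs. A minimal sketch, assuming you have the granted permissions as a name-to-access-level mapping (for example, copied by hand from the app's settings page). The key names `actions`, `checks`, and `self_hosted_runners` are hypothetical placeholders, not necessarily GitHub's exact permission identifiers, and the comment above does not say which access level (read or write) each one needs.

```python
from typing import Dict, Set

# Hypothetical identifiers for the permissions named in the comment above.
REQUIRED = {"actions", "checks", "self_hosted_runners"}


def missing_permissions(granted: Dict[str, str], required: Set[str] = REQUIRED) -> Set[str]:
    """Return the required permission names that are absent or set to 'none'."""
    return {name for name in required if granted.get(name, "none") == "none"}


# Example: an installation that has not yet accepted the updated permissions.
granted = {"actions": "read", "checks": "read"}
print(sorted(missing_permissions(granted)))  # ['self_hosted_runners']
```

A non-empty result is a reminder to revisit the installation and accept the pending permission request.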

Thanks @compiaffe. It appears additional permissions are required that are not specified in the README.
I no longer have any errors in the Lambda/CloudWatch logs, I see the action-runners.zip file in the S3 bucket, and I can follow the README and see the expected functionality _except_ the spot instance creation; that appears to be my last issue.

ref: scaling-selfhosted-action-runners#putting-all-together

In case your build runners are not starting or registering, the best you can do is follow the trace. In the GitHub App there is an advanced settings page where you can see the events sent to the webhook. When an event is not accepted, double-check the endpoint and secret. Next, check the logs for the webhook and scale-up Lambda functions in CloudWatch. Finally, you can inspect the EC2 user data logging. Access via SSM (MP: should this be ssh?) is enabled by default. Just select Connect on the instance and inspect the log /var/log/user_data.log.
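For the CloudWatch step, a sketch using boto3's CloudWatch Logs `filter_log_events` call to pull matching events from the scale-up log group, following `nextToken` until the results are exhausted. The log group name is the one from this issue; the function name and filter pattern are illustrative, and the client is passed in so the function can be exercised without AWS credentials.

```python
from typing import Any, List


def fetch_log_messages(client: Any, log_group: str, pattern: str) -> List[str]:
    """Collect the message of every event in log_group that matches pattern,
    following nextToken pagination until no token is returned."""
    messages: List[str] = []
    kwargs = {"logGroupName": log_group, "filterPattern": pattern}
    while True:
        response = client.filter_log_events(**kwargs)
        messages.extend(event["message"] for event in response.get("events", []))
        token = response.get("nextToken")
        if not token:
            return messages
        kwargs["nextToken"] = token


# Usage against AWS (requires boto3 and credentials); the region is an
# assumption based on the "usw2" prefix in the function name:
#   import boto3
#   logs = boto3.client("logs", region_name="us-west-2")
#   for msg in fetch_log_messages(logs, "/aws/lambda/dev-usw2-scale-up",
#                                 '"Failed handling SQS event"'):
#       print(msg)
```

The double quotes inside the filter pattern make CloudWatch match the exact phrase rather than the individual words.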

The failure occurs after the Lambda and CloudWatch stages. The next troubleshooting step would be to inspect the EC2 user data logs, but that is difficult because there is no instance to inspect when the spot instances are not deploying.

Thank you for your assistance.

@npalm We got past this; I was missing the BEGIN/END lines of the PEM key. However, we are still blocked with the v0.2.0 tag: no spot instances are deployed, and nothing in the logs indicates a failure. AIR, the net effect was the same as in https://github.com/philips-labs/terraform-aws-github-runner/issues/104, although we do not see any permission errors. I have been on vacation and now see that there are 0.3.0 and 0.4.0 tags, which I will start testing with. Thank you.

I'm having the same issue. I used the develop branch/0.0.5, base64-encoded the key (including the BEGIN and END lines), and saved the value in the variables.tf file, but I'm still getting the error (Error: error:0909006C:PEM routines:get_name:no start line at Sign.sign (internal/crypto/sig.js:105:29)), followed right after by ERROR Invoke Error {"errorType":"Error","errorMessage":"Failed handling SQS event"
