Terraform-aws-github-runner: Scale up lambda failed

Created on 17 Nov 2020  ·  17 Comments  ·  Source: philips-labs/terraform-aws-github-runner

Hi. I get an error in the scale-up lambda after setting up your module.
CloudWatch logs below:

ERROR   Invoke Error    
{
    "errorType": "Error",
    "errorMessage": "Failed handling SQS event",
    "stack": [
        "Error: Failed handling SQS event",
        "    at _homogeneousError (/var/runtime/CallbackContext.js:12:12)",
        "    at postError (/var/runtime/CallbackContext.js:29:54)",
        "    at callback (/var/runtime/CallbackContext.js:41:7)",
        "    at /var/runtime/CallbackContext.js:104:16",
        "    at /var/task/index.js:16834:16",
        "    at Generator.throw (<anonymous>)",
        "    at rejected (/var/task/index.js:16816:65)",
        "    at processTicksAndRejections (internal/process/task_queues.js:97:5)"
    ]
}

ERROR RequestError [HttpError]: Resource not accessible by integration at /var/task/index.js:15124:23 at processTicksAndRejections (internal/process/task_queues.js:97:5) { status: 403, headers: { 'access-control-allow-origin': '*', 'access-control-expose-headers': 'ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Used, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type, Deprecation, Sunset', connection: 'close', 'content-encoding': 'gzip', 'content-security-policy': "default-src 'none'", 'content-type': 'application/json; charset=utf-8', date: 'Tue, 17 Nov 2020 17:51:47 GMT', 'referrer-policy': 'origin-when-cross-origin, strict-origin-when-cross-origin', server: 'GitHub.com', status: '403 Forbidden', 'strict-transport-security': 'max-age=31536000; includeSubdomains; preload', 'transfer-encoding': 'chunked', vary: 'Accept-Encoding, Accept, X-Requested-With', 'x-content-type-options': 'nosniff', 'x-frame-options': 'deny', 'x-github-media-type': 'github.v3; format=json', 'x-github-request-id': '93DE:E7C5:957F272:AC944E7:5FB40DB3', 'x-ratelimit-limit': '5600', 'x-ratelimit-remaining': '5598', 'x-ratelimit-reset': '1605639047', 'x-ratelimit-used': '2', 'x-xss-protection': '1; mode=block' }, request: { method: 'GET', url: 'https://api.github.com/repos/RaketaApp/packer-base-ami/actions/runs?status=queued', headers: { accept: 'application/vnd.github.v3+json', 'user-agent': 'octokit-rest.js/18.0.6 octokit-core.js/3.1.1 Node.js/12.18.4 (linux; x64)', authorization: 'token [REDACTED]' }, request: { hook: [Function: bound bound register] } }, documentation_url: 'https://docs.github.com/rest/reference/actions#list-workflow-runs-for-a-repository' }

documentation question

All 17 comments

I had the same error. Try giving the app rights on the Actions group.

@adrianmiron Have you fixed it?

same issue +1

@npalm Could you please help me?

@adrianmiron I tried the Actions group but still have the issue. Can you share all your permissions? I am trying an organization runner.

I do not recognize the issue. The scale-up lambda fetches a message from the queue, then checks whether there are still queued jobs. If so, it scales up. The scale-up lambda is triggered for messages that have been on the queue for 30 seconds. The error message indicates the lambda is not allowed to call the API.

Please check whether your GitHub app is set up according to the docs. Since your scale-up lambda is triggered, it seems the app is installed for the repo; otherwise no event would be received. So the most logical explanation is that the permissions are not set right.
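The flow described above can be sketched roughly as follows. This is an illustrative outline only, not the module's actual code: the function name, the `repository` message field, and the two injected callables are hypothetical. In the log at the top of the thread, the request that returns 403 is the `GET .../actions/runs?status=queued` call, which corresponds to the "check for queued jobs" step here.

```python
from typing import Callable, Mapping


def handle_scale_up(
    message: Mapping[str, str],
    count_queued_runs: Callable[[str], int],
    start_runner: Callable[[str], None],
) -> bool:
    """Return True if a runner was started for the repository in the message.

    All names are hypothetical; the callables stand in for the GitHub API
    query and the EC2 launch that the real lambda performs.
    """
    repo = message["repository"]  # hypothetical field name

    # Re-check the queue: the job may already have been picked up while
    # the message was waiting on the queue. If the app lacks permission,
    # this is the call that would raise.
    if count_queued_runs(repo) == 0:
        return False

    start_runner(repo)
    return True
```

If the check step raises, as with the 403 above, no runner is started and the invocation fails as a whole, which would match the "Failed handling SQS event" error.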

@manoj-k-deepr From my investigation of the same error, it turned out to be a permission issue with the GitHub app (which is the one actually querying the repo's actions). I remember I went over the lambda -> GitHub app setup 5 times and that was not it.

Share a screenshot of the permissions on the organisation/repo and I will compare in the morning.

@npalm Yes, it's a permission issue. I fixed it by granting the app Self-hosted runners access (Read & Write) at the organization level. The docs don't mention the runner permission.
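As a rough way to compare what an app has been granted against what this thread suggests it needs, a small check like the one below may help. It is a sketch under assumptions: the permission keys `actions` and `organization_self_hosted_runners` are assumed to be the API names for the "Actions" and organization "Self-hosted runners" settings, and the `read` level for Actions is a guess (the thread only states Read & Write for self-hosted runners). The `granted` dictionary would have to be copied from the app's settings or an API response.

```python
# Ordering of access levels, lowest to highest.
LEVELS = {"none": 0, "read": 1, "write": 2, "admin": 3}

# Assumed requirements, based only on the comments in this thread.
REQUIRED = {
    "actions": "read",  # level is an assumption
    "organization_self_hosted_runners": "write",  # "Read & Write" per the thread
}


def missing_permissions(granted, required=REQUIRED):
    """Return the required permissions that are absent or too weak in `granted`."""
    missing = {}
    for name, needed in required.items():
        have = granted.get(name, "none")
        # Unknown level strings are treated as no access.
        if LEVELS.get(have, 0) < LEVELS[needed]:
            missing[name] = needed
    return missing
```

For example, `missing_permissions({"actions": "read"})` would report that `organization_self_hosted_runners` still needs `write`.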

There was a problem with the GitHub app permissions. @npalm Can you update the documentation and specify which permissions the application requires?

@mkryva Great that you got it working. I will leave the issue open so we can update the docs. PRs for improving the docs are always welcome!

After updating the permissions, it fails with the following error:

ERROR AuthFailure.ServiceLinkedRoleCreationNotPermitted: The provided credentials do not have permission to create the service-linked role for EC2 Spot Instances.

UPD: Looks like the reason was "You've reached your quota for maximum Spot Fleet Requests for this account."

The scale-up lambda is failing for me, even after the latest commit with the (GHES) fix by @mcaulifn


DEBUG   https://enterprise.github.custom.com/api/v3

ERROR   RequestError [HttpError]: request to https://enterprise.github.custom.com/api/v3/app/installations/22/access_tokens 
failed, reason: connect ETIMEDOUT 192.168.1.1:443
    at /var/task/index.js:2797:11
    at processTicksAndRejections (internal/process/task_queues.js:97:5)
    at async getInstallationAuthentication (/var/task/index.js:266:7) {
  status: 500,
  headers: {},
  request: {
    method: 'POST',
    url: 'https://enterprise.github.custom.com/api/v3/app/installations/22/access_tokens',
    headers: {
      accept: 'application/vnd.github.antiope-preview+json,application/vnd.github.machine-man-preview+json',
      'user-agent': 'octokit-request.js/5.4.12 Node.js/12.19.0 (linux; x64)',
      authorization: 'bearer [REDACTED]',
      'content-length': 0
    }
  }
}


ERROR   Invoke Error    
{
    "errorType": "Error",
    "errorMessage": "Failed handling SQS event",
    "stack": [
        "Error: Failed handling SQS event",
        "    at _homogeneousError (/var/runtime/CallbackContext.js:12:12)",
        "    at postError (/var/runtime/CallbackContext.js:29:54)",
        "    at callback (/var/runtime/CallbackContext.js:41:7)",
        "    at /var/runtime/CallbackContext.js:104:16",
        "    at /var/task/index.js:50911:16",
        "    at Generator.throw (<anonymous>)",
        "    at rejected (/var/task/index.js:50893:65)",
        "    at processTicksAndRejections (internal/process/task_queues.js:97:5)"
    ]
}

@buamod You are using GHES, right? Just to be sure, did you rebuild the lambda and ensure it is used?

ETIMEDOUT would suggest GHES did not respond. Are you behind a proxy?
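One way to narrow down an ETIMEDOUT like this is to test plain TCP reachability from a machine in the same network (same subnet and security groups) as the lambda. The helper below is a minimal sketch using only the Python standard library; the function name is made up for illustration, and it checks the TCP connection only, not TLS, proxies, or the GitHub API itself.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS failures.
        return False
```

Calling `can_connect("enterprise.github.custom.com")` (the host name from the log above) and getting `False` would point to routing, firewall, or proxy configuration rather than the lambda code.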

> @buamod You are using GHES, right? Just to be sure, did you rebuild the lambda and ensure it is used?

I did deploy the lambdas from the latest commit. I built them with the docker commands from the Ci/build.sh script.

> ETIMEDOUT would suggest GHES did not respond. Are you behind a proxy?

There might be a proxy, I don't know. If there is one, how would I get through it?

@buamod Proxy requirements can differ greatly. I would suggest asking your network team what they need in order to let the connection through.
