Knex: How do I use Knex with AWS Lambda?

Created on 20 Jan 2017 · 34 Comments · Source: knex/knex

I'm running into issues with connection pooling while testing some code. I'm expecting my Lambda function to be called maybe several thousand times over a couple of seconds, and I'm having trouble figuring out the best way to connect to my db. Here's a very similar issue for node/postgres: essentially the problem is that I need to be able to get a connection from the pool if one is available; however, I can't depend on the pool existing because of how AWS (unreliably) reuses Lambda containers.

Basically what I'm looking for is a way to reliably get or create a connection to my db. I have not been able to find any examples (like while(!availableConnections) { tryToGetConnection() }). Do I need to interact with node-pool? How can I do this with Knex?

austingayler

👍5

All 34 comments

Knex pooling works only if connections are made from the same node process.

If AWS Lambda instances do not share a node process, you just have to create a new pool with min/max connections of 1 in each Lambda instance and hope that your database's settings allow hundreds of simultaneous connections (in RDS this depends on the instance size).

After reading this https://forums.aws.amazon.com/thread.jspa?threadID=216000

It looks like Lambda really does share some processes, so if you can figure out the maximum number of Lambda containers that get created, you should be able to calculate the optimal pool size (each container has its own separate pool, so the total number of connections to the DB is pool.max * max container count).

Anyway, there is no need to request connections from the pool manually: knex waits for a connection automatically, and if everything is over in a couple of seconds, none of the timeouts will trigger in that time.

If you are getting error from DB which says that maximum connection count has been reached, then you need to make max pool size smaller.
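The pool-size arithmetic described above can be sketched as a small helper. This is only an illustration: the function name and its parameters are hypothetical (they are not knex options), and the numbers in the usage line are made up.

```javascript
// Hypothetical helper; the parameter names are illustrative, not knex options.
function poolMaxPerContainer(dbMaxConnections, maxContainers, reservedConnections = 0) {
  // Connections left over after setting some aside (e.g. for admin sessions).
  const usable = dbMaxConnections - reservedConnections;
  // Each container owns its own pool, so total connections = pool.max * containers.
  const perContainer = Math.floor(usable / maxContainers);
  // A pool needs at least one connection to do any work.
  return Math.max(1, perContainer);
}

console.log(poolMaxPerContainer(100, 40, 10)); // 2
```

If the result is clamped to 1 because there are more containers than usable connections, the database limit can still be exceeded at full concurrency; in that case the container count itself has to come down.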

elhigu on 22 Jan 2017

Sorry I committed in the middle of a sentence, my kid just threw like 3 liters of water on the floor :1st_place_medal: I'll update the comment above in a moment...

elhigu on 22 Jan 2017

😄67 👀2

Thanks for the reply. It sounds like estimating the maximum pool size will be my best bet, even though the number of connections will vary drastically over time.

To be clear, there is no way to close a connection in Knex, correct? Only the ability to destroy a connection pool? The fact that Lambda will sometimes reuse containers kind of throws everything off.

austingayler on 23 Jan 2017

Destroying the connection pool also destroys all of its connections (gracefully waiting for them to complete first), and I suppose that when Lambda destroys a container, all of its open TCP sockets close implicitly when the process dies.

I don't see why one should try to close connections explicitly after each request since it would destroy the benefit of pooling. You would get the same effect by creating pool with size 1 and destroying it afterwards.

You can also configure idle timeout for pool which will automatically close connection if it is not used and is just waiting for action in pool.
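A configuration along these lines might look like the sketch below. It assumes a knex version whose pool accepts idleTimeoutMillis and whose top-level config accepts acquireConnectionTimeout; the environment variable names and the timeout values are placeholders chosen for illustration.

```javascript
// Connection details are placeholders; the timeout values are illustrative.
const dbConfig = {
  client: 'pg',
  connection: {
    host: process.env.DB_HOST,
    user: process.env.DB_USER,
    password: process.env.DB_PASSWORD,
    database: process.env.DB_NAME,
  },
  pool: {
    min: 0,                   // allow the pool to drop to zero idle connections
    max: 1,                   // at most one connection per Lambda container
    idleTimeoutMillis: 10000, // close a connection after 10 s of sitting idle
  },
  // How long a query waits for a free connection before knex gives up.
  acquireConnectionTimeout: 5000,
};
```

The object would then be passed to require('knex')(dbConfig). Setting min to 0 matters here, since idle connections are only closed down to the configured minimum.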

elhigu on 24 Jan 2017

Can I use Knex to send a COPY query to the RedShift cluster, and not wait for the results?

Doing this with pg Pool terminates the query as soon as the end of the Lambda function is reached.

BardiaAfshin on 17 Feb 2017

@BardiaAfshin If the Lambda container is destroyed and all its sockets are freed when the end of the Lambda function is reached, the DB query will die too and might not finish.

I'm also not sure how PostgreSQL reacts to the client-side connection ending; the COPY query may be rolled back because its implicit transaction has not finished before the result values are read...

Anyway, whether the query can be sent that way is not up to knex; it depends on how AWS Lambda and PostgreSQL work.

elhigu on 20 Feb 2017

My observation is that the query is killed on RedShift and it is rolled back.

BardiaAfshin on 21 Feb 2017

Quoting @elhigu: "Anyways there is no need to do manual connection requesting from pool, knex waits for connection automatically and if everything is over in couple of seconds, none of the timeouts will trigger in that time."

I'm running a db load testing script and this doesn't seem to be true. Anything higher than like 30 simultaneous connections immediately times out, rather than waiting for an open connection.

austingayler on 22 Feb 2017

Is there a way to manually release the currently used connection once a query is done?

Whoaa512 on 15 Mar 2017

👍9

Does anyone have example code they could share for those of us just getting into AWS Lambda? I am hoping that someone could share patterns and/or anti-patterns of knex/postgres/lambda.

Here is what I am using now - I am certain it can be improved, but hoping for some little bit of vindication as to whether or not I am on the correct path...

'use strict';
var pg = require('pg');

function initKnex(){
  return require('knex')({
      client: 'pg',
      connection: { ...details... }
  });
}

module.exports.hello = (event, context) =>
{
  var knex = initKnex();

  // Should I be returning knex here or in the final catch?
  knex
  .select('*')
  .from('my_table')
  .then(function (rows) {
    context.succeed('Succeeded: ' + JSON.stringify(rows || []));
  })
  .catch(function (error) {
    context.fail(error);
  })
  .then(function(){
    // is destroy overkill? - is there an option for knex.client.release, etc?
    knex.destroy();
  })
}

kurtzilla on 3 Jun 2017

👍4

I'm in the same boat--still have not figured out what the best way is despite lots of googling around and testing. What I'm doing right now is:

const dbConfig = require('./db');
const knex = require('knex')(dbConfig);

exports.handler = function (event, context, callback) {
...
connection = {..., pool: { min: 1, max: 1 },

This way the connection (max 1 per container) will stay alive so the container can be reused easily. I don't destroy my connection at the end.
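A minimal sketch of why this works, assuming the container keeps the Node.js process (and therefore module-level state) alive between warm invocations. To keep the example self-contained, createKnex is a hypothetical stand-in for require('knex'), and makeGetDb is an illustrative name, not part of any library.

```javascript
// Hypothetical helper: `createKnex` stands in for require('knex').
function makeGetDb(createKnex, config) {
  let db = null; // survives across invocations while the container stays warm

  return function getDb() {
    if (db === null) {
      // Runs once per container (cold start), not once per handler call.
      db = createKnex({ ...config, pool: { min: 1, max: 1 } });
    }
    return db;
  };
}
```

In a real function this would be wired up once at module scope, for example const getDb = makeGetDb(require('knex'), dbConfig), and the handler would call getDb() on every invocation.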

http://blog.rowanudell.com/database-connections-in-lambda/

Not sure if this is the best way but it's worked for me so far.

/shrug

austingayler on 5 Jun 2017

👍6

@austingayler

I am not exactly sure what happens when const knex is declared - is this where the initial connection is setup? Can someone clarify? (I am assuming you have connection info in your dbConfig)

In your code, isn't the connection itself being overwritten and re-created every time the handler is called?

kurtzilla on 13 Jun 2017

Just chiming in as someone who's in the same boat, I've not had much luck trying to figure out containers vs pool size (as per @elhigu's suggestion). My solution has been to destroy the pool after every connection (I know it's not optimal 😒):

const knex = require('knex');

const client = knex(dbConfig);

client(tableName).select('*')
  .then((result) => { 

    return Promise.all([
      result,
      client.destroy(),
    ])  
  })
  .then(([ result ]) => {

    return result;
  });

hassankhan on 20 Jun 2017

👍1

TL;DR: Simply set context.callbackWaitsForEmptyEventLoop = false before calling callback.

AWS Lambda waits for an empty event loop by default, so the function can throw a timeout error even after the callback has executed.

Please see below links for details:
https://github.com/apex/apex/commit/1fe6e91a46e76c2d5c77877be9ce0c206e9ef9fb
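A minimal sketch of a callback-style handler using that flag. The makeHandler factory is hypothetical and exists only so the example can run without a database; db is assumed to be a knex instance created once at module scope, and my_table is the placeholder table name used earlier in this thread.

```javascript
// Hypothetical factory: `db` is assumed to be a knex instance created once at
// module scope, e.g. const db = require('knex')(dbConfig);
function makeHandler(db) {
  return function handler(event, context, callback) {
    // Return as soon as callback fires, even though the pooled socket
    // keeps the Node.js event loop non-empty.
    context.callbackWaitsForEmptyEventLoop = false;

    db('my_table')
      .select('*')
      .then(
        (rows) => callback(null, rows),
        (err) => callback(err)
      );
  };
}
```

The deployed module would then export something like exports.handler = makeHandler(db). The pool is deliberately left open so that a reused container can pick up the same connection.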

To @elhigu @tgriesser: This is not a knex issue. This is definitely an issue with the Lambda environment. I think this issue should be tagged as a question and is good to close :)

mooyoul on 8 Aug 2017

👍7 🎉2

@mooyoul yep, definitely not a knex issue, but maybe a documentation issue... though I don't think I want any AWS Lambda specific stuff in the knex docs, so closing.

elhigu on 15 Aug 2017

Please look at this link
https://stackoverflow.com/questions/49347210/why-aws-lambda-keeps-timing-out-when-using-knex-js
You need to close the db connection otherwise Lambda runs until it times out.
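A sketch of that close-after-use approach, assuming an async handler. withDb and createDb are hypothetical names; createDb stands in for something like () => require('knex')(dbConfig). The only knex call relied on is destroy(), which returns a promise when called without a callback.

```javascript
// Hypothetical wrapper: creates a client, runs the work, always tears the pool down.
async function withDb(createDb, work) {
  const db = createDb();
  try {
    return await work(db);
  } finally {
    // Closes every pooled connection so nothing keeps the event loop alive.
    await db.destroy();
  }
}
```

A handler could then return withDb(createDb, (db) => db('my_table').select('*')). This gives up connection reuse between invocations in exchange for a clean shutdown, which is the trade-off discussed earlier in the thread.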

jlancelot2007 on 16 Jun 2018

Has anyone found a pattern that works flawlessly for using Knex with Lambda?

jamesdixon on 19 Jun 2018

Do you have any other issues when you close connection properly?

kibertoad on 19 Jun 2018

The thing I'm trying to avoid is closing the connection and making it available across Lambda calls.

jamesdixon on 19 Jun 2018

Are you sure Lambda allows maintaining state between calls?

kibertoad on 19 Jun 2018

Check out the node.js part of https://scalegrid.io/blog/how-to-use-mongodb-connection-pooling-on-aws-lambda/ it suggests a solution to hanging up on non-empty event loop.

kibertoad on 19 Jun 2018

I don't like bumping the thread like this, but since I think it's getting lost in the comments, @mooyoul's answer above worked great for us.

If you don't turn that flag to false, Lambda waits for the event loop to be empty. If you do, then the function finishes execution as soon as you call the callback.

mmarvick on 19 Mar 2019

For me, it worked on my local machine but not after deploying. I was kind of misled.

It turns out the RDS inbound source was not open to my Lambda function. I found the solution on Stack Overflow: either change the RDS inbound source to 0.0.0.0/0 or use a VPC.

After updating the RDS inbound source, I can use Lambda with Knex successfully.

The Lambda runtime I am using is Node.js 8.10 with packages:

knex: 0.17.0
pg: 7.11.0

The code below using async also just works

const Knex = require('knex');

const pg = Knex({ ... });

module.exports.submitForm = async (event) => {
  const {
    fields,
  } = event['body-json'] || {};

  return pg('surveys')
    .insert(fields)
    .then(() => {
      return {
        status: 200
      };
    })
    .catch(err => {
      return {
        status: 500
      };
    });
};

Hopefully it will help people who might meet same issue in future.

Hongbo-Miao on 29 May 2019

Something I'd like to draw people's attention to is the serverless-mysql package, which wraps the standard mysql driver but handles a lot of the lambda-specific pain points around connection pool management that are described in this thread.

However I don't think Knex will work with serverless-mysql at this time (according to this issue) since there's no way to swap in a different driver. There could also be incompatibilities since serverless-mysql uses promises instead of callbacks.

The best approach is probably to add a new client implementation in Knex. I'd be happy to give this a shot, but would love someone more familiar with Knex to tell me if they think it's reasonable/doable?

disbelief on 31 May 2019

👍5 👀1

Also, was thinking in the meantime, I could use Knex to _build_ but _not execute_ MySQL queries. So just call toSQL() and pass the output to serverless-mysql to execute.

What I'm wondering though is if Knex can be configured without any db connections? There's no sense opening a connection that's never used.

Would the following work?

connection = {..., pool: { min: 0, max: 0 } },

disbelief on 31 May 2019

@disbelief You can initialize knex without a connection. There is an example of how to do it at the end of this section in the docs: https://knexjs.org/#Installation-client

const knex = require('knex')({client: 'mysql'});

const generatedQuery = knex('table').where('id',1).toSQL().toNative();
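Handing the generated query to another driver could be sketched like this. runWithExternalDriver and execute are hypothetical names: execute is whatever function the other driver exposes for running SQL with bound values, and its (sql, bindings) signature is an assumption to check against that driver's documentation.

```javascript
// Hypothetical glue: build with knex, execute with some other driver.
async function runWithExternalDriver(queryBuilder, execute) {
  // toSQL().toNative() yields the dialect-specific SQL string and its bindings.
  const { sql, bindings } = queryBuilder.toSQL().toNative();
  return execute(sql, bindings);
}
```

With the client above, a call might look like runWithExternalDriver(knex('table').where('id', 1), (sql, values) => otherDriver.query(sql, values)), where otherDriver is a placeholder for the external client.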

elhigu on 31 May 2019

@elhigu ah cool, thanks. Will give that a shot in the interim.

disbelief on 31 May 2019

Update: the above doesn't appear to work. Knex throws an error if it can't establish a db connection, even if there's never any call to execute a query.

disbelief on 31 May 2019

@disbelief did you find a solution for that ?

fdecampredon on 8 Jul 2019

@fdecampredon nope. At the moment I'm simply building queries with squel, calling toString() on them, and passing them to the serverless-mysql client.

disbelief on 8 Jul 2019

@disbelief It's surprising to me that building the queries failed without a db connection.

Would you mind posting the code & config you're using to build the knex client? Also the knex version.

The following works fine for me with
Node v8.11.1
mysql: 2.13.0
knex: 0.18.3

const k = require('knex')

const client = k({ client: 'mysql' })

console.log('Knex version:', require('knex/package.json').version)
// => 0.18.3
console.log('sql:', client('table').where('id',1).toSQL().toNative())
// => { sql: 'select * from `table` where `id` = ?', bindings: [ 1 ] }

Whoaa512 on 9 Jul 2019

Because of how Lambda's resource recycling works, the Knex instance should always be kept outside any function or class. Also, it's important to have a maximum of 1 connection in the pool.

Something like:

const Knex = require('knex');
let instance = null;

module.exports = class DatabaseManager {
  constructor({ host, user, password, database, port = 3306, client = 'mysql', pool = { min: 1, max: 1 }}) {
    this._client = client;
    this._poolOptions = pool;
    this._connectionOptions = {
      host: process.env.DB_HOST || host,
      port: process.env.DB_PORT || port,
      user: process.env.DB_USER || user,
      password: process.env.DB_PASSWORD || password,
      database: process.env.DB_NAME || database,
    };
  }

  init() {
    if (instance !== null) {
      return;
    }

    instance = Knex({
      client: this._client,
      pool: this._poolOptions,
      connection: this._connectionOptions,
      // Environment variables are strings, so compare against 'true' explicitly.
      debug: process.env.DEBUG_DB === 'true',
      asyncStackTraces: process.env.DEBUG_DB === 'true',
    });
  }

  get instance() {
    return instance;
  }
}

In this way, you'll get the most out of Lambda's recycling, i.e. each activated (frozen) container will hold only the same connection.

As a side note, keep in mind that Lambda scales out by default if you don't limit the number of concurrent containers.

39otrebla on 4 Sep 2019

I've had Knex running in Lambda for almost a year now with no problems. I'm declaring my Knex instance outside of my Lambda function and utilizing context.callbackWaitsForEmptyEventLoop = false as mentioned in other posts.

That said, over the past day, something seems to have changed on the Lambda side as I'm now seeing a huge connection spike in Postgres; connections don't seem to be closed.

Has anyone else using the aforementioned approach seen any changes over the past day or so?

jamesdixon on 1 Oct 2019

👀12

@jamesdixon just reading this now while refactoring some of our knex implementations on lambda. Any updates on this? Has context.callbackWaitsForEmptyEventLoop = false stopped working?