Knex: How do I use Knex with AWS Lambda?

Created on 20 Jan 2017  ·  34Comments  ·  Source: knex/knex

I'm running into issues with connection pooling while testing some code. I'm expecting my lambda function to be called maybe several thousand times over a couple of seconds, and having trouble figuring out the best way to connect to my db. Here's a very similar issue for node/postgres: essentially the problem is that I need to be able to get a connection from the pool if one is available, however I can't depend on the pool existing because of how AWS (unreliably) reuses lambda containers.

Basically what I'm looking for is a way to reliably get or create a connection to my db. I have not been able to find any examples (like while(!availableConnections) { tryToGetConnection() }. Do I need to interact with node-pool? How can I do this with Knex?

insightful question

Most helpful comment

Sorry I committed in a middle of sentence, my kid just threw like 3 liters of water to floor :1st_place_medal: I'll update the comment above in a moment...

All 34 comments

Knex pooling works only if connections are made from the same node process.

If AWS lambda instances are not running shared node process you just have to create new pool with min / max connections 1 in each lambda instance and hope that your database has good enough settings to allow hundreds of simultaneous connections (in RDS it depends on instance size).

After reading this https://forums.aws.amazon.com/thread.jspa?threadID=216000

Looks like lambda really is sharing some processes, so if you can figure out which is maximum count of lambda containers, that are created you should be able to calculate optimal pool size (each container has their separate pool so total connections to DB is pool.max * max container count).

Anyways there is no need to do manual connection requesting from pool, knex waits for connection automatically and if everything is over in couple of seconds, none of the timeouts will not trigger in that time.

If you are getting error from DB which says that maximum connection count has been reached, then you need to make max pool size smaller.

Sorry I committed in a middle of sentence, my kid just threw like 3 liters of water to floor :1st_place_medal: I'll update the comment above in a moment...

Thanks for the reply. It sounds like estimating the maximum pool size will be my best bet, even though the number of connections will vary drastically over time.

To be clear, there is no way to close a connection in Knex, correct? Only the ability to destroy a connection pool? The fact that Lambda will sometimes reuse containers kind of throws everything off.

Destroying connection pool destroys also all the connections (gracefully waiting that they complete first) and I suppose that when lambda destroys container, all its open TCP sockets will close implicitly when process dies.

I don't see why one should try to close connections explicitly after each request since it would destroy the benefit of pooling. You would get the same effect by creating pool with size 1 and destroying it afterwards.

You can also configure idle timeout for pool which will automatically close connection if it is not used and is just waiting for action in pool.

Can I use Knex to send a COPY query to the RedShift cluster, and not wait for the results?

Doing this with pg Pool terminates the query as soon as the end of the Lambda function is reached.

@BardiaAfshin If lambda container is destroyed and all its sockets are freed when lambda function end is reached, in that case also db query will die and might not be finished.

I'm not sure either how postgresql react to client-side connection ending if COPY query will be rolled back due to not finished implicit transaction before reading the result values...

Anyways if the query can be sent or not that way is not up to knex, but it depends on how aws lambda and postgresql work.

My observation is that the query is killed on RedShift and it is rolled back.

Anyways there is no need to do manual connection requesting from pool, knex waits for connection automatically and if everything is over in couple of seconds, none of the timeouts will not trigger in that time.

I'm running a db load testing script and this doesn't seem to be true. Anything higher than like 30 simultaneous connections immediately times out, rather than waiting for an open connection.

Is there a way to manually release the currently used connection once a query is done?

Does anyone have example code they could share for those of us just getting into AWS Lambda? I am hoping that someone could share patterns and/or anti-patterns of knex/postgres/lambda.

Here is what I am using now - I am certain it can be improved, but hoping for some little bit of vindication as to whether or not I am on the correct path...

'use strict';
var pg = require('pg');

function initKnex(){
  return require('knex')({
      client: 'pg',
      connection: { ...details... }
  });
}

module.exports.hello = (event, context) =>
{
  var knex = initKnex();

  // Should I be returning knex here or in the final catch?
  knex
  .select('*')
  .from('my_table')
  .then(function (rows) {
    context.succeed('Succeeded: ' + JSON.stringify(rows || []));
  })
  .catch(function (error) {
    context.fail(error);
  })
  .then(function(){
    // is destroy overkill? - is there an option for knex.client.release, etc?
    knex.destroy();
  })
}

I'm in the same boat--still have not figured out what the best way is despite lots of googling around and testing. What I'm doing right now is:

const dbConfig = require('./db');
const knex = require('knex')(dbConfig);

exports.handler = function (event, context, callback) {
...
connection = {..., pool: { min: 1, max: 1 },

This way the connection (max 1 per container) will stay alive so the container can be reused easily. I don't destroy my connection at the end.

http://blog.rowanudell.com/database-connections-in-lambda/

Not sure if this is the best way but it's worked for me so far.

/shrug

@austingayler

I am not exactly sure what happens when const knex is declared - is this where the initial connection is setup? Can someone clarify? (I am assuming you have connection info in your dbConfig)

In your code, isn't the connection itself being overwritten and re-created everytime the handler is called?

Just chiming in a someone who's in the same boat, I've not had much luck trying to figure out containers vs pool size (as per @elhigu's suggestion). My solution has been to destroy the pool after every connection (I know it's not optimal 😒):

const knex = require('knex');

const client = knex(dbConfig);

client(tableName).select('*')
  .then((result) => { 

    return Promise.all([
      result,
      client.destroy(),
    ])  
  })
  .then(([ result ]) => {

    return result;
  });

TL;DR: Simply set context.callbackWaitsForEmptyEventLoop = false before calling callback.

AWS Lambda waits for empty event loop (by default). so function could throw Timeout error even callback executed.

Please see below links for details:
https://github.com/apex/apex/commit/1fe6e91a46e76c2d5c77877be9ce0c206e9ef9fb

To @elhigu @tgriesser : This is not an knex issue. This is Definitely issue of Lambda environment. I think tag this issue to question and should be good to close :)

@mooyoul yep, definately not knex issue, but maybe documentation issue... though I think I dont want any aws lambda specific stuff to knex docs, so closing.

Please look at this link
https://stackoverflow.com/questions/49347210/why-aws-lambda-keeps-timing-out-when-using-knex-js
You need to close the db connection otherwise Lambda runs until it times out.

Has anyone found a pattern that works flawlessly for using Knex with Lambda?

Do you have any other issues when you close connection properly?

The thing I'm trying to avoid is closing the connection and making it available across Lambda calls.

Are you sure Lambda allows maintaining state between calls?

Check out the node.js part of https://scalegrid.io/blog/how-to-use-mongodb-connection-pooling-on-aws-lambda/ it suggests a solution to hanging up on non-empty event loop.

I don't like bumping the thread like this, but since I think it's getting lost in the comments, @mooyoul's answer above worked great for us.

If you don't turn that flag to false, Lambda waits for the event loop to be empty. If you do, then the function finishes execution as soon as you call the callback.

For me, it worked on my local machine but not after deploying. I was kind of be mislead.

It turns out the RDS inbound source is not open to my Lambda function. Found solution at Stack Overflow: either changing RDS inbound source to 0.0.0.0/0 or use VPC.

After updating RDS inbound source, I can Lambda with Knex successfully.

The Lambda runtime I am using is Node.js 8.10 with packages:

knex: 0.17.0
pg: 7.11.0

The code below using async also just works

const Knex = require('knex');

const pg = Knex({ ... });

module.exports. submitForm = async (event) => {
  const {
    fields,
  } = event['body-json'] || {};

  return pg('surveys')
    .insert(fields)
    .then(() => {
      return {
        status: 200
      };
    })
    .catch(err => {
      return {
        status: 500
      };
    });
};

Hopefully it will help people who might meet same issue in future.

Something I'd like to draw people's attention to is the serverless-mysql package, which wraps the standard mysql driver but handles a lot of the lambda-specific pain points around connection pool management that are described in this thread.

However I don't think Knex will work with serverless-mysql at this time (according to this issue) since there's no way to swap in a different driver. There could also be incompatibilities since serverless-mysql uses promises instead of callbacks.

The best approach is probably to add a new client implementation in Knex. I'd be happy to give this a shot, but would love someone more familiar with Knex to tell me if they think it's reasonable/doable?

Also, was thinking in the meantime, I could use Knex to _build_ but _not execute_ MySQL queries. So just call toSQL() and pass the output to serverless-mysql to execute.

What I'm wondering though is if Knex can be configured without any db connections? There's no sense opening a connection that's never used.

Would the following work?

connection = {..., pool: { min: 0, max: 0 ) },

@disbelief You can initialize knex without connection. There is an example how to do it in the end of this secion in docs https://knexjs.org/#Installation-client

const knex = require('knex')({client: 'mysql'});

const generatedQuery = knex('table').where('id',1).toSQL().toNative();

@elhigu ah cool, thanks. Will give that a shot in the interim.

Update: the above doesn't appear to work. Knex throws an error if it can't establish a db connection, even if there's never any call to execute a query.

@disbelief did you find a solution for that ?

@fdecampredon nope. At the moment I'm simply building queries with squel, calling toString() on them, and passing them to the serverless-mysql client.

@disbelief It's surprising to me that building the queries failed without a db connection.

Would you mind posting the code & config you're using to build the knex client? Also the knex version.

The following works fine for me with
Node v8.11.1
mysql: 2.13.0
knex: 0.18.3

const k = require('knex')

const client = k({ client: 'mysql' })

console.log('Knex version:', require('knex/package.json').version)
// => 0.18.3
console.log('sql:', client('table').where('id',1).toSQL().toNative())
// => { sql: 'select * from `table` where `id` = ?', bindings: [ 1 ] }

As per how Lambda's resources recycling works, Knex instance should always be kept outside any function or class. Also, it's important to have max 1 in connection in the pool.

Something like:

const Knex = require('knex');
let instance = null;

module.exports = class DatabaseManager {
  constructor({ host, user, password, database, port = 3306, client = 'mysql', pool = { min: 1, max: 1 }}) {
    this._client = client;
    this._poolOptions = pool;
    this._connectionOptions = {
      host: DB_HOST || host,
      port: DB_PORT || port,
      user: DB_USER || user,
      password: DB_PASSWORD || password,
      database: DB_NAME || database,
    };
  }

  init() {
    if (instance !== null) {
      return;
    }

    instance = Knex({
      client: this._client,
      pool: this._poolOptions,
      connection: this._connectionOptions,
      debug: process.env.DEBUG_DB == true,
      asyncStackTraces: process.env.DEBUG_DB == true,
    });
  }

  get instance() {
    return instance;
  }
}

In this way, you'll take the most out of Lambda's recycling, i.e. each activated (freezed) container will hold only the same connection.

As a side note, keep in mind that Lambda scales out by default if you don't limit the number of concurrent containers.

I've had Knex running in Lambda for almost a year now with no problems. I'm declaring my Knex instance outside of my Lambda function and utilizing context.callbackWaitsForEmptyEventLoop = false as mentioned in other posts.

That said, over the past day, something seems to have changed on the Lambda side as I'm now seeing a huge connection spike in Postgres; connections don't seem to be closed.

Has anyone else using the aforementioned approach seen any chances over the past day or so?

@jamesdixon just reading this now while refactoring some of our knex implementations on lambda. Any updates on this? Has context.callbackWaitsForEmptyEventLoop = false stopped working?

Was this page helpful?
0 / 5 - 0 ratings

Related issues

legomind picture legomind  ·  3Comments

saurabhghewari picture saurabhghewari  ·  3Comments

mtom55 picture mtom55  ·  3Comments

sandrocsimas picture sandrocsimas  ·  3Comments

tjwebb picture tjwebb  ·  3Comments