Knex: Adding upsert type capability

Created on 30 Aug 2013  ·  54 Comments  ·  Source: knex/knex

Brought up by @adamscybot in tgriesser/bookshelf#55 - this could be a nice feature to add.

feature request

Most helpful comment

@NicolajKN You shouldn't use toString(); it can cause many kinds of problems and won't pass values through bindings to the DB (a potential SQL injection security hole).

The same thing done properly would look like this:

const query = knex('account').insert(accounts);
const safeQuery = knex.raw('? ON CONFLICT DO NOTHING', [query]);

All 54 comments

I agree, this _would_ be a nice feature!

:+1:

:+1:

I'm importing some data from a CSV and there's a good chance that a few of the records overlap with the last import (i.e. last time I imported from Jan 1st to May 31st; this time I'm importing from May 31st to Jun 18th).

Fortunately the third party system assigns reliably unique ids.

What's the best way to go about inserting the new records and updating the old?

I haven't tried it yet, but I was thinking that it would be something like this:

var ids = records.map(function (json) { return json.id });

Records.forge(ids).fetchAll().then(function () {
  records.forEach(function (record) {
    // now the existing records are loaded in the collection ?
    Object.keys(record).forEach(function (key) {
      Records.forge(record.id).set(key, record[key]);
    });
  });
  Records.invokeThen('save').then(function () {
    console.log('Records have been either inserted or updated');
  });
});

Also, sometimes the thing I'm storing is keyed by a deterministic id value, such as a hash. In those cases I just want to add or replace the data.

I don't always use SQL as traditional SQL. Often I use it as a hybrid NoSQL store with the benefit of clear relationship mapping and indexes.

:+1:

Hi,
is there any news about this feature?

Or can someone recommend examples that show how to simulate this functionality in MySQL?

thx

Right now I'm doing it with raw, but I'm working hard on getting this available here soon.
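In the meantime, a rough sketch of what that raw MySQL workaround can look like (table and column names here are made up, and this assumes the table has a primary or unique key for ON DUPLICATE KEY UPDATE to act on):

// Values stay as driver bindings; VALUES(name) refers to the value
// that would have been inserted for the `name` column.
knex.raw(
  'insert into users (id, name) values (?, ?) on duplicate key update name = values(name)',
  [1, 'Alice']
);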

Postgres just implemented upsert support by the way :+1:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=168d5805e4c08bed7b95d351bf097cff7c07dd65

https://news.ycombinator.com/item?id=9509870

Syntax is INSERT ... ON CONFLICT DO UPDATE
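For reference, a minimal example of that syntax (table and column names are just illustrative):

-- 'a' is the unique/conflict column; 'excluded' refers to the row that failed to insert
insert into target (a, b) values (1, 2)
on conflict (a) do update set b = excluded.b;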

I was looking for a way to do a REPLACE INTO in MySQL and found this feature request. Since REPLACE and INSERT have exactly the same syntax in MySQL, I would imagine it would be easier to implement than an ON DUPLICATE KEY UPDATE. Are there any plans to implement REPLACE? Would a PR be of value?

Any updates on this, especially with PostgreSQL 9.5?

I think one important question is whether or not to expose the same upsert method signature for different dialects, such as PostgreSQL and MySQL. In Sequelize, an issue has been raised regarding the return value of upsert: https://github.com/sequelize/sequelize/issues/3354.

I realize that some of the Knex library methods have distinctions regarding return values in the context of different dialects (such as insert, where an array containing the first inserted id is returned for SQLite and MySQL, while an array of all the inserted ids is returned with PostgreSQL).

According to the documentation, the INSERT ... ON DUPLICATE KEY UPDATE syntax in MySQL has the following behaviour (http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html):

With ON DUPLICATE KEY UPDATE, the affected-rows value per row is 1 if the row is inserted as a new row, 2 if an existing row is updated, and 0 if an existing row is set to its current values.

While in PostgreSQL (http://www.postgresql.org/docs/9.5/static/sql-insert.html):

On successful completion, an INSERT command returns a command tag of the form

INSERT oid count

The count is the number of rows inserted or updated. If count is exactly one, and the target table has OIDs, then oid is the OID assigned to the inserted row. The single row must have been inserted rather than updated. Otherwise oid is zero.

If the INSERT command contains a RETURNING clause, the result will be similar to that of a SELECT statement containing the columns and values defined in the RETURNING list, computed over the row(s) inserted or updated by the command.

In this case, the return values can be changed with a RETURNING clause.

Thoughts?

I monkey patched Client_PG to add an "onConflict" method for insert. Suppose we want to upsert GitHub OAuth credentials; we can write the query like this:

const profile = {
  access_token: "blah blah",
  username: "foobar",
  // ... etc
};

const oauth = {
  uid: "13344398",
  provider: "github",
  created_at: new Date(),
  updated_at: new Date(),
  info: profile,
};

// todo: add a "timestamp" method

const insert = knex("oauths").insert(oauth).onConflict(["provider", "uid"], {
  info: profile,
  updated_at: new Date(),
});

console.log(insert.toString());

The array of column names specifies the uniqueness constraint.

insert into "authentications" ("created_at", "info", "provider", "uid", "updated_at") values ('2016-02-14T14:42:18.342+08:00', '{\"access_token\":\"blah blah\",\"username\":\"foobar\"}', 'github', '13344398', '2016-02-14T14:42:18.342+08:00') on conflict ("provider", "uid")  do update set "info" = '{\"access_token\":\"blah blah\",\"username\":\"foobar\"}', "updated_at" = '2016-02-14T14:42:18.343+08:00'

See gist: https://gist.github.com/hayeah/1c8d642df5cfeabc2a5b for the monkey patch.

This is a super hacky experiment... so don't exactly copy & paste the monkey patch into your production code : p

Known problems:

  • The monkey patch is on QueryBuilder and affects all dialects, because Client_PG doesn't specialize the builder.
  • Doesn't support a raw update like count = count + 1.
  • onConflict should probably throw if the query method is not insert.

Feedback?

@hayeah I like your approach and it suits Postgres. I am going to try your monkey patch in a project to see if I can empirically detect any issues other than the ones you pointed out.

Syntax Suggestion: knex('table').upsert(['col1','col2']).insert({...}).update({...}); where upsert would take in the condition statement. This way it's not db specific.

A summary of the different implementations of upserts can be found at https://en.wikipedia.org/wiki/Merge_(SQL).

I'm interested in having this capability too. Use case: building a system that relies on lots of data from an outside service; I periodically poll it for data that I save to a local MySQL db. Will probably be using knex.raw for now.

Also interested, but my use case would need it to work in a way that isn't based on conflicts, as the columns don't always have 'unique' constraints: simply update entries matching the query if they exist, otherwise insert new rows.

@haywirez I am curious as to why there are no unique constraints? Wouldn't you be exposed to race conditions?

@hayeah I have a specific use case with time-windowed data, storing entries that have a value tied to a given day. Therefore I'm inserting and updating entries that have a "combined key" of a matching (day) timestamp, and two other IDs corresponding to PK's in other tables. Within a 24-hour window, I have to either insert them, or update them with the latest counts.

This would be a great feature to have!

Hi everyone who's ever commented here. I'm adding a PR Please label.

Happy to take a PR adding this functionality, but I'd like to see a discussion of the desired API here first.

^ Agreed.

I'm going to delete comments like this, if you want to add a +1 do so with the little emoji reaction thing.

I have a bit of an issue with the array of column constraints as in @willfarrell and @hayeah's examples. Not sure if these examples can support json properties. Is there a reason none of these proposals include where statements / proper "queries" to match the record?

proposal 1

knex('table')
  .where('id', '=', data.id)
  .upsert(data)

proposal 2

knex('table')
  .upsertQuery(knex => {
    return knex('table')
      .where('id', '=', data.id)
  })
  .upsertUpdate(knex => {
    return knex('table')
      .insert(data)
  })

proposal 3

knex('table')
  .where('id', '=', data.id)
  .insert(data)
  .upsert() // or .onConflictDoUpdate()

I'm leaning most toward something like 3.

Just to add in here's how mongodb does it.

db.collection.update(
   <query>,
   <update>,
   {
     upsert: <boolean>,
     multi: <boolean>,
     writeConcern: <document>
   }
)

@reggi I believe my monkey patch is compatible with where...

@reggi I don't see your point.
Can you elaborate on which functionality is missing from the approach proposed in @willfarrell and @hayeah's examples?
Why do you need where at all?
It's just an insert operation.

@reggi The MongoDB example you provided reads "First try to UPDATE WHERE ... then do an INSERT if no document matches the query" whereas SQL UPSERT reads "INSERT INTO ... UPDATING in case a row with this primary key already exists".
So, I guess, you're talking about a whole different "upsert" than it's implemented in SQL databases.

I would propose this API:

knex.createTable('test')
   .bigserial('id')
   .varchar('unique').notNull().unique()
   .varchar('whatever')

knex.table('test').insert(object, { upsert: ['unique'] })

.insert() function would analyse the second parameter.
If it's a string, then it's the old returning parameter.
If it's an object, then it's an options parameter having options.returning and options.upsert, where options.upsert is a list of the unique keys (can be > 1 in case of a compound unique key constraint).
Then a SQL query is generated which simply excludes the primary key and all options.upsert keys from the object (via clone(object), then delete cloned_object.id and delete cloned_object.unique). The cloned_object, stripped of the primary and unique keys, is then used to construct the SET clause in the second part of the SQL query: ... ON CONFLICT DO UPDATE SET [iterate cloned_object].

I guess that would be the most simple and unambiguous solution homogeneous with the present API.
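A rough sketch of the clone-and-delete step described above; the helper name is made up and this is only an illustration, not an agreed-upon API:

// Hypothetical helper: strip the primary key and the upsert keys from the row,
// so only the remaining columns end up in the ON CONFLICT DO UPDATE SET clause.
const buildUpdatePayload = (object, upsertKeys, primaryKey = 'id') => {
  const cloned = Object.assign({}, object);
  delete cloned[primaryKey];
  upsertKeys.forEach(key => delete cloned[key]);
  return cloned;
};

// buildUpdatePayload({ id: 1, unique: 'x', whatever: 'y' }, ['unique'])
// => { whatever: 'y' }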

@slavafomin @ScionOfBytes Looks like even the API has not yet been agreed on. That would be the next step, and then someone who would like to implement it may do so. So no news.

PS. I've started deleting additional requests for news when there is none, to prevent this thread from being filled up with news-request spam and other less related messages.

@amir-s I agree, but the subject of this issue is the upsert capability.

IMO, the real problem is not the API, but that there is no common way to do upserts across databases.

MySQL (ON DUPLICATE KEY UPDATE) and PostgreSQL 9.5+ (ON CONFLICT DO UPDATE) support upsert by default.

MSSQL and Oracle can support it with a merge clause, but knex needs to know the names of the conflict columns to be able to construct the query.

-- in this case the conflict column is 'a'
merge into target
using (values (?)) as t(a)
on (t.a = target.a)
when matched then
  update set b = ?
when not matched then
  insert (a, b) values (?, ?);

But SQLite does not. We need two queries to simulate the upsert:

-- 'a' is the conflict column
insert or ignore into target (a, b) values (?, ?);
update target set b = ?2 where changes() = 0 and a = ?1;

Or using INSERT OR REPLACE, aka REPLACE:

-- replace will delete the matched row then add a new one with the given data
replace into target (a, b) values (?, ?);

Unfortunately, if the target table has more columns than a and b, their values will be replaced by defaults, unless you carry them over explicitly:

insert or replace into target (a, b, c) values (?, ?, (select c from target where a = ?1))

Another solution uses a CTE; see this Stack Overflow answer.
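For completeness, a sketch of running the two-statement SQLite simulation from knex inside a transaction, so both statements share one connection and changes() refers to the insert (table and column names are illustrative, and this assumes a unique constraint on a):

const sqliteUpsert = (knex, a, b) =>
  knex.transaction(trx =>
    // insert or ignore is a no-op if a row with this `a` already exists
    trx.raw('insert or ignore into target (a, b) values (?, ?)', [a, b]).then(() =>
      // changes() = 0 means the insert was ignored, so update the existing row instead
      trx.raw('update target set b = ? where changes() = 0 and a = ?', [b, a])
    )
  );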

I've come to this issue several times in search of a knex-based Postgres upsert. If anyone else needs this, here's how to do it. I've tested this against both single and composite unique keys.

The Setup

Create a unique key constraint on the table using the below. I needed a composite key constraint:

table.unique(['a', 'b'])

The Function

(edit: updated to use raw parameter bindings)

const upsert = (params)=> {
  const {table, object, constraint} = params;
  const insert = knex(table).insert(object);
  const update = knex.queryBuilder().update(object);
  return knex.raw(`? ON CONFLICT ${constraint} DO ? returning *`, [insert, update]).get('rows').get(0);
};

Usage

const objToUpsert = {a:1, b:2, c:3}

upsert({
    table: 'test',
    object: objToUpsert,
    constraint: '(a, b)',
})

If your constraint isn't composite then, naturally, that one line would just be constraint: '(a)'.

This will return either the updated object or the inserted object.

A note about composite nullable indices

If you have a composite index (a,b) and b is nullable, then values (1, NULL) and (1, NULL) are considered mutually unique by Postgres (I don't get it either). If this is your use case, you'll need to make a partial unique index and then test for null before upsert to determine which constraint to use. Here's how to make the partial unique index: CREATE UNIQUE INDEX unique_index_name ON table (a) WHERE b IS NULL. If your test determines that b is null, then you'll need to use this constraint in your upsert: constraint: '(a) WHERE b IS NULL'. If a is also nullable, I would guess you'd need 3 unique indices and 4 if/else branches (though this is not my use case, so I'm not sure).

Here's the compiled javascript.

Hope someone finds this useful. @elhigu Any comment on the usage of knex().update(object)? (edit: nevermind - saw the warning - using knex.queryBuilder() now)

@timhuff looks nice. One thing to change would be to pass each query to raw using value bindings. Otherwise query.toString() is used to render each part of the query, and that opens up a possible SQL injection hole (queryBuilder.toString() is not as safe as passing parameters to the driver as bindings).

@elhigu Wait... query.toString() doesn't use bindings? Could you give me a rough example of the modification you're recommending? I... might have a lot of code to update.

Found the part of the documentation labeled Raw bindings. I've updated the example. I thought query.toString was safe. It'd be good to have a section of the documentation labeled something like "How to make unsafe queries". There's only a handful of no-nos, and that way people can use the library knowing "as long as I don't do these things, I'm safe".
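For anyone else who assumed toString() was safe, a minimal before/after of the difference being discussed (table and column names are illustrative):

// Unsafe: toString() renders the values straight into the SQL string,
// so nothing reaches the driver as a parameter binding.
const unsafe = knex.raw(knex('test').insert(object).toString() + ' ON CONFLICT (a) DO NOTHING');

// Safe: pass the query builder itself as a binding; knex keeps the values
// as parameters all the way down to the driver.
const safe = knex.raw('? ON CONFLICT (a) DO NOTHING', [knex('test').insert(object)]);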

I've created the following upsert: https://gist.github.com/adnanoner/b6c53482243b9d5d5da4e29e109af9bd
It handles single and batch upserts. I adapted it a bit from @plurch. Improvements are always appreciated :)

For what it's worth I've been using this format:

Edit: Updated to be secure for anyone searching for this. Thanks @elhigu

const query = knex( 'account' ).insert( accounts );
const safeQuery = knex.raw( '? ON CONFLICT DO NOTHING', [ query ]);

@NicolajKN You shouldn't use toString(); it can cause many kinds of problems and won't pass values through bindings to the DB (a potential SQL injection security hole).

The same thing done properly would look like this:

const query = knex('account').insert(accounts);
const safeQuery = knex.raw('? ON CONFLICT DO NOTHING', [query]);

Deleted discussion of unrelated issue.

@elhigu Hold on, doesn't that insert query get executed immediately after being created? Doesn't that create a race condition?

@cloutiertyler You weren't talking to me but maybe I can save @elhigu some time here. None of these queries would be executed. The statement knex('account').insert(accounts) does not execute a query; it's not executed until the result is actually asked for (e.g. via a .then). He passes that builder into knex.raw('? ON CONFLICT DO NOTHING', [query]), which only embeds the query into the raw statement without executing it.

@timhuff Thanks Tim, I assumed it had to be something like that, but that's not normal behavior for a promise. Promises are usually executed upon creation. The reason I ask is that I was getting errors saying "Connection Terminated" every so often when I tried to run this upsert. Once I switched over to removing the insert and creating an entirely raw query they went away. It seems like that would be consistent with a race condition.

knex QueryBuilders aren't Promises, though. When you start writing a knex query, you stay in "knexland". Everything you do is more or less just configuring a JSON spec of the query that you want to build. If you run .toString, it builds it and outputs it. It doesn't become a (bluebird) Promise until you run one of these on it. You might be interested in using .return if you want to execute the statement immediately.
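A tiny illustration of that point, reusing the earlier example's table name:

const pending = knex('account').insert(accounts); // just builds the query spec, nothing runs
pending.then(result => console.log(result));      // only now does a query execute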

Ah, I see, well that clears up my confusion. Thanks for the clarification and pointers! My issue must exist elsewhere then.

As an aside, the fact that it doesn't run immediately is often useful. Sometimes you wanna pass the thing around, configuring it, before executing. There's also situations where you can do stuff like...

const medicalBuildings = knex.select('building_id').from('buildings').where({type: 'medical'})
const medicalWorkers = knex.select().from('workers').whereIn('building', medicalBuildings)

(super contrived example but let's run with it)

I don't actually want to run that first statement - it's just part of my 2nd one.

Not to mention that if all query builders executed on creation, builder-pattern queries would trigger before building is done. It would not work at all without some terminator method (that executes the query).

@elhigu I mean... I guess you could just always run it on the next tick, right? I'm not suggesting that would by any means be a good idea but how many queries are actually created and executed on different ticks?

@timhuff I hadn't thought about that. Yeah, I think that would be possible too. I find it a fairly common case that one starts building a query, then fetches some async data and keeps on building. I don't do that very often myself, though.

@lukewlms that execute()-like method is called .then(); you can call it whenever you want to execute the query and get a promise. It is just how thenables work, and it is explained in the promise spec. It is an important and widely used concept in JavaScript when dealing with promises and async/await (which are pretty much just glorified shortcuts for Promise.resolve and .then). Also, if you are executing queries without handling the results, you are asking for problems like the app crashing.

Actually it's better to just follow this PR about the upsert feature implementation: https://github.com/tgriesser/knex/pull/2197. It already has an API design for how it should work. There is not really any useful information in this thread that is not already mentioned in the comments of that PR. If needed (the PR is closed and was never completed), let's open a new issue for this with an additional API description.

@elhigu Thanks for the heads up! I was unaware of that thread. Good to hear we're making progress on an upsert coming to the API. Looks like 6 months ago it failed 1 of the 802 tests and so it never passed travis-ci. Is that 1 failing test case the only thing keeping this from becoming a part of the knex API?

@timhuff there was only an initial implementation done; it must be completely rewritten. The most important part of that PR is the common API design, which can be supported by most of the dialects. So the feature comes when someone decides to implement that API. If no one else does and some day I have some extra time or need it badly, I'll do it myself. It is one of the most important features I would like knex to get (in addition to joins in updates).

@elhigu Thanks for filling me in. I'll have to read up on the progress here when I get a little more time.

I'm not sure if this helps anyone or if I'm just a noob, but for the solution from @timhuff I had to wrap my constraint in quotes because I was getting a query syntax error.

const constraint = '("a", "b")'

Removed some unrelated discussions (about how promises/thenables work).

Did this get added?

No. There is a feature request and spec in https://github.com/knex/knex/issues/3186.
