Elasticsearch: Update API: update by query

Created on 12 Jan 2012  ·  160 Comments  ·  Source: elastic/elasticsearch

#1583 allows updating individual documents. Update by query would radically reduce network round trips when you want to update a number of documents, and would push work from the client to ES.

curl -XPOST localhost:9200/index/type/_update -d '{
    "query" : { "constant_score" : { "filter" : { "term" : { "counter" : 0 } } } },
    "script" : "ctx._source.counter += count",
    "params" : {
        "count" : 4
    }
}'
:DistributeCRUD

Most helpful comment

Update by query is live in 2.3.0 and 5.0.0-alpha-1. The docs are here.

All 160 comments

Would really love this feature too!

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

I really need this feature

:+1:

While waiting for this feature to be officially finished and released, I've packaged the pull request #2231 as a plugin: yakaz/elasticsearch-action-updatebyquery.
Have fun.

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

:+1: :pray:

+1

+1

Is there a way to pass the score of the query as a parameter to the update script? I need to update entries with scores updated based on the fields of its children.

+1

+1

+1

@scottc52 Did you manage to do it? I am also looking for a way to do this.

+1

@gboivin Nope. I'm doing a has_child query and sending a separate update request, but it's slow.

Waiting for this feature too.

+1

+1

+1

+1

+1

+1

+1

Just wrote a little script to help wait for something... more "production ready" ;-)

https://github.com/YannBrrd/esNodeUpdater

Feel free to comment/update...

+1

Is there an official status on this feature from the dev team? I don't see any input from them. Are there plans to add this feature to the core or is the preference to have users use a plugin like the one listed above?

We plan to get back to this one. The main reason we put it on hold is that we need a way to stop running update-by-query requests, as they can be executed by mistake on a large amount of data, causing problems...

+1. Thanks for the update and working on this.

+1

+1

+1

+1

+1, sounds useful

+1

+1

+1

+1

+1

+1

+1

+1

+2

+1

+10

+1

+1

+1

Have you ever thought about implementing this feature with two HTTP calls? I am thinking of warmers, which let you store a query and then execute it (not quite the same thing, but it gave me the idea).

@kimchy you said you are looking for a way to stop an update that was launched on a large amount of data by mistake. If you stop it midway, the indexed data may be left in an invalid state (unless a rollback is possible?). A better approach might be to prevent the mistake in the first place.

You could require two HTTP calls before triggering the real mass update (one to prepare it and one to actually trigger it, with a transaction ID passed between them), plus an update status handler (like the DataImportHandler in Solr) to know when the update is really done.

I'm not sure I'm being clear, but I think this could be a way to prevent calls made by mistake.
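The prepare/confirm idea in this comment can be sketched as follows. This is only an in-memory illustration of the proposed flow, not anything Elasticsearch provides: the `TwoPhaseUpdater` class, its method names, and the status fields are all hypothetical.

```python
import uuid


class TwoPhaseUpdater:
    """Hypothetical prepare/confirm guard around a mass update (in-memory sketch)."""

    def __init__(self, docs):
        self.docs = docs
        self.pending = {}  # transaction id -> (matches, apply_update)
        self.status = {}   # transaction id -> status dict

    def prepare(self, matches, apply_update):
        """First call: register the update and report how many docs it would touch."""
        tx_id = uuid.uuid4().hex
        self.pending[tx_id] = (matches, apply_update)
        would_update = sum(1 for d in self.docs if matches(d))
        self.status[tx_id] = {"state": "prepared", "would_update": would_update, "updated": 0}
        return tx_id, would_update

    def confirm(self, tx_id):
        """Second call: run the update registered under tx_id (KeyError if unknown or already used)."""
        matches, apply_update = self.pending.pop(tx_id)
        for d in self.docs:
            if matches(d):
                apply_update(d)
                self.status[tx_id]["updated"] += 1
        self.status[tx_id]["state"] = "done"
        return self.status[tx_id]


# Made-up documents mirroring the counter example from the issue description.
docs = [{"counter": 0}, {"counter": 0}, {"counter": 7}]
updater = TwoPhaseUpdater(docs)
tx_id, would_update = updater.prepare(
    matches=lambda d: d["counter"] == 0,
    apply_update=lambda d: d.update(counter=d["counter"] + 4),
)
result = updater.confirm(tx_id)
```

The caller sees `would_update` before anything changes, so a query that unexpectedly matches millions of documents can be abandoned simply by never calling `confirm`.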

+1

+1

I'd also like to upvote this.

+1

@kimchy: Performance can't be the question. Currently I'm running thousands of queries to look up data (e.g. an OSM index address lookup for GPS locations; lookups are fast, hey, I've got Elasticsearch!) and then updating each document in another index (e.g. to add a clear-text address). My updates add new fields. A bulk update inside ES must be more efficient than 10,000 lookup queries plus 10,000 update requests (even when using bulk updates). From both a coding and a runtime point of view it would be more efficient: the bulk update file has 20,000 lines and could have only 2 with the new feature, instead of all that data moving over the network and keeping ES busy reading bulk update files.
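To make the line count mentioned above concrete, here is a minimal sketch of the client-side bulk body that the current approach requires. It assumes the usual `_bulk` newline-delimited JSON format with one action line and one partial-document line per update; the helper name, index, type, IDs, and field values are made up for illustration.

```python
import json


def build_bulk_update_body(index, doc_type, updates):
    """Build a newline-delimited JSON body for the _bulk endpoint.

    `updates` maps a document ID to the fields to add (hypothetical data).
    Each document costs two lines: an action line and a partial-doc line.
    """
    lines = []
    for doc_id, fields in updates.items():
        lines.append(json.dumps({"update": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps({"doc": fields}))
    # The bulk format expects a newline after the last line as well.
    return "\n".join(lines) + "\n"


# Hypothetical example: 10,000 documents that each get an address field.
updates = {str(i): {"address": "Example Street %d" % i} for i in range(10000)}
body = build_bulk_update_body("locations", "gps", updates)
line_count = body.count("\n")
```

Here `line_count` comes out at two lines per document, which is the 20,000-line file described in the comment. An update by query would replace it with a single request carrying one query and one script.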

Maybe you would agree to add a limit to the update operation, e.g. _update/_query=some_conditions&size=1000. That way it avoids updating a million docs by accident, and we as developers can decide whether to run 1000*1000 updates to update a million records. It should return the number of docs updated, so the caller can tell whether another update call is required.
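The size-limited variant proposed here could be driven by a simple client loop. The sketch below simulates it in memory, since the `size` parameter is only a suggestion in this thread; both function names are hypothetical. It assumes the update makes a document stop matching the query (as when the update adds the field the query tests for), otherwise the loop would never terminate.

```python
def update_by_query_limited(docs, matches, apply_update, size):
    """Update at most `size` matching docs and return how many were updated."""
    updated = 0
    for doc in docs:
        if updated >= size:
            break
        if matches(doc):
            apply_update(doc)
            updated += 1
    return updated


def update_all_in_batches(docs, matches, apply_update, size):
    """Repeat the limited update until a call updates fewer than `size` docs."""
    total = 0
    while True:
        updated = update_by_query_limited(docs, matches, apply_update, size)
        total += updated
        if updated < size:
            return total


# Made-up data: 2,500 docs without an address, updated 1,000 at a time.
docs = [{"id": i} for i in range(2500)]
total = update_all_in_batches(
    docs,
    matches=lambda d: "address" not in d,
    apply_update=lambda d: d.update(address="unknown"),
    size=1000,
)
```

The returned count is what tells the caller when to stop: a batch smaller than `size` means nothing is left to update.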

+1

For my scenario (enriching records after lookups in other indices) I might do it another way: insert the data into MongoDB first, do the lookups in Elasticsearch, update the records in Mongo, and use the Mongo river to get the final results into Elasticsearch to show them in the GUI (built on top of ES). Does anybody have experience with such scenarios? I had hoped I could go the ES-only way; until now, I had rejected using a DB in my project.

Hi,

you could simply use Couchbase + Elasticsearch for this, as Couchbase
offers an interface with Elasticsearch

Regards,
Yann Barraud

+1

+100

+1

+1

Is there an alternative in Elasticsearch, e.g. triggering a script that performs an action when new data is inserted or updated? Some kind of before-index trigger could help me remove the pre-processing chain (we currently run message queues with Redis and a 0MQ processing chain before we insert data into ES, and all of it costs network bandwidth to shuffle data around for parallel processing).

I would like to see
http://localhost:9200/index/type/_preprocessBeforeIndex?script=myDataAnalysisScript
http://localhost:9200/index/type/_preprocessBeforeUpdate?script=myDataAnalysisScript
The script must be able to add new fields to the current record before ES stores/indexes it (to avoid a second index action after changes). As we work a lot with node.js, the scripts should work in the language required (in our case JavaScript).

Even better if we could define the script in the mapping per type of data instead of on generated indices.
Is any plugin available that can trigger such scripts? Is there any documentation on using the ES API in scripts?

+1

+1

+1

+1

+1

+1

+1

+1

+1

Waiting for this feature... (+1)

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

Is this feature under development at all?
This would solve so many problems that are almost impossible to handle reliably on the application level right now.

+1

+1

+1

Just to remind you that since mid February 2013 I've packaged, and maintained ever since, the "official pull request" #2231 via @martijnvg's branch as a plugin: yakaz/elasticsearch-action-updatebyquery.

+1

+1

+1

+1

+1
How is it possible that this feature, available since February 2013, is still not merged to master?

+1
Ditto on @KrzysztofWilczek's comment. Why has the PR been left to stagnate over the past year with no updates? This is by far the most commented-on issue.

+1

We ran into this issue several months ago (see my posts as @seti123 from January/February) and I would like to share our results: after giving up on DB + ES river (too many worries about version dependencies), we evaluated our use case successfully with Crate Data (which uses ES as a library and adds a SQL interface for mapping and querying, including "update by query": https://crate.io/docs/stable/sql/dml.html#updating-data ).
A good starting point to read about similarities and differences: https://crate.io/blog/crate_data_elasticsearch

Closed in favour of #2230

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

Will update by query support setPostFilter?
Issue #12295

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

Can someone review this and give feedback?
https://discuss.elastic.co/t/updatebyqueryresponse-throwing-timeout/29176

Update by query fails while updating more than 20 million records.

@Praveen82 you are using a 3rd party plugin. This isn't the right place to be requesting support, you should post that as an issue on that plugin's repository.

https://github.com/elastic/elasticsearch/pull/15125 is implementing a syntax that will look a little like

curl -XPOST localhost:9200/index/type/_update_by_query -d '{
    "query" : { "term" : { "counter" : 0 } },
    "script" : {
        "inline" : "ctx._source.counter += count",
        "params" : {
            "count" : 4
        }
    }
}'

The reason this was stalled for so long is those timeouts: up until now there hasn't been a way to launch long-running jobs in Elasticsearch and report on their status and things. With the task management API (#15347) imminent, I picked up the torch on "reindex" and "update-by-query" style things and started them again with the intent to integrate with task management ASAP.

Anyway, #15125 and any followup PRs are the place to look for this feature.
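For clients that build the request programmatically, the body shown in this comment can be assembled as in the sketch below. It only mirrors the curl example from the comment; the helper name is hypothetical, nothing is sent over the network, and the syntax that finally ships may differ from what the PR proposed.

```python
import json


def build_update_by_query_request(index, doc_type, query, inline_script, params):
    """Return (path, JSON body) for an update-by-query call in the proposed syntax."""
    path = "/%s/%s/_update_by_query" % (index, doc_type)
    body = {
        "query": query,
        # Script and parameters are nested under "script", as in the curl example.
        "script": {"inline": inline_script, "params": params},
    }
    return path, json.dumps(body)


path, body = build_update_by_query_request(
    "index",
    "type",
    {"term": {"counter": 0}},
    "ctx._source.counter += count",
    {"count": 4},
)
```

Passing `count` through `params` rather than concatenating it into the script string keeps the script text constant across calls.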

+1

+1

+1

+1

+1

Update by query is live in 2.3.0 and 5.0.0-alpha-1. The docs are here.

Does update by query in 2.3.+ or 5.+ support the javascript plugin?

If you really want it, sure. In 2.3+ we test update-by-query against groovy and in 5.+ we test against painless. We used to test against groovy and it worked there as well. I expect javascript will work fine.

JS support would be pretty slick.

As I said, it exists, you just have to install the plugin.

The trouble with all of these languages is that their implementations on the JVM aren't properly oriented for embedding. That is why we don't include them by default.

Anyway, if you want to talk more about it I think discuss.elastic.co is a more appropriate place for it.
