Elasticsearch: Date type does not have enough precision for the logging use case.

Created on 5 Mar 2015  ·  69 Comments  ·  Source: elastic/elasticsearch

At present, the 'date' type has millisecond precision. For many log use cases, higher-precision time is valuable: microsecond, nanosecond, etc.

The biggest impact of this is during sorting of search results. If you sort chronologically, newest-first, by a date field, documents with the same date will probably be sorted incorrectly (because they tie). This is often reported by users seeing events "out of order" when they have the same timestamp. A specific example is sorting by date and seeing events in newest-first order, except when there is a tie, in which case oldest-first (or first-written?) order appears. This causes a bit of confusion for the ELK use case.

Related: https://github.com/logstash-plugins/logstash-filter-date/pull/8

I don't have any firm proposals, but I have two different implementation ideas:

  • Proposal 1, use a separate field: Store our own custom-precision time in a separate field as a _long_ (see the sketch after this list). This allows us to do correct sorting (because we have higher precision), but it makes any date-related functionality in Elasticsearch unusable (searching now-1h, doing date_histogram, etc.)
  • Proposal 2, date type has tunable precision: Have the date type support configurable precision, with the default (backwards-compatible) precision being milliseconds. This would let us choose, for example, nanosecond precision for the logging use case, and year precision for an archaeological use case (billions of years ago, or something). The benefit here is that date_histogram and other date-related features could still work. Further, keeping the precision configurable would allow us to keep the underlying data structure a 64-bit long, and users could choose their most appropriate precision.
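A minimal sketch of Proposal 1 in Python, assuming a hypothetical "ts_us" long field alongside the regular date field (all field names here are illustrative, not an existing convention):

from datetime import datetime, timedelta, timezone

# Hypothetical event time with microsecond precision.
event_time = datetime(2015, 3, 5, 12, 30, 42, 123456, tzinfo=timezone.utc)
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Document as it could be sent to Elasticsearch: the regular date field keeps
# millisecond precision for date_histogram and range queries, while ts_us holds
# the full microseconds-since-epoch value as a long, used only for sorting.
doc = {
    "@timestamp": event_time.isoformat(timespec="milliseconds"),
    "ts_us": (event_time - epoch) // timedelta(microseconds=1),
    "message": "example log line",
}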
Labels: :Search/Mapping, >feature, high hanging fruit, stalled

Most helpful comment

Elasticsearch 7.0 will include a date_nanos field type that handles nanosecond sorting precision:
https://github.com/elastic/elasticsearch/pull/37755
The nanosecond-precision field is now a first-class citizen that doesn't require two fields to retain precision, so I will close this issue. Please open new ones if you find bugs or enhancements to make on this new field type.

All 69 comments

I know Joda's got a precision limit (the Instant class is millisecond precision) and a year limit ("year must be in the range [-292275054,292278993]"). I'm open to helping explore solutions in this area.

What about the consequences for field data size? Even with doc values in place, cardinality will be ridiculously high. Even for the scenarios which need this, it could be overkill, no?

Couldn't you just store the decimal part of the second in a secondary field (as a float or long) and sort by these two fields when needed? You could still aggregate on the standard date field, just not at microsecond resolution.
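For example, assuming an illustrative field called "micros" holding the sub-millisecond remainder, the sort portion of a search request could look like this:

# Body for GET /logs/_search: sort on the millisecond date field first,
# then break ties with the sub-millisecond remainder field.
search_body = {
    "sort": [
        {"@timestamp": {"order": "desc"}},
        {"micros": {"order": "desc"}},
    ]
}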

I've been speaking to a few networking firms lately and it has dawned on me that microsecond-level precision is going to be critical for IDS/network analytics.

That last request is mine, I believe. I would add that, to monitor networking (and other) activities in our field, nanosecond support is paramount.

:+1: for this feature

What about switching from Joda Time to date4j? It supports higher-precision timestamps than Joda, and supposedly the performance is better as well.

Before looking at the technical side: is the BSD license compatible with the Apache 2 license?

So BSD is compatible with Apache2.

I'd like to hear @jpountz's thoughts on this comment https://github.com/elastic/elasticsearch/issues/10005#issuecomment-77479544 about high cardinality with regards to index size and performance.

I could imagine adding a precision parameter to date fields which defaults to ms, but also accepts s, us, ns.
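Purely as an illustration of that idea (the precision parameter below is hypothetical and does not exist as a mapping option), the field definition could look like:

# Hypothetical mapping sketch for a tunable-precision date field; the
# "precision" parameter is only a proposal, not an existing option.
mapping = {
    "mappings": {
        "properties": {
            "@timestamp": {"type": "date", "precision": "us"}
        }
    }
}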

We would need to move away from Joda, but I wouldn't be in favour of replacing Joda with a different dependency. Instead, we have this issue discussing replacing Joda with Java.time https://github.com/elastic/elasticsearch/issues/12829

@clintongormley It's hard to predict because it depends so much on the data, so I ran an experiment for an application that ingests 1M messages at a rate of 2,000 messages per second per shard.

| Precision | Terms dict (kB) | Doc values (kB) |
| --- | --- | --- |
| milliseconds | 3348 | 2448 |
| microseconds | 10424 | 3912 |

Millisecond precision is much more space-efficient, in particular because at 2,000 docs per second several messages fall in the same millisecond. But even if we go with 1M messages at a rate of 200 messages per second, so that sharing the same millisecond is much more unlikely, there are still significant differences between millisecond and microsecond precision:

| Precision | Terms dict (kB) | Doc values (kB) |
| --- | --- | --- |
| milliseconds | 7604 | 2936 |
| microseconds | 10680 | 4888 |

That said, these numbers are for a single field, the overall difference would be much lower if you include _source storage, indexes and doc values for other fields, etc.

Regarding performance, it should be pretty similar.

From what users have told me, by far the most important reason for storing microseconds is for the sorting of results. It makes no sense to aggregate on buckets smaller than a millisecond.

This can be achieved very efficiently with the two-field approach: one for the date (in milliseconds) and one for the microseconds. The microseconds field would not need to be indexed (unless you really need to run a range query with finer precision than one millisecond), so all that would be required is doc_values. Microseconds can have a maximum of 1,000 values, so doc_values for this field would require just 12 bits per document.

For the above example, that would be only an extra 11kB.

A logstash filter could make adding the separate microsecond field easy.
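A sketch of what such a mapping could look like, using current mapping syntax and illustrative field names (the microseconds remainder is not indexed, so it is only usable through doc values, i.e. for sorting):

# Body for PUT /logs: the date field stays at millisecond precision, while the
# 0-999 microseconds remainder is excluded from the inverted index and kept
# only as doc values, which is all that sorting needs.
mapping = {
    "mappings": {
        "properties": {
            "@timestamp": {"type": "date"},
            "micros": {"type": "short", "index": False, "doc_values": True},
        }
    }
}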

Meh - there aren't 1000 bits in a byte. /me hangs his head in shame.

It would require 1,500kB

If we want to use the ELK stack properly for analyzing network latency, we really need nanosecond resolution. Are there any firm plans or a roadmap to change the timestamps?

Let's say I index the following JSON document with nanosecond precision timestamps:

{ "@timestamp": "2015-09-30T12:30:42.123456789-07:00", "message": "time is running out" }

So the internal date representation will be 2015-09-30T19:30:42.123 UTC, right?

But if I issue a query matching that document, and ask for either the _source document or the @timestamp field explicitly, won't I get back the original string? If so, then in cases where the original time string lexicographically sorts the same as the converted time value, would that be sufficient for a client to further sort to get what they need?

Or is there a requirement that internal date manipulations in ES need such nanosecond precision? I am imagining that if one has records with nanosecond precision, only being able to query for a date range with millisecond precision could potentially result in more document matches than wanted. Is that the major concern?

I think the latter: internal date manipulations probably need nanosecond precision. The reason is that when monitoring latency on 10Gb networks we get pcap records (or packets directly from the switch via UDP) which include multiple fields with nanosecond timestamps in the record. We would like to find the differences between these timestamps in order to optimize our network/software and find correlations. To do this we would like to zoom in on every single record and not aggregate records.

:+1: for solving this. It is causing major issues for us now in our logging infrastructure.

@pfennema @abierbaum What problems are you having that can't be solved with the two-field solution?

What we would like is a timescale in the display (Kibana) where we can zoom in on individual measurements that have a timestamp with nanosecond resolution. A record in our case has multiple fields (NICTimestamp, TransactionTimestamp, etc.) which we would like to correlate with each other on an individual basis, hence not aggregated. We need to see where spikes occur in order to optimize our environment. If we can have the time on the x-axis in micro/nanosecond resolution, we should be able to zoom in on individual measurements.

@clintongormley Our use case is using ELK to analyze logs from the backend processes in our application. The place we noticed it was PostgreSQL logs. With the current ELK code base, even though the logs coming from the database server have the commands in order, once they end up in Elasticsearch and are visualized in Kibana, the order of items happening in the same millisecond is lost. We can add a secondary sequence-number field, but that doesn't work well in Kibana queries (since you can't sort on multiple fields) and causes quite a bit of confusion on the team, because they expect the data in Kibana to be sorted in the same order as it came in from PostgreSQL and Logstash.

We have the same problem as @abierbaum described. When events happen in the same millisecond the order of the messages is lost.
Any workaround or suggestion on how to fix this would be really appreciated.

You don't need to increase the timestamp accuracy: instead, the time sorting should be based on both timestamp and ID: message IDs are monotonically increasing, and specifically, they are monotonically increasing for a set of messages with the same timestamp...

@dtr That may be true for IDs that are automatically assigned, but only if the messages are indexed in the correct order in the first place, and only if the application isn't supplying its own IDs. There's definitely no way that I could depend on that behavior. Also, are monotonically increasing IDs guaranteed, or are they an implementation artifact, especially when considering clusters of more than one node?

I believe the original intent was to see a bulk of log messages originating from the same source. If a cluster is involved, then the timestamp is probably the only clustering item (unless, of course, there is a special "context" or "session" field).
For that purpose, we can rely on the ID (assuming, of course, it's monotonically increasing, at least per source).

@dtr2 that's a lot of "ifs" to be relying on ElasticSearch's autogenerated IDs, nearly all of which are violated in my cluster:

1) Some messages supply their own ID if the source has a unique ID already associated with it (systemd journal messages in my case).
2) All of my data runs through a RabbitMQ server (sometimes passing through multiple queues depending on the amount of processing that needs to be done) with multiple consumers per queue so there's no way that I can expect documents to be indexed in any specific order, much less by the same ElasticSearch node.

In any case, ElasticSearch does not guarantee the behavior of autogenerated IDs. The docs only guarantee that the IDs are unique:

https://www.elastic.co/guide/en/elasticsearch/guide/current/index-doc.html

So I hope you can see that trying to impose an order based upon the ID cannot be relied upon in the general case. Yes, there may be certain specific instances where that would work, but you'd be relying on undocumented implementation details.

@jcollie, in that case, trying to find a "context" is impossible - unless your data source provides it. The idea was to find a context and filter "related" lines together.

@jcollie "IDs" (in fact they are counters per log source) have to be generated before ingestion, outside of elasticsearch.

Instead of sorting on time you sort on time, counter.
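A rough sketch of the producer side, assuming a single strictly ordered source (function and field names are illustrative):

import itertools
import time

# Per-source counter: every emitted event carries (timestamp, counter), and the
# counter alone already encodes the order in which this source produced them.
_seq = itertools.count()

def make_event(message):
    return {
        "@timestamp": int(time.time() * 1000),  # epoch milliseconds
        "seq": next(_seq),
        "message": message,
    }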

I don't get it - what is the resistance to extending timestamps to nanosecond accuracy? I realize that would take a lot of work and would likely be a "3.x" feature, but anything else is just a workaround until nanosecond timestamps are available.

Having a counter per log source only really helps correlating messages from the same source, but is really not very useful in correlating messages across sources/systems.

As an example, let's say that I implement this counter in each of my log sources (for example as a logstash plugin). Then let's say that I have one source that generates 1000 messages per millisecond and another source that generates 100000 messages per millisecond. There's no way that I could reliably tell what order those messages should be in relative to each source. That may be an extreme example but I think that it illustrates the point.

Having a counter per log source only really helps correlating messages from the same source, but is really not very useful in correlating messages across sources/systems.

@jcollie can you tell me how you keep clocks on your machines in perfect sync so nanosecond accuracy starts making sense? Even Google struggles to do so:

Then let's say that I have one source that generates 1000 messages per millisecond and another source that generates 100000 messages per millisecond. There's no way that I could reliably tell what order those messages should be in relative to each source.

You are right here. There is no way. You can easily see jitter of a few ms between ntp sync between machines in the same rack:

19 Jan 09:27:40 ntpdate[11731]: adjust time server 10.36.14.18 offset -0.002199 sec
19 Jan 09:27:50 ntpdate[11828]: adjust time server 10.36.14.18 offset 0.004238 sec

On the other hand, you can reliably say in which order messages were processed by a single source:

  • Single thread of your program.
  • Single queue in logstash.
  • Some other strictly ordered sequence (ex: kafka partition).

I'd be happy to be proven wrong, though.

Clock synchronisation between machines is done by PTP if you need accurate synchronisation, which is needed when measuring low-latency/high-frequency trading networks. The PTP source is usually the switch in the network (Arista, Cisco).

I don't get it - what is the resistance to extending timestamps to nanosecond accuracy

@jcollie My understanding of the discussion in this issue is that there is no specific resistance. We have been discussing the costs/benefits of various implementation options. Hopefully this helps clarify things.

It requires replacing Joda time with Java time, which is something we would like to do but it is a big project.

+1 for this feature

Epoch nanoseconds would be great. We provide our customers logs in epoch nanoseconds, and we have to sed these down to milliseconds, which adds processing time.

We also suffer from the low resolution of time fields. Nanoseconds would be great. Looking forward to this feature.

I'm in favor of higher precision; allow the user to choose the precision they want.

I wonder if this is the same issue as the time field in Kibana losing its (ns) precision:
"? time 2017-03-23T18:34:15.233983675Z"
When the index is reset in Kibana, the time field loses some of its precision:
" time March 23rd 2017, 13:35:21.246"

If so... is there a workaround to display the complete precision?

I've recently encountered this when attempting to log events from an MQTT broker for IoT applications. I had to trim down to milliseconds because a fractional millisecond part doesn't play well with ES.

This would be nice to have. We encounter a loss of precision when logging from a source generating at the microsecond level. We can get around this by using sequence numbers, but we would prefer not to have the loss of precision. When you're measuring hardware functions you want as precise a time as possible.

@jmwilkinson, at that frequency, don't we always need to record the total order since we can get multiple logs at the microsecond level these days?

@portante I'm afraid I didn't quite grok your question.

Do folks believe that by having nanosecond precision in a time value each log entry emitted in sequence is guaranteed to get a different time value recorded for its timestamp? @jmwilkinson?

To be sure we are able to reconstruct the original order of logs from a source we need to record a monotonically increasing sequence for them. We can approximate with a ns precision timestamp, but I don't think we are guaranteed that sufficiently fast log generation will get separate timestamps.

import time

# Count how often two consecutive calls to time.time() return exactly the same
# value, i.e. how often the clock cannot distinguish two back-to-back "events".
val = 0.0
same = 0
for _ in range(10000):
    prev = val
    val = time.time()
    if prev == val:
        same += 1
print(same)

@portante That's a good point. And I think I was mistaken about the loss of precision: Elasticsearch retains the precision, it just isn't used in sorting. Which is fine if we need to use sequence numbers anyway.

@portante Actual timestamps convey much more information than just sequence numbers, can be much easier to generate (e.g. across multiple processes), and situations exist where a certain precision (microsecond, nanosecond) gives correct ordering sufficiently often to be useful.

I have a use case where we collect performance events from modules inside a running process and across distributed processes (like Google Dapper): millisecond precision is insufficient to measure differences between closely related events, but we still need some absolute time to relate that to other events. The occasional clock glitch breaks perfect ordering, but it isn't an issue in practice, because it's performance data and we have many samples. So we worked around the loss of precision in ES by storing timestamps twice, in a date field (millisecond precision, good enough for search, and for humans to understand) and in a double field (microseconds since epoch, good for computations). Not exactly optimal though.

@saffroy, you are correct. I was not trying to convey otherwise, just stating that you cannot rely on timestamps for ordering logs flowing from a source. That requires a monotonically increasing sequence number. This is an old but good paper on the topic.

@portante Why not have both tools in the arsenal? Let's say that in my case I can use nanosecond time precision and this is sufficient to reconstruct the logs. Somebody else will have so many logs, and they will be so dense, that even nanoseconds won't be enough. No problem: they can start doing this with sequence numbers, and use both nanoseconds AND sequence numbers when required.

My impression is that it's better to have one field, rather than two, that allows you to reconstruct the sequence of events. Even now in ES you have "offset", which can be used in some situations, but again, this looks like a workaround.

Not to mention that in Kibana for proper visualisations it should be one field, as field grouping is not supported.

So nanoseconds are needed anyway and could be used in most situations; those who need strict order can add sequence numbers to their logs.

The occasional clock glitch breaks perfect ordering, but it isn't an issue in practice, because it's performance data and we have many samples

Broad declarations like "occasional clock glitch ... not an issue in practice" are not very helpful - while this may be your experience, it has not been my experience.

Time is an amazing and interesting topic, but let's stay focused on the issue of date type precision in Elasticsearch, and not about measuring time or properties of time itself.

Stalled on #12829

@jordansissel The entire paragraph was only meant to give another example of a situation where we would benefit from increased precision when storing timestamps in Elasticsearch, and the remark about clock glitch ("not an issue in practice, because it's performance data") was to be understood within this specific example. Sorry if I wasn't clear about that.

Hi All,

I know this is difficult, but I wondered whether this issue has moved on with the advent of version 6?

Thanks!

Sorry, it hasn't moved.

Any idea when it will be? ELK was recommended to me and I hit this.

No idea. The only thing I can tell you is that it won't be fixed in the short term.

Am I right that the issue, for dumb people like me, is that when I send in "ts": "2017-08-30T14:26:30.9157480Z", ES converts that to 1504103190915, chopping off the last four digits, and parses that as a date, but with those sub-millisecond digits missing the sorting/search is not as accurate as expected?

@jchannon what is your use case that requires that precision?

@jpountz maybe 6.x sorted indexes can be the answer to this, or are they using the same precision as the indexed values?

My use case? I have always logged to 6 decimal places and want to keep it that way. I'm astounded that this highly recommended piece of software is so poor at storing/converting dates.

Our use case is that we ingest logs via kubernetes => fluentd (0.14) => elasticsearch, and logs that are emitted rapidly (anything under a millisecond apart, which is easily done) obviously have no way of being kept in that order when displayed in Kibana.

Same issue; we are tracking events that happen with nanosecond precision.

Is there any plan to increase it?

Yes, but we need to move from Joda to Java.time in order to do so. See https://github.com/elastic/elasticsearch/issues/27330

I opened a bug in Logback, as its core interface also stores data at millisecond resolution, so precision is lost even earlier, before ES: https://jira.qos.ch/browse/LOGBACK-1374

It seems the historical java.util.Date type is the cause of these problems in the Java world.

Same use case: using a kubernetes/filebeat/elasticsearch stack for log collection, but not having nanosecond precision leads to incorrect ordering of logs.

It seems like we need to consider having the collectors provide a monotonically increasing counter which records the order in which the logs were collected. Nanosecond precision does not necessarily solve the problem, because the clock resolution might not be nanosecond.

Seriously, guys? This bug is almost 3 years old...

The problem is also that if you try to find a workaround, you run into a series of other bugs, so there is not even a viable, acceptable workaround:

  • If you use a string, sorting will be slow
  • If you use an integer and try to make it readable you will not have big enough numbers (https://github.com/elastic/elasticsearch/issues/17006)
  • If you add an additional ordering field you cannot easily configure Kibana to have a "thenBy" ordering on that field.

So the only viable workaround seems to be to have epoch time plus 2 additional digits which are incremented in Logstash when the timestamp matches the previous one.
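A rough sketch of that idea in Python, assuming events arrive already ordered from a single pipeline (the two extra digits are purely a tie-breaker, not real time):

def add_tiebreaker(epoch_millis_stream):
    # Turn millisecond timestamps into sortable longs with two extra digits
    # that are incremented while the same millisecond repeats.
    last_ms, counter = None, 0
    for ms in epoch_millis_stream:
        counter = counter + 1 if ms == last_ms else 0
        last_ms = ms
        yield ms * 100 + min(counter, 99)  # at most 99 ties per millisecond

# Example: [1, 1, 1, 2] becomes [100, 101, 102, 200].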

Has anyone found a better approach?

We have been storing microseconds since epoch in a number field for 2 years now.
Suits our needs but YMMV.

cc @elastic/es-search-aggs

Not all time data is collected using commodity hardware. There is plenty of specialty equipment that collects nanosecond-resolution data. Think about other applications besides log analysis: sorting by time is critical, but aggregations over small timeframes are also important. For example, maybe I just want to aggregate some scientific data over a one-second window or even a millisecond window.

I have nanosecond resolution data and would love to be able to use ES aggregations to analyze it.

Elasticsearch 7.0 will include a date_nanos field type that handles nanosecond sorting precision:
https://github.com/elastic/elasticsearch/pull/37755
The nanosecond-precision field is now a first-class citizen that doesn't require two fields to retain precision, so I will close this issue. Please open new ones if you find bugs or enhancements to make on this new field type.
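For reference, a minimal sketch of the new field type in use (index and field names are illustrative):

# Body for PUT /logs on Elasticsearch 7.0+: date_nanos stores timestamps with
# nanosecond resolution, so sub-millisecond order survives sorting.
mapping = {
    "mappings": {
        "properties": {
            "@timestamp": {"type": "date_nanos"}
        }
    }
}

# Example document; the fractional seconds are preserved down to nanoseconds.
doc = {"@timestamp": "2015-09-30T19:30:42.123456789Z", "message": "time is running out"}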
