Kibana: Add "missing" and "other" values to terms agg

Created on 18 Nov 2014  ·  79 Comments  ·  Source: elastic/kibana

In Kibana 3, the pie chart definition has two check boxes for "missing" and "other" values.

It seems this option is gone in Kibana 4.
If I do a terms aggregation on a field with 20 values and only select the top 7, the percentages in the pie chart will not take the last 13 terms into consideration. In this case, I would like to be able to include a slice in the pie chart with the "other" values.

Am I missing something? Is the inclusion of "other" or "missing" values planned in Kibana 4?

Visualizations elasticsearch high hanging fruit enhancement

All 79 comments

Unfortunately this functionality was removed from Elasticsearch, so Kibana has no way of calculating these values. You can follow the Elasticsearch team's progress here: https://github.com/elasticsearch/elasticsearch/issues/5324

Thanks for the quick answer.

That's a major pain. Pie charts are rarely useful without being able to work on the complete data set (and drawing the complete data set is usually neither feasible nor interesting).
We just want to see / show the proportion of the total taken by each of the top 10 terms.

Yep, you may wish to weigh in on the above referenced elasticsearch ticket.

Couldn't this be done in Kibana4 right now?

missing
Add a missing aggregation at the same level as the terms aggregation. Then append the missing bucket results to the end of the terms bucket array.
This is what we do with some custom reporting built on top of ES aggregations, and it works very well. It just requires some more logic on the client side to merge the missing bucket into the results.
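A minimal sketch of that client-side merge, in Python for illustration. The aggregation names `top_terms` and `no_value`, the helper names, and the sample response are assumptions made up for this example; only the `terms` and `missing` aggregation types come from Elasticsearch itself.

```python
from typing import Any, Dict, List


def build_request(field: str, size: int) -> Dict[str, Any]:
    """Terms agg plus a sibling missing agg on the same field."""
    return {
        "size": 0,  # only aggregations are needed, no hits
        "aggs": {
            "top_terms": {"terms": {"field": field, "size": size}},
            "no_value": {"missing": {"field": field}},
        },
    }


def merge_missing(aggs: Dict[str, Any], label: str = "missing") -> List[Dict[str, Any]]:
    """Append the missing-agg count to the end of the terms bucket array."""
    buckets = [dict(b) for b in aggs["top_terms"]["buckets"]]
    missing_count = aggs["no_value"]["doc_count"]
    if missing_count > 0:
        buckets.append({"key": label, "doc_count": missing_count})
    return buckets


# Hand-written sample shaped like an aggregation response (not real data).
sample = {
    "top_terms": {"buckets": [{"key": "200", "doc_count": 70},
                              {"key": "404", "doc_count": 20}]},
    "no_value": {"doc_count": 10},
}
merged = merge_missing(sample)
```

The merged list can then be fed to whatever draws the chart, with the extra entry treated as one more slice.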

other
Isn't this exactly what sum_other_doc_count of the terms aggregation is supposed to fix? It seems surprising that Kibana4 is not using this value since it's already being returned with the aggregation buckets.
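To illustrate the idea, here is a small sketch that turns a terms aggregation result into pie slices, using `sum_other_doc_count` for the extra slice. The function name and the sample numbers are hypothetical; this only covers document counts, since that is all `sum_other_doc_count` reports.

```python
from typing import Any, Dict, List, Tuple


def slices_with_other(terms_agg: Dict[str, Any],
                      label: str = "Other") -> List[Tuple[str, int, float]]:
    """Return (key, count, percent) tuples, with the remainder as one slice."""
    slices = [(b["key"], b["doc_count"]) for b in terms_agg["buckets"]]
    other = terms_agg.get("sum_other_doc_count", 0)
    if other > 0:
        slices.append((label, other))
    total = sum(count for _, count in slices)
    if total == 0:
        return []
    return [(key, count, 100.0 * count / total) for key, count in slices]


# Hand-written sample: top 2 terms cover 80 of 100 documents.
sample_agg = {
    "buckets": [{"key": "a", "doc_count": 50}, {"key": "b", "doc_count": 30}],
    "sum_other_doc_count": 20,
}
pie = slices_with_other(sample_agg)
```

Without the extra slice, "a" would be drawn as 62.5% of the pie (50 of 80) instead of 50% of all matching documents.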

As has been stated, you really need to see the whole data set for these graphs to be useful. When the data is sparse, missing and other values are really valuable metrics.

This seems about to be closed. I wonder if it can be used to close this issue: https://github.com/elastic/elasticsearch/pull/11042

+1 absolutely essential

+1000

I guess people using this software in the financial sector just had a heart attack. Haha.

+1. must have

+1 absolutely essential

+1

+1

+1

This prevents me from upgrading to version 4.0!

I figured this one out:

Click Advanced in your buckets field, then in the EXCLUDE PATTERN, type: !*

This will exclude missing values!

Any updates since https://github.com/elastic/elasticsearch/issues/5324#issuecomment-104419330, e.g. along the lines proposed above in https://github.com/elastic/kibana/issues/1961#issuecomment-98795957? Terms viz that don't sum up to the total is a huge feature regression for us upgrading from kibana 3.

Must have. Preventing migration from K3 to K4.

+1

In Kibana 4, when we use a data table visualization, why are rows with null dates (the date field can be null) or missing values systematically eliminated?
Do you have a solution, especially for keeping the rows with null date fields in data tables?

Now that https://github.com/elastic/elasticsearch/pull/11042 is merged this is no longer an upstream issue. The "others" functionality desired here is actually not fixed by https://github.com/elastic/elasticsearch/pull/11042, which simply allows defining a value for documents which do not have a value.

+1

+1

+1

The problem here is that we still can't do sub-aggregations of the "other" bucket, because it isn't a real bucket.

I am not sure I understand why missing and other are placed in the same issue? Especially since the original poster talked about two check boxes: 'for "missing" and "other"'. The reply explains why the functionality for 'missing' was removed, but I can't see an explanation why 'other' was also removed. Isn't 'other' low hanging fruit?
+1 to both.

+1
The more I dig into K4, the more blockers I find that prevent me from switching from K3 (e.g. #1583 and #1547).

+1

It is a very important feature!

+1

+1

+1

+1

+1

+1

I need this for representing things that are missing tags. My nasty workaround is just assigning the string "[null]" to those properties if they are missing when I load data into Elasticsearch. I'd really rather not have to do that though...
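For anyone wanting to try the same workaround, a rough sketch of the load-time substitution could look like this. The function name and the choice of which values count as "missing" (absent, `None`, empty string, empty list) are assumptions, not something taken from the comment above.

```python
from typing import Any, Dict, Iterable


def fill_missing(doc: Dict[str, Any], fields: Iterable[str],
                 placeholder: str = "[null]") -> Dict[str, Any]:
    """Return a copy of doc with a placeholder for absent or empty fields."""
    filled = dict(doc)
    for name in fields:
        value = filled.get(name)
        # Treat absent, None, empty string, and empty list as "missing".
        if value is None or value == "" or value == []:
            filled[name] = placeholder
    return filled


# The placeholder then shows up as an ordinary term in a terms aggregation.
doc = fill_missing({"host": "web-1"}, ["tags"])
```

The obvious cost is that the placeholder becomes real data in the index, which is why it is a workaround rather than a fix.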

+1

+1

+1
Not having "others" and "missing" in place is misleading when regular users try to interpret the data shown on a dashboard with pie charts...

There still is not an "other" bucket. Missing exists, but "other" still isn't an actual bucket, which means you wouldn't be able to nest anything under a terms agg: https://github.com/elastic/elasticsearch/issues/12411. Still waiting on Elasticsearch for this one.

+1 on just adding the missing option

+1 Not really a work around, but I'm usually able to play with the "_missing_:X" syntax to retrieve docs that do not have field X. This could be useful for those watching this thread.
Source: ES issue 446

Until the ES solution arrives, a quick hack to show others in a pie chart:

https://github.com/grimoirelab/kibiter/commit/435b27d7e55541ca23aab6b46bd5d7557b367173

If you click on the "Others" slice, nothing happens, and filtering with the rest of the slices still works, so in initial testing it does not seem harmful.

@acs keep in mind that it only works when you are using a count metric. It doesn't support anything else.

@spalger good point. I think this count metric scenario is what we need in several places, so until the solution arrives, we can live with that. Thanks for clarifying! :)

+1

+1

I am wondering whether we really have to wait for ES in order to progress with this issue. There seem to be three aspects:

  • Too broad: From the perspective of my customers - and I guess from about 80% of the +1 in this thread as well - the required feature is simple. They would like to draw a pie chart with a count metric for the top-n of hosts/web server response codes/files downloaded/jobs/... By visually showing the "others", they would like to make sure that the top-n are really a relevant part of the result set. They do not care about sub-aggregations for "others". If you are interested in the top-n, the sub-aggregation for "others" is not required and we do not have to wait for ES.
  • Efficiency: Even if ES supports sub-aggregations for "others", there should be an option to switch it off. That's because the feature seems to be "expensive" in terms of database requests (two requests according to elastic/elasticsearch#12411). For the 80% described above, it is more efficient to work with _all_ the information ES returns from the first query.
  • Timing: Kibana 3 provides this feature, and we have now been waiting since Nov 2014 (!) for Kibana 4 to provide the same feature. From my point of view, it's time for a quick intermediate solution. That's why I implemented it. :wink:

For those 80% requiring "others" to show up in pie charts (as in Kibana 3), just get #7464. Only the other 20% have to wait for some future release of ES in order to get sub-aggregations for "others".

+1

+1

As a workaround for splitting bars into _grokparsefailure and not, we add a Filters Sub Aggregation with two queries:

  • tags: _grokparsefailure
  • (_missing_: tags) OR (NOT tags: _grokparsefailure)

The first filter matches all parse failures and the second one everything that parsed OK.
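As a rough illustration, the two queries above could be expressed as a keyed filters aggregation like the one built below. The helper name and the bucket keys `matches` and `everything_else` are made up for the example, and the `_missing_:` query-string syntax is taken from this thread; on Elasticsearch versions where it is not available, `NOT _exists_:tags` would be the assumed equivalent.

```python
from typing import Any, Dict


def split_by_value(field: str, value: str) -> Dict[str, Any]:
    """Filters agg with one bucket for field:value and one for all the rest."""
    matching = "{0}:{1}".format(field, value)
    rest = "(_missing_:{0}) OR (NOT {0}:{1})".format(field, value)
    return {
        "filters": {
            "filters": {
                "matches": {"query_string": {"query": matching}},
                "everything_else": {"query_string": {"query": rest}},
            }
        }
    }


grok_split = split_by_value("tags", "_grokparsefailure")
```

Because the second query is the complement of the first, the two buckets together cover every document in the parent bucket.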

@walles that's definitely something you can do when you know the categories/filters you want, and how to define "others"

+1 for 'others' support.
My situation involves vertical bar graphs that sum values over time and have Split Bars (via Terms query) for each vertical bar, further breaking down the sums into their various components. Missing the 'other' values beyond the top N components is a hassle to be sure.

Certainly the Terms aggregation itself has the sum_other_doc_count value returned for each bucket, but there's no way that I can see to reflect this in the vertical bar graph.

+MAX_INT for this.
In building visualizations for my team the ability to see the top X results as well as the "other" or rest of them is critical, for pie or bar charts (or any visualization really). I don't care about sub-aggs in other. The conversation on #7464 seems to be a bunch of concern over precedent and consistency rather than finding a way to deliver what many would consider an expected feature.

+1

+1

+1

+1

+1

+1

+1

+1

Is there any update on this? I've been running into this issue several times now and kept silent so far as it seems essential for graphing. So I'm still assuming that this is on the radar of the devs, but would love to see an update to see where we're at.

I have a lot of vertical bar-charts which use split-bars on terms. I currently set the size to something like 9999 so I get at least something usable. But that creates many splits and I am usually only interested in the "Top \

From what I can tell googling around a bit, it seems that ES already offers this value in sum_other_doc_count so I'm a bit puzzled why this is not exposed in Kibana.

Following this thread, I can see that this only makes sense with the "Count" metric. Would it be possible to offer this in the visualisation options via a simple "Enable \

If it's of any use to others, I recently had a similar problem where I wanted to show a pie chart of terms which also took the missing values into consideration.

What I did in the end was create a chart with two filters:
_exists_:"MyField"
and
_missing_:"MyField"

I then added a subbucket for Terms on the field "MyField".

This gave me the visibility I was after.

+1

Very disappointed in Kibana 4/5: several years have passed, and such an essential feature is still unsupported.

+1

Really need this feature.

+1

We need both "other" and "missing". +1

+1

+1

It's exactly as exhuma said (June 1) - I held off on posting the 1001st +1 here, always thinking that some kind of fix would come anyway. It would need to come, given that e.g. pie charts on a field with more than a dozen distinct values simply make no sense! (Unless you do what we all do: specify a size of 1000, which is a major pain for performance.)
And if the implementation is simple for count and very hard for more complex metrics, enable it for count and ease the pain of 95% of the users!
Elastic team, it would be great to get _any_ kind of feedback on this one.

+1

Just a workaround for the "others" slice:
use the "Filters" agg and add {"other_bucket": true} to the "JSON Input" field.

[Image: others workaround]
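For readers unsure what that JSON input changes, the sketch below shows the resulting filters aggregation body. `other_bucket` is a parameter of the Elasticsearch filters aggregation; the shallow-merge helper and the example queries are assumptions for illustration and are not how Kibana is necessarily implemented.

```python
import json
from typing import Any, Dict


def apply_json_input(agg_params: Dict[str, Any], json_input: str) -> Dict[str, Any]:
    """Shallow-merge user-supplied JSON into the aggregation parameters."""
    merged = dict(agg_params)
    merged.update(json.loads(json_input))
    return merged


# Hypothetical filters the user typed into the visualization.
params = {
    "filters": {
        "errors": {"query_string": {"query": "status:500"}},
        "not_found": {"query_string": {"query": "status:404"}},
    }
}
filters_agg = {"filters": apply_json_input(params, '{"other_bucket": true}')}
```

With `other_bucket` enabled, Elasticsearch adds one extra bucket containing the documents that matched none of the listed filters, so this only helps when the interesting categories are known in advance.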

@ppisljar @thomasneirynck I think others can be implemented on the Terms aggregation with a modifyAggConfigOnSearchRequestStart that gathers the terms list before the real terms aggregation request is generated.

Then the terms aggregation can implement the function getRequestAggs that, when others is enabled, adds a sibling aggregation called others. The others aggregation would just be a filter aggregation that excludes the terms list gathered by modifyAggConfigOnSearchRequestStart and then asks for the requested metrics on that bucket.

Filtering would not affect the flow because any applied filters will get accounted for in the pre-flight request to gather the terms list. The pre-flight request is executed each time before the aggregation is created.

The histogram aggregation provides a working example of the pre-flight request. It fetches the min and max so that when the actual histogram aggregation is requested, an appropriate interval can be used to avoid requesting too many buckets.
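To make the proposal concrete, here is a sketch of the sibling "others" aggregation that would be generated once the pre-flight request has returned the terms list. It is written in Python purely for illustration and is not Kibana code; the function name, the `exists` clause (which keeps documents without the field out of "others"), and the sample metric are all assumptions.

```python
from typing import Any, Dict, List


def build_others_agg(field: str, top_terms: List[str],
                     metric_aggs: Dict[str, Any]) -> Dict[str, Any]:
    """Filter agg matching docs whose field value is not in the top terms."""
    return {
        "filter": {
            "bool": {
                # Assumed choice: docs lacking the field belong to "missing".
                "filter": [{"exists": {"field": field}}],
                "must_not": [{"terms": {field: list(top_terms)}}],
            }
        },
        # The same metrics requested on the real terms buckets.
        "aggs": metric_aggs,
    }


# Terms list as it might come back from the pre-flight request.
others = build_others_agg(
    "host", ["web-1", "web-2"], {"avg_bytes": {"avg": {"field": "bytes"}}}
)
```

Since "others" is now a real filter bucket, any metric can be computed on it, not just a document count.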

@ppisljar and I chatted about the above solution.

It does not work when dealing with nested aggregations. Take, for example, a date_histogram containing a terms aggregation (the user wants to see the top terms per day). A separate sibling filter aggregation will be required for each date_histogram bucket. How would modifyAggConfigOnSearchRequestStart know which bucket(s) it's in?

We decided that aggregations need a post-flight concept. That way, the sibling aggregation(s) can be created for each parent bucket, fetched, and then merged into the results.
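One way to picture the post-flight idea is sketched below, in Python for illustration only. It assumes a date_histogram with a fixed interval in milliseconds, a nested terms aggregation named `top_terms`, and hypothetical helper and field names; it is not the design that was eventually implemented.

```python
from typing import Any, Dict, List


def build_post_flight(date_buckets: List[Dict[str, Any]], time_field: str,
                      term_field: str, interval_ms: int) -> Dict[str, Any]:
    """One 'others' filter per parent bucket, built from the first response."""
    filters = {}
    for bucket in date_buckets:
        start = bucket["key"]  # bucket start in epoch milliseconds
        top = [b["key"] for b in bucket["top_terms"]["buckets"]]
        filters[str(start)] = {"bool": {
            "filter": [{"range": {time_field: {
                "gte": start, "lt": start + interval_ms,
                "format": "epoch_millis"}}}],
            "must_not": [{"terms": {term_field: top}}],
        }}
    return {"others": {"filters": {"filters": filters}}}


def merge_post_flight(date_buckets: List[Dict[str, Any]],
                      others_buckets: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Append each parent's 'Other' bucket to its terms buckets."""
    for bucket in date_buckets:
        other = dict(others_buckets[str(bucket["key"])], key="Other")
        bucket["top_terms"]["buckets"].append(other)
    return date_buckets


# Hand-written first response: two daily buckets with one top term each.
day = 86400000
first_pass = [
    {"key": 0, "top_terms": {"buckets": [{"key": "10.0.0.1", "doc_count": 5}]}},
    {"key": day, "top_terms": {"buckets": [{"key": "10.0.0.2", "doc_count": 3}]}},
]
request = build_post_flight(first_pass, "@timestamp", "router_ip", day)
# Simulated second response, keyed the same way as the filters.
result = merge_post_flight(first_pass, {"0": {"doc_count": 7},
                                        str(day): {"doc_count": 0}})
```

The second request depends on the first response, which is why this cannot be done in a pre-flight step.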

@jetnet Do you know if something similar is possible with "terms" instead of "filters"? I have a chart where the terms are unknown on a given time-slice. In this particular example they contain IP addresses of network routers causing errors on a network. What I want to know is the "Top 10" IPs in that time-slice, but I would also need to see the "others" slice. Mainly, this would help me to see if (and how many) other IPs are causing issues. If I set the number of slices to 10, and see 10 slices, I could be looking at 10 failing IPs, or 12, or 5000. The only way to make this "visible" is by adding an "other" slice.

As I don't know which devices cause errors at any given time, I can't "hard-code" those values in the visualisation filters.

I am currently actively working on adding support for missing and other buckets to the terms agg, hopefully it's going to be ready soon.

For those following along, support for "other" and "missing" buckets has just been released in 6.2.0: https://www.elastic.co/blog/kibana-6-2-0-released
