Kibana: Add "missing" and "other" values to terms agg

Created on 18 Nov 2014 · 79 comments · Source: elastic/kibana

In Kibana 3, the pie chart definition has two check boxes for "missing" and "other" values.

This option seems to be gone in Kibana 4.
If I do a terms aggregation on a field with 20 values and select only the top 7, the percentages in the pie chart do not take the remaining 13 terms into account. In this case, I would like to be able to include a slice in the pie chart for the "other" values.

Am I missing something? Is the inclusion of "other" or "missing" values planned for Kibana 4?

Labels: Visualizations, elasticsearch, high hanging fruit, enhancement

bquartier

👍31 ❤3

Most helpful comment

I am wondering if we really have to wait for ES in order to make progress on this issue. There seem to be three aspects:

Too broad: From the perspective of my customers (and, I guess, of about 80% of the +1s in this thread), the required feature is simple. They would like to draw a pie chart with a count metric for the top-n hosts/web server response codes/files downloaded/jobs/... By visually showing the "others", they want to make sure that the top-n really are a relevant part of the result set. They do not care about sub-aggregations for "others". If you are only interested in the top-n, the sub-aggregation for "others" is not required and we do not have to wait for ES.
Efficiency: Even if ES supports sub-aggregations for "others", there should be an option to switch it off, because the feature seems to be "expensive" in terms of database requests (two requests according to elastic/elasticsearch#12411). For the 80% described above, it is more efficient to work with _all_ the information ES returns from the first query.
Timing: Kibana 3 provides this feature, and we have been waiting since Nov 2014 (!) for Kibana 4 to provide the same feature. From my point of view, it's time for a quick intermediate solution. That's why I implemented it. :wink:

For the 80% who need "others" to show up in pie charts (as in Kibana 3), just use #7464. Only the other 20% have to wait for some future release of ES to get sub-aggregations for "others".

FlorianLiers on 16 Jun 2016

👍6

All 79 comments

Unfortunately this functionality was removed from Elasticsearch, so Kibana has no way of calculating these values. You can follow the Elasticsearch team's progress here: https://github.com/elasticsearch/elasticsearch/issues/5324

rashidkpc on 18 Nov 2014

Thanks for the quick answer.

That's a major pain. Pie charts are rarely useful without being able to work on the complete data set (and drawing the complete data set is usually neither feasible nor interesting).
We just want to see / show the proportion of the total taken by each of the top 10 terms.

bquartier on 18 Nov 2014

Yep, you may wish to weigh in on the above-referenced Elasticsearch ticket.

rashidkpc on 19 Nov 2014

Couldn't this be done in Kibana 4 right now?

missing
Add a missing aggregation at the same level as the terms aggregation, then append the missing bucket results to the end of the terms bucket array.
This is what we do with some custom reporting built on top of ES aggregations, and it works very well. It just requires some more logic on the client side to merge the missing bucket into the results.
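A minimal Python sketch of that client-side merge, for illustration only (not Kibana code). The `terms` and `missing` aggregation types and the `buckets`/`doc_count` response fields are standard Elasticsearch; the aggregation names `by_field` and `no_field` and the synthetic `__missing__` key are hypothetical.

```python
from typing import Any


def request_body(field: str, size: int) -> dict[str, Any]:
    """Search body with a terms agg and a sibling missing agg on the same field."""
    return {
        "size": 0,
        "aggs": {
            "by_field": {"terms": {"field": field, "size": size}},
            "no_field": {"missing": {"field": field}},
        },
    }


def merge_missing(aggs: dict[str, Any], missing_key: str = "__missing__") -> list[dict[str, Any]]:
    """Append the missing agg's count to the end of the terms buckets."""
    buckets = list(aggs["by_field"]["buckets"])
    count = aggs["no_field"]["doc_count"]
    if count > 0:
        buckets.append({"key": missing_key, "doc_count": count})
    return buckets
```

Because the missing aggregation is a real single bucket, metrics could also be nested under it, unlike the "other" count discussed next.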

other
Isn't this exactly what sum_other_doc_count of the terms aggregation is supposed to fix? It seems surprising that Kibana 4 is not using this value, since it's already returned alongside the aggregation buckets.
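A sketch of an "other" slice built from `sum_other_doc_count`, assuming the pie chart uses a plain count metric. That field is only a document count, so it cannot stand in for sums, averages or sub-aggregations. The helper names and sample numbers are hypothetical.

```python
from typing import Any


def with_other_slice(terms_agg: dict[str, Any], other_key: str = "Other") -> list[dict[str, Any]]:
    """Return the terms buckets plus an "Other" bucket built from sum_other_doc_count."""
    buckets = list(terms_agg["buckets"])
    other = terms_agg.get("sum_other_doc_count", 0)
    if other > 0:
        buckets.append({"key": other_key, "doc_count": other})
    return buckets


def percentages(buckets: list[dict[str, Any]]) -> dict[Any, float]:
    """Share of each slice, relative only to the slices actually drawn."""
    total = sum(b["doc_count"] for b in buckets)
    return {b["key"]: 100.0 * b["doc_count"] / total for b in buckets}


terms_agg = {
    "buckets": [{"key": "a", "doc_count": 50}, {"key": "b", "doc_count": 30}],
    "sum_other_doc_count": 20,
}
```

With these numbers, the two returned terms alone render as 62.5% / 37.5%, while including the remainder gives 50% / 30% / 20%, which is the distortion described in the original report.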

As has been stated, you really need to see the whole data set for these graphs to be useful. When the data is sparse, missing and other values are really valuable metrics.

bradvido on 4 May 2015

👍2

This seems about to be merged; I wonder if it can be used to close this issue: https://github.com/elastic/elasticsearch/pull/11042

jccq on 12 May 2015

+1 absolutely essential

janbernhardt on 21 May 2015

+1000

ajrasch on 21 May 2015

I guess people using this software in the financial sector just had a heart attack. Haha.

celesteking on 21 May 2015

😄1

+1. Must have.

qcho on 26 May 2015

+1 absolutely essential

dev-shubh on 29 May 2015

joelsvensson on 29 May 2015

manuel-sousa on 29 May 2015

JulienPalard on 3 Jun 2015

This prevents me from upgrading to version 4.0!

tojocky on 8 Jun 2015

I figured this one out:

Click Advanced in your buckets field, then in the EXCLUDE PATTERN, type: !*

This will exclude missing values!

nmors on 17 Jun 2015

Any updates since https://github.com/elastic/elasticsearch/issues/5324#issuecomment-104419330, e.g. along the lines proposed above in https://github.com/elastic/kibana/issues/1961#issuecomment-98795957? Terms visualizations that don't sum up to the total are a huge feature regression for us when upgrading from Kibana 3.

jdanbrown on 4 Jul 2015

Must have. Preventing migration from K3 to K4.

snarahari on 5 Aug 2015

bertol83 on 26 Aug 2015

In Kibana 4, when we use a data table visualization, why are rows with null dates (the date field can be null) or missing values systematically eliminated?
Is there a solution (especially for keeping rows whose date field is null in data tables)?

acheriat on 27 Aug 2015

~~Now that https://github.com/elastic/elasticsearch/pull/11042 is merged this is no longer an upstream issue.~~ The "others" functionality desired here is actually not fixed by https://github.com/elastic/elasticsearch/pull/11042, which simply allows defining a value for documents which do not have a value.
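For reference, the `missing` parameter that PR added to the terms aggregation looks roughly like this. This is a sketch: `N/A` is an arbitrary placeholder and the field name is hypothetical. Documents without the field are counted under the placeholder as if it were an ordinary term (so it competes for a top-N slot), but terms beyond the top N still get no bucket, which is why this covers "missing" and not "other".

```python
from typing import Any


def terms_with_missing(field: str, size: int, placeholder: str = "N/A") -> dict[str, Any]:
    """Terms agg where documents lacking `field` are bucketed under `placeholder`."""
    return {
        "size": 0,
        "aggs": {
            "by_field": {
                "terms": {"field": field, "size": size, "missing": placeholder}
            }
        },
    }


body = terms_with_missing("host", 7)
```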

spalger on 9 Sep 2015

zp-markusp on 12 Sep 2015

PasghettiCode on 16 Sep 2015

deviantony on 17 Sep 2015

The problem here is that we still can't do sub-aggregations of the "other" bucket, because it isn't a real bucket.

rashidkpc on 18 Sep 2015

Here is the relevant ES ticket: https://github.com/elastic/elasticsearch/issues/12411

tbragin on 18 Sep 2015

I don't understand why 'missing' and 'other' are covered by the same issue, especially since the original poster mentioned two separate check boxes for "missing" and "other". The reply explains why the 'missing' functionality was removed, but I can't see an explanation of why 'other' was also removed. Isn't 'other' low-hanging fruit?
+1 to both.

ravigad on 24 Sep 2015

+1
The more I dig into K4, the more blockers I find that prevent me from switching from K3 (e.g. #1583 and #1547).

warpkanal on 3 Oct 2015

VertexZZZ on 12 Nov 2015

This is a very important feature!

vklindukh on 2 Dec 2015

nedmax on 3 Dec 2015

sumo-89 on 3 Dec 2015

knobli on 7 Dec 2015

acs on 11 Dec 2015

wellingtonanastacio on 17 Dec 2015

I need this for representing things that are missing tags. My nasty workaround is to assign the string "[null]" to those properties if they are missing when I load data into Elasticsearch. I'd really rather not have to do that, though...

jdavisclark on 22 Dec 2015

😕2

PaulGrandperrin on 6 Jan 2016

SvenVD on 7 Jan 2016

+1
Not having "others" and "missing" in place is misleading when regular users try to interpret the data shown on a dashboard with pie charts...

fholzer on 14 Jan 2016

👍1

There is still no "other" bucket. Missing exists, but "other" still isn't an actual bucket, which means you wouldn't be able to nest anything under it in a terms agg: https://github.com/elastic/elasticsearch/issues/12411. Still waiting on Elasticsearch for this one.

rashidkpc on 15 Jan 2016

+1 on just adding the missing option

andrewvc on 18 Jan 2016

+1 Not really a workaround, but I'm usually able to use the "_missing_:X" syntax to retrieve docs that do not have field X. This could be useful for those watching this thread.
Source: ES issue 446

aallegret on 4 Feb 2016

Until the ES solution arrives, a quick hack to show others in a pie chart:

https://github.com/grimoirelab/kibiter/commit/435b27d7e55541ca23aab6b46bd5d7557b367173

If you click on the "Others" slice, nothing happens, and filtering with the rest of the slices still works, so from initial testing it seems harmless.

acs on 15 Feb 2016

@acs keep in mind that it only works when you are using a count metric. It doesn't support anything else.

spalger on 15 Feb 2016

@spalger good point. I think this count metric scenario is what we need in several places, so until the solution arrives, we can live with that. Thanks for clarifying! :)

acs on 15 Feb 2016

EdwardKaravakis on 25 May 2016

lsc36 on 14 Jun 2016

netzling on 30 Aug 2016

audriusbugas on 7 Sep 2016

As a workaround for splitting bars into _grokparsefailure and everything else, we add a Filters sub-aggregation with two queries:

tags: _grokparsefailure
(_missing_: tags) OR (NOT tags: _grokparsefailure)

The first filter matches all parse failures and the second one everything that parsed OK.
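The two filters above, expressed as a `filters` sub-aggregation in a request body, would go under the `aggs` of the X-axis bucket. This is a sketch: the names `grok_split`, `grok_failure` and `parsed_ok` are hypothetical, and the query strings are copied from the comment. The `_missing_` syntax belongs to older Elasticsearch versions; newer ones removed it in favor of `NOT _exists_:tags`, so check your version.

```python
from typing import Any


def grok_split_agg() -> dict[str, Any]:
    """Filters sub-aggregation splitting each bar into parse failures and everything else."""
    return {
        "grok_split": {
            "filters": {
                "filters": {
                    "grok_failure": {"query_string": {"query": "tags: _grokparsefailure"}},
                    # Older query syntax; on newer versions use "NOT _exists_: tags" instead.
                    "parsed_ok": {
                        "query_string": {
                            "query": "(_missing_: tags) OR (NOT tags: _grokparsefailure)"
                        }
                    },
                }
            }
        }
    }
```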

walles on 2 Dec 2016

👍1

@walles that's definitely something you can do when you know the categories/filters you want, and how to define "others"

spalger on 2 Dec 2016

+1 for 'others' support.
My situation involves vertical bar graphs that sum values over time, with Split Bars (via a Terms aggregation) on each vertical bar that further break the sums down into their various components. Missing the 'other' values beyond the top N components is certainly a hassle.

Certainly the Terms aggregation itself has the sum_other_doc_count value returned for each bucket, but there's no way that I can see to reflect this in the vertical bar graph.
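When the terms agg is nested under a date histogram, each date bucket carries its own `sum_other_doc_count`, so an "Other" segment per bar can be computed on the client. This is a sketch assuming hypothetical aggregation names `over_time` and `top_terms` and a count metric only. Documents missing the field are still not included in any segment.

```python
from typing import Any


def stacked_series(
    aggs: dict[str, Any],
    hist: str = "over_time",
    terms: str = "top_terms",
    other_key: str = "Other",
) -> dict[Any, dict[Any, int]]:
    """Map each date bucket key to {term: count}, plus an "Other" entry per bar."""
    series: dict[Any, dict[Any, int]] = {}
    for date_bucket in aggs[hist]["buckets"]:
        inner = date_bucket[terms]
        counts = {b["key"]: b["doc_count"] for b in inner["buckets"]}
        other = inner.get("sum_other_doc_count", 0)
        if other:
            counts[other_key] = other
        series[date_bucket["key"]] = counts
    return series
```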

SaanichGAS on 11 Jan 2017

+MAX_INT for this.
In building visualizations for my team, the ability to see the top X results as well as the "other" (the rest of them) is critical, for pie or bar charts (or any visualization really). I don't care about sub-aggs in other. The conversation on #7464 seems to be mostly concern over precedent and consistency rather than finding a way to deliver what many would consider an expected feature.

JeffBolle on 27 Jan 2017

alxbog on 17 Feb 2017

jbwl on 21 Feb 2017

z-matth on 23 Feb 2017

bhatiaabhinav on 2 Mar 2017

😕3

boomin614 on 5 Mar 2017

😕2

leosulake on 23 Mar 2017

😕2

kiblik on 28 Mar 2017

😕2

pjcard on 28 Apr 2017

Is there any update on this? I've run into this issue several times now and kept silent so far, as it seems essential for graphing. So I'm still assuming that this is on the devs' radar, but I would love an update on where things stand.

I have a lot of vertical bar-charts which use split-bars on terms. I currently set the size to something like 9999 so I get at least something usable. But that creates many splits and I am usually only interested in the "Top \

From what I can tell from googling around a bit, ES already offers this value in sum_other_doc_count, so I'm a bit puzzled why it is not exposed in Kibana.

Following this thread, I can see that this only makes sense with the "Count" metric. Would it be possible to offer this in the visualisation options via a simple "Enable \

exhuma on 1 Jun 2017

If it's of any use to others: I recently had a similar problem where I wanted to show a pie chart of terms that also took the missing values into account.

What I did in the end was create a chart with two filters:
_exists_:"MyField"
and
_missing_:"MyField"

I then added a sub-bucket for Terms on the field "MyField".

This gave me the visibility I was after.

Funkd on 7 Jun 2017

zhangskd on 3 Jul 2017

Very disappointed in Kibana 4/5: several years have passed and such an essential feature is still unsupported.

zhangskd on 3 Jul 2017

👍5

daiglej-LSPD on 13 Jul 2017

👎1

Really need this feature.

5ean on 31 Aug 2017

👎1

leifker on 1 Sep 2017

👎1

We need both "other" and "missing". +1

nakedible-p on 1 Sep 2017

👎1

davidban77 on 9 Sep 2017

👎1

sedelnik on 1 Nov 2017

It's exactly as exhuma said (June 1): I held off on posting the 1001st +1 here, always thinking that some kind of fix would come anyway. It would have to come, given that pie charts on a field with more than a dozen distinct values simply make no sense (unless you do what we all do: specify a size of 1000, which is a major pain for performance).
And if the implementation is simple for count and very hard for more complex metrics, enable it for count and ease the pain of 95% of the users!
Elastic team, it would be great to get _any_ kind of feedback on this one.

ulir on 13 Nov 2017

LuigiClemente-Awin on 23 Nov 2017

Just a workaround for an "others" slice:
use the "Filters" agg and add {"other_bucket": true} to the "JSON input" field
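The same workaround as a raw request, for readers building queries by hand. `other_bucket` and `other_bucket_key` are options of the Elasticsearch `filters` aggregation (the default key is `_other_`); the Kibana JSON input presumably ends up setting the first of these. The filter names and queries here are hypothetical. The extra bucket collects documents that match none of the listed filters, so this only helps when the categories can be enumerated up front.

```python
from typing import Any


def filters_with_other(named_queries: dict[str, str], other_key: str = "_other_") -> dict[str, Any]:
    """Filters agg with an extra bucket for documents that match none of the filters."""
    return {
        "size": 0,
        "aggs": {
            "categories": {
                "filters": {
                    "filters": {
                        name: {"query_string": {"query": q}}
                        for name, q in named_queries.items()
                    },
                    "other_bucket": True,
                    "other_bucket_key": other_key,
                }
            }
        },
    }


body = filters_with_other({"errors": "status:500", "not_found": "status:404"})
```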

jetnet on 27 Nov 2017

👍5

@ppisljar @thomasneirynck I think "others" can be implemented in the Terms aggregation with a modifyAggConfigOnSearchRequestStart function that gathers the terms list before the real terms aggregation request is generated.

Then the terms aggregation can implement the function getRequestAggs that, when others is enabled, adds a sibling aggregation called others. The others aggregation would just be a filter aggregation that excludes the terms list gathered by modifyAggConfigOnSearchRequestStart and then asks for the requested metrics on that bucket.

Filtering would not affect the flow because any applied filters will get accounted for in the pre-flight request to gather the terms list. The pre-flight request is executed each time before the aggregation is created.

The histogram aggregation provides a working example of the pre-flight request: it fetches the min and max so that, when the actual histogram aggregation is requested, an appropriate interval can be used to avoid requesting too many buckets.
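A rough sketch of that two-request flow, expressed as Elasticsearch request bodies rather than Kibana's agg-type code (`modifyAggConfigOnSearchRequestStart` and `getRequestAggs` are not modeled here). Function and aggregation names are hypothetical. The `exists` clause is an assumption of this sketch, so that documents without the field stay out of "others" and can go to a separate "missing" bucket. Both requests must use the same query and filters, and data indexed between them can make the numbers drift slightly.

```python
from typing import Any, Optional


def preflight_body(field: str, size: int) -> dict[str, Any]:
    """Request 1: only fetch the top-N term keys."""
    return {"size": 0, "aggs": {"top": {"terms": {"field": field, "size": size}}}}


def top_keys(preflight_aggs: dict[str, Any]) -> list[Any]:
    """Extract the term keys from the pre-flight response's aggregations."""
    return [b["key"] for b in preflight_aggs["top"]["buckets"]]


def main_body(
    field: str, size: int, keys: list[Any], metrics: Optional[dict[str, Any]] = None
) -> dict[str, Any]:
    """Request 2: the real terms agg plus a sibling filter agg for everything else."""
    top: dict[str, Any] = {"terms": {"field": field, "size": size}}
    others: dict[str, Any] = {
        "filter": {
            "bool": {
                # Assumption of this sketch: documents without the field are not "others".
                "filter": [{"exists": {"field": field}}],
                "must_not": [{"terms": {field: keys}}],
            }
        }
    }
    if metrics:
        # The requested metrics are computed for both the top terms and "others".
        top["aggs"] = metrics
        others["aggs"] = metrics
    return {"size": 0, "aggs": {"top": top, "others": others}}
```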

nreese on 6 Dec 2017

❤1

@ppisljar and I chatted about the above solution.

It does not work when dealing with nested aggregations. For example, a date_histogram containing a terms aggregation (user wants to see the top terms per day). A separate sibling filter aggregation will be required for each date_histogram bucket. How would modifyAggConfigOnSearchRequestStart know which bucket(s) it's in?

We decided that aggregations need a post-flight concept. That way, the sibling aggregation(s) can be created for each parent bucket, fetched, and then merged into the results.
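A sketch of the post-flight idea for the date_histogram case, assuming a fixed interval in milliseconds and hypothetical aggregation names `over_time` and `top_terms`. After the first response arrives, one `filters` aggregation is built with one filter per date bucket (that bucket's time range minus its own top terms), and the second response is merged back as an "Other" bucket inside each date bucket. This illustrates the concept discussed here; it is not how Kibana actually implemented it.

```python
import copy
from typing import Any, Optional


def post_flight_body(
    first_aggs: dict[str, Any],
    time_field: str,
    interval_ms: int,
    term_field: str,
    metrics: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build one "others" filter per parent date bucket from the first response."""
    filters: dict[str, Any] = {}
    for bucket in first_aggs["over_time"]["buckets"]:
        start = bucket["key"]  # epoch millis of the date bucket
        top_keys = [b["key"] for b in bucket["top_terms"]["buckets"]]
        filters[str(start)] = {
            "bool": {
                "filter": [
                    {"range": {time_field: {"gte": start, "lt": start + interval_ms,
                                            "format": "epoch_millis"}}},
                    {"exists": {"field": term_field}},
                ],
                "must_not": [{"terms": {term_field: top_keys}}],
            }
        }
    others: dict[str, Any] = {"filters": {"filters": filters}}
    if metrics:
        others["aggs"] = metrics
    return {"size": 0, "aggs": {"others": others}}


def merge_others(first_aggs: dict[str, Any], second_aggs: dict[str, Any],
                 other_key: str = "Other") -> dict[str, Any]:
    """Return a copy of the first response with an "Other" bucket in each date bucket."""
    merged = copy.deepcopy(first_aggs)
    for bucket in merged["over_time"]["buckets"]:
        other = second_aggs["others"]["buckets"][str(bucket["key"])]
        bucket["top_terms"]["buckets"].append({**other, "key": other_key})
    return merged
```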

nreese on 7 Dec 2017

@jetnet Do you know if something similar is possible with "terms" instead of "filters"? I have a chart where the terms are unknown for a given time slice. In this particular example, they contain IP addresses of network routers causing errors on a network. What I want to know is the "Top 10" IPs in that time slice, but I also need to see the "others" slice. Mainly, this would help me see if (and how many) other IPs are causing issues. If I set the number of slices to 10 and see 10 slices, I could be looking at 10 failing IPs, or 12, or 5000. The only way to make this visible is by adding an "other" slice.

As I don't know which devices cause errors at any given time, I can't hard-code those values in the visualization filters.

exhuma on 8 Dec 2017

I am currently actively working on adding support for missing and other buckets to the terms agg; hopefully it's going to be ready soon.

ppisljar on 9 Dec 2017

👍3 🎉1

For those following along, support for "other" and "missing" buckets has just been released in 6.2.0: https://www.elastic.co/blog/kibana-6-2-0-released