Grafana: [Feature request] Multiple alerts per graph

Created on 14 Mar 2017  ·  126 comments  ·  Source: grafana/grafana

As per http://docs.grafana.org/alerting/rules/, Grafana plans to track state per series in future releases.

  • "If a query returns multiple series then the aggregation function and threshold check will be evaluated for each series. What Grafana does not do currently is track alert rule state per series." and
  • "To improve support for queries that return multiple series we plan to track state per series in a future release"

But it seems like there can be use cases where we have graphs containing a set of metrics for which different sets of alerts are required. This is slightly different from "Support per series state change" ( https://github.com/grafana/grafana/issues/6041 ) because

  1. The action (notifications) can be different.
  2. Also, tracking separate states of an alert is not always preferred (as the end-user would need to know the details behind the individual states) vs just knowing if an alert is triggered.

Grafana version = 4.x

Labels: area/alerting, type/feature-request

Most helpful comment

maybe if there's huge demand for it :)

All 126 comments

Concrete use case: I have instrumented my app to record a histogram in Prometheus for each major function (e.g. where an external HTTP call or disk I/O takes place) and would like to alert when any of these becomes slow.

Presently I have to define dummy graphs for this because of the 1:1 relationship between graph and alert. It would be much more logical to keep the alerts defined in the same place as the graph itself.

And you cannot define that in one query?

No; a chain of OR conditions is crude, and the single name of the alert cannot clearly identify the exact reason for the alert. I definitely don't want to send alerts along the lines of "Some part of service X is failing" - engineers on call would not be my friends...

then it makes more sense to have separate panels for the alerts, if you want separate alert rule names & messages etc.

Yep that's exactly what I'm doing at the moment. Is there any likelihood of implementing multiple alerts per graph in the near future so I can move away from this workaround?

it's very unlikely

maybe if there's huge demand for it :)

haha OK - I'll see if I can rustle up an angry mob ;) Seriously tho', thanks for the honesty.

Ok we have a mob of two :-) I'm graphing fuel levels in multiple tanks & wanted to set up a low fuel alert for each tank.

and each tank has different thresholds or notifications ?

Exactly. One is a 285 gal heating oil tank. I wanted to set up a "heating oil low" alert when that tank goes below 70 gal. The other is a 500 gal propane tank; for that I wanted a "propane low" alert when it goes under 100 gal. I set up singlestats for each, but alerts are not available in a singlestat.

[screenshot: fuel level graphs for both tanks]

I have a graph with a median and a 90th percentile metric. I'd like to get an alert on each. In order to do this, I have to create one graph for each. Then, if I want warnings and critical alerts for each, I have to create a second graph for each.

I have 30 or 40 services to monitor, each with 2 to 5 key metrics. I have graphs where I graph the same metric for multiple customers, and while I don't have to do alerts per customer (yet), it does add to the number of metrics I'd like to have alerts on. The amount of work to create dozens of graphs expands very quickly. It would be very useful in my current production environment (and in my previous production environments) to have warnings and critical alerts, and to display multiple metrics in a single graph and alert on them.

I'd also like to see this feature. A good example is one alert if a metric goes outside of a threshold and another alert if data fails to update, i.e., if a value goes too high or if values fail to report. This could be used to show that whatever is reporting the data has encountered an issue that is preventing communication with Grafana (or whatever backend).

Hi Torkelo!

I got several "likes" for the feature! Will it enter the next release =)?

@rmsys maybe at some point. Solving it from a UX and code-complexity (and UX-complexity) perspective will take time; it's not on any roadmap yet, but maybe next year as the alerting engine matures and a UX design for this is worked out

Another good use case for multiple alerts is to have different severity thresholds with different actions. If a server starts to exhibit slowdowns, an email might be sufficient, but if the slowdowns become extreme, it might be worth paging the administrator.

I have a graph that returns a metric with valid and invalid values. This would be useful to me because I could use a single graph containing two queries to create alerts that fire when valids are too low and invalids are too high.

Also, tracking separate states of an alert is not always preferred (as the end-user would need to know the details behind the individual states) vs just knowing if an alert is triggered.

Not sure I understand what you mean by this. Can you elaborate?

Can you describe how multiple alerts per graph would work and look? What would the annotations say, and what would the green/red heart beside the panel title show (if, say, 2/5 alert rules were firing)?

Would you want to share something between the alert rules or would they be completely isolated (besides living in the same graph panel and possibly referring to the same queries)?

How would you visualize thresholds when you have multiple alert rules? Would they show up as separate rules on the alert rules page & in the alert list panel? Then you need a way to navigate to a specific instance of a rule and not just to the alert tab.

Grafana is a visual tool and we have chosen to tie an alert rule to a graph so that the alert rule state can be visualized easily (via the metrics, thresholds & alert state history). I am afraid that having each graph be able to represent multiple alert rules will complicate this to a very large extent, and I am not sure about the need for this.

@rssalerno having support for alert rules in singlestat panel seems unrelated to this issue.

@alex-phillips your scenario sounds like it can be solved by making individual alert rules more flexible.

Does someone have some concrete examples where this would be good? I'm just not seeing a scenario where it would not end up as a confusing graph, with 2-5 thresholds where you do not know which metric they relate to, and alert-history annotations where you also do not know which alert rule they came from (without hovering).

Can you describe how multiple alerts per graph would work and look? What would the annotations say, and what would the green/red heart beside the panel title show (if, say, 2/5 alert rules were firing)?

I think multiple alert rules would be annotated individually. Hearts might be colour-coded. Rules would need to be named for differentiation in alerts/panels.

Would you want to share something between the alert rules or would they be completely isolated (besides living in the same graph panel and possibly referring to the same queries)?

Generally I would think not, though I suspect groups would need to have a shared threshold and name, if they were implemented (per https://github.com/grafana/grafana/issues/6557#issuecomment-324363795).

How would you visualize thresholds when you have multiple alert rules? Would they show up as separate rules on the alert rules page & in the alert list panel? Then you need a way to navigate to a specific instance of a rule and not just to the alert tab.

If rules take an additional colour param, thresholds can be rendered using that and differentiated as such; you'd probably want a tooltip also. Being able to toggle rules would be useful, and a param to render a specific rule takes care of the latter, I think?

@rssalerno having support for alert rules in singlestat panel seems unrelated to this issue.

I believe you'll find he was referring to the graph below that, though since he has separate panels for each tank, singlestat alerting may solve his problem for that specific dashboard.

Does someone have some concrete examples where this would be good? I'm just not seeing a scenario where it would not end up as a confusing graph, with 2-5 thresholds where you do not know which metric they relate to, and alert-history annotations where you also do not know which alert rule they came from (without hovering).

Primarily, I'd like this to support #6557 and #6553, and for multiple thresholds, similar to @alex-phillips. For example, one use-case we have for #6557 is to alert differently for different environments (production, beta, dev, etc), combined with multiple thresholds that would solve most of our problems. If there's a better way of doing that without multiple rules, it's not obvious to me.

@torkelo

Can you describe how multiple alerts per graph would work and look? What would the annotations say, and what would the green/red heart beside the panel title show (if, say, 2/5 alert rules were firing)?

I like the approach suggested by @pdf

Further, the approach to showing annotations would be the same as in the current case, where you have an alert rule with > 1 conditions (each having a different threshold). The green/red heart beside the panel title would be shown as red if there is at least one alert firing, similar to the current scenario where at least one condition of an alert rule evaluates to true. It could probably also show the number (2/5) along with the red heart in the title.

Would you want to share something between the alert rules or would they be completely isolated (besides living in the same graph panel and possibly referring to the same queries)?

In most of our use cases, these rules would not share anything between them, and the queries are also different.

How would you visualize thresholds when you have multiple alert rules? Would they show up as separate rules on the alert rules page & in the alert list panel? Then you need a way to navigate to a specific instance of a rule and not just to the alert tab.

They would show up as separate rules on the alerts page. The Alert tab would probably have a list of the alerts defined. True, we would need to highlight/expand the specific alert rule on this tab when the alert rule URL (which should capture the alert id or index) is accessed from the notification. That seems easily solvable.

In the alert list panel, there wouldn't be any change: it shows all of them separately. Semantically, each alert is separate; it has just been placed in the same panel.

Does someone have some concrete examples where this would be good? I'm just not seeing a scenario where it would not end up as a confusing graph, with 2-5 thresholds where you do not know which metric they relate to, and alert-history annotations where you also do not know which alert rule they came from (without hovering).

Considering that a lot of people have upvoted this feature, it would definitely be useful. If we had support for multiple alerts, I think it would be up to each user's perception whether it's confusing or not. IMHO, those who think it is confusing would go with the current approach of a separate panel for each alert, and those who think the utility/convenience of having the same panel used for visualization and alerting outweighs the perceived confusion would go the multiple-alerts way. Sure, it would change the UX somewhat.

In Splunk we have high/low alerts. If multiple alerts were available in Grafana, we'd just use the same search; they are just different thresholds against the same search.

+1 for this feature.

+1 for this. Our use case is as follows: We want to define one chart with, say, cpu usage for all of our servers. Then on that same chart we will make two hidden metrics, one for cpu usage on production servers and one for cpu usage on non-production servers. Each of those metrics would have its own alert, with different notification channels. We do not want to have to create multiple charts or panels or dashboards to accomplish this.

+1 for this feature.

Came here reading some of the other issues regarding categories and severities. I agree all alerts should be actionable. But there is a difference between a "fix this first thing in the morning" alert and a "call out the $400/hour consultant ASAP" alert.

As many have mentioned, this is most commonly solved by Warning and Critical thresholds.

Technically this could be implemented in a bunch of ways: labels, several alerts per panel, several thresholds per alert, etc.

Regarding confusion if the categorization gets too complex: a Warning/Critical setup can simply use Red/Yellow, with Red overriding Yellow.

For more complex setups, another option besides hover to locate the offending time-series could be a flashing line/area/whatever? That could draw attention to the correct time-series easily.

I think most users would be satisfied by a fairly simple Warn/Crit separation though.

This is an absolute must for an alerting software, especially for server monitoring. Disk space, memory, cpu usage, temperature, load avg.... all prime examples where one would want multiple alerts configured with different messages with different thresholds. Take disk space for example. Need one alert for disk usage over 70%, another for disk usage over 90%.

Bit of an edge case, but we are using the alerts to notify us if a product hasn't sold in a few days. We have each product as a metric, which in turn means we only get one alert when one of the metrics enters the alerting threshold. Ideally we would like to receive another alert whenever any additional metric enters the alerting threshold as well.

Also we are using templating vars to repeat a graph for each selected product with two metrics overlayed (volume and gross margin) on the left and right y axis. This kills any chance of using alerting as the alert query isn't picking up the $sku list variable for our IN ($sku).

To work around this I've tried having another query B which just runs the template query to look up all the SKUs we are interested in and puts that straight into the alert query: IN (SELECT skus from interested_product_table). However, this starts sending us alerts for each graph for all the metrics across every graph, meaning we get:

Email Alert 1 - metric1,metric2,metric3
Email Alert 2 - metric1,metric2,metric3
Email Alert 3 - metric1,metric2,metric3
Email Alert 4 - metric1,metric2,metric3

Email Alert 5 - metric4
Email Alert 6 - metric4
Email Alert 7 - metric4
Email Alert 8 - metric4

and so on, which is quite spammy.

Fully agree that the feature is a must, and totally disagree that ALL notifications should be actionable.

The simplest example is that you might have alerts which require action as soon as reasonably possible, like the next morning, while there are other types of alerts which should get you up even in the middle of the night to fix production servers.

Throwing in my two cents - I would love to have this feature.

I don't even need different hearts or different colored hearts (red for any alert on the graph is fine), it's the email notifications I want different names for.

Please add this feature, for a use case like this:
from a single graph,
if value > X --> Slack
if value > X+Y --> PD

We have a policy here of actionable alerts, where the alert should specify the action to take if possible. We have different actions to take based on metrics being too low, or too high.

For example: RDS CPU too low? check the other stack here for behavior. Too high? Scale up the instance.

As with others, we also like to have different types of alerting at different thresholds.

Similar to @jdblack I want to have a high water warning level and a high water emergency level. I know I can do it with two queries but it’s not as intuitive or slick.

I was thinking about using Grafana as a way to signal an autoscaling system. If the metric is too low, send a webhook with a message to scale down; if it is too high, send a webhook with a message to scale out. Without multiple alerts, I do not believe this is possible. I also agree with others in the thread that the use case for a "warning" then a "critical" threshold is common.
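For reference, a minimal sketch in Python of the workaround this implies today: two panels, each with its own legacy alert rule and a webhook notification channel pointing at a small receiver that decides the scale direction. The rule names ("queue-too-low", "queue-too-high") and the scale() hook are hypothetical; the payload fields (state, ruleName) are from the legacy-alerting webhook format and should be verified against your Grafana version.

    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    def scale(direction: str) -> None:
        # Hypothetical hook into whatever autoscaler you run.
        print(f"would ask the autoscaler to scale {direction}")

    class AlertWebhook(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length) or b"{}")
            # Legacy alerting sends "state" and "ruleName" in the webhook body.
            if payload.get("state") == "alerting":
                if payload.get("ruleName") == "queue-too-low":     # hypothetical rule name
                    scale("down")
                elif payload.get("ruleName") == "queue-too-high":  # hypothetical rule name
                    scale("out")
            self.send_response(200)
            self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("", 8080), AlertWebhook).serve_forever()

With multiple alerts per graph, the two rules (and their notification channels) could live on the single panel that already shows the metric.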

Perhaps the idea of coupling the alerts to a graph should be revisited? Maybe alerts should be created separately, with a nice preview graph when creating the alert. This decoupling might make it more work when changing a graph metric, but at least it would have more flexibility around making multiple alerts.

I've been trying to use Grafana + Influx for sensor networks. The dashboards work pretty well, except for alerts. I need to be alerted when Sensor123 exceeds a certain threshold. I don't need a chart for that, just an alert. Also, I need to potentially have 1000s of sensors. I can setup an alert if "any" sensor exceeds the threshold but I need to know which one(s) is/are alerting. I have dashboards setup with template variables to view a specific sensor, but I can't add an alert for a template variable. For testing, I just setup a handful of alerts for a handful of sensors in an extra dashboard that no one looks at, but moving forward I need a different solution for alerts.

@torkelo , Approaching a year since any official comment on this - just wondering if there are any updates that can be shared now that the alerting system has been in the wild for some time?

@MakoSDV you should consider using kapacitor for that use case.

+1 for this feature; it would be really useful also for two-level alerting (e.g: something > X = yellow alert, something > Y = red alert)

+1 for making the alerting more flexible

I monitor temperature graphs in a heating boiler; the low-temp threshold is a trivial one and needs to go to a non-critical notification channel, but the high temp is urgent and needs to buzz via the urgent channel. Multiple alert rules would make a lot of sense here.

It's a shame that this issue appears abandoned. Does anyone know how we can get developer attention to it ?

It seems like UI-wise, it would be comparatively easy to implement alerts the way overrides are implemented, to allow one or more alerts without much UI changes.

@Gaibhne wrote:

Does anyone know how we can get developer attention to it?

Pay for support perhaps? Seems like there haven't been any resources available for any of the serious alerting-related deficiencies, though they've remained the highest Github user-rated issues for years.

+1 for this request.

We have a counter set up in our app for when a request to an external service we integrate with times out, and we've created a graph for it in Grafana.

If there are a couple of timeouts we'd like to know, so we can chase the external service up about it later; if there are a lot of timeouts, it means most of our customers have likely been affected, so we need to respond and deal with it immediately.

+1 for this as well.

Currently trying to set up two separate alerts for a graph:

  1. Slack message for data reaching a _warning_ level
  2. Pager Duty alert for data reaching a _critical_ level

Currently to my understanding, I would have to create two separate graphs of the same data in order to accomplish this. It would make more sense to me to have multiple different alerts acting on the same graph.

@torkelo is there any update on plans for this going into 2019?

+1

We have dashboards that monitor the same microservices for multiple clients/environments using a variable to switch between the displayed environment.

Our current pain could be reduced if we were able to use variables in the alert title/text so that we can identify the client/environment, but longer term we would really like the ability to create separate alerts with differing thresholds using the same graph.

It would be great even if it required using a different query for each alert, and just setting query to not visible on graph.

What you are describing @itonlytakeswon also seems to relate to https://github.com/grafana/grafana/issues/6557 , so you might want to track that one as well :)

How is this not a feature already ?

@jsterling7 describes our desired use case perfectly.

@torkelo Any feature release planned for this?

Either multiple alerts or allowing tag values in the alert title/body somewhere would solve this for our usage. We have a single graph showing a tagged metric with several independent sources and want to know which one drops below the threshold. Right now I'm making the 10 separate graphs I'll need to accomplish this, but it feels like a missing feature and poor for long-term maintenance on my end.

It seems there is high demand for it; I'm one of those who need this type of feature. I almost love Grafana, and then suddenly this limitation turns me off.

My use case is similar to others referenced here and to issue #6557. We have multiple Elasticsearch clusters monitored in a single templated dashboard. I would like to trigger alerts for them individually, and as it is now I cannot do that from the templated graph; I have to create a graph for each cluster with the queries hardcoded in order to have these alerts working...

+1, this would greatly help our environment! Even just a yellow/red 'heart' two-alert-per-graph setup, where if red is triggered, it overrides yellow.

+1, this would be great. Wondering how trivial it would be to just allow each condition to have an optional configurable alert notification, falling back on a default notification message when none is set for a specific condition... quickest way to make it happen, I think?

+1 it would be really useful to us as well. We have lots of dashboards with templating on multiple variables, it would be great to have template substitution done on both the alert name and alert notification.

+1, IMO this should be present in every monitoring system...there are many situations when you need to identify the severity of the alert and react accordingly, which means multiple alerts with different thresholds in the same dashboard.

+1 from me too - surprised this doesn't exist already!

+1

I think this feature goes hand-in-hand with the limitation of template query support.

I've set up a few prometheus fed graphs with queries that have templates on instance, and type labels. I get around the template problem by creating invisible queries for the template values.

I would like separate alerts for each template value, but I'm limited to a single alert with a generic one-size-fits-all action+message. I can use a long OR list to alert on all my queries, but this feels crude.

An alternative is to make a separate dashboard with tons of panels that nobody looks at, just to serve as an alerting source.

Adding support for multiple alerts seems like it could potentially be first step at supporting template query alerts.

+1. This is a must have!

+1 This is extremely useful

@torkelo "then it makes more sense to have separate panels for the alerts, if you want separate alert rule names & messages etc."

This doesn't make any sense. Requiring users to visualize the same panel multiple times just so they can send useful non-generic alert messages isn't a solution. It's a hack for something that should be a feature, and it adds noise that degrades the usefulness of the product.

@torkelo "then it makes more sense to have separate panels for the alerts, if you want separate alert rule names & messages etc."

This doesn't make any sense. Requiring users to visualize the same panel multiple times just so they can send useful non-generic alert messages isn't a solution. It's a hack for something that should be a feature, and it adds noise that degrades the usefulness of the product.

Exactly. +1 for multiple alerts per panel

In our situation, we are measuring cell voltages in batteries (16 cells per battery). We graph the 16 series on a single panel for comparison and have a different panel for each battery.

A single alert for the panel (graph) isn't too helpful. We really need the ability to set up at least one alert per cell so that the alert e-mail indicates which cell(s) is/are out of range in terms of voltage.

Since, in our case, the acceptable voltage range is the same for each cell, it would be great to be able to define an upper and lower limit and relate individual cell ranges to those defined limits.

At the moment we have to program 16 x OR statements for the cell series, and (re)-define the limits for each cell in the process - painful to set up and a maintenance nightmare to modify.

Ideally we should also be programming warning and critical events for each of cells on the graph panel.

I think it's high time that the alert structure was modified to encompass the requirements that users have identified. These requirements are commonly implemented in SCADA systems, which also generate alerts. It's really just a logic engine, surely?
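One way to make the 16 OR conditions above less painful to maintain is to generate them with a script and push the dashboard JSON through the API instead of clicking them in, so the limits live in one place. A rough Python sketch follows; the condition structure mirrors what an exported legacy-alerting dashboard contains, so verify the field names against your own export, and the voltage limits here are placeholders.

    LOW_V, HIGH_V = 3.0, 4.2  # placeholder cell-voltage limits, defined once

    def cell_conditions(ref_ids):
        """Build one low and one high condition per cell query (A, B, C, ...)."""
        conditions = []
        for ref in ref_ids:
            for etype, limit in (("lt", LOW_V), ("gt", HIGH_V)):
                conditions.append({
                    "type": "query",
                    "query": {"params": [ref, "5m", "now"]},
                    "reducer": {"type": "last", "params": []},
                    "evaluator": {"type": etype, "params": [limit]},
                    "operator": {"type": "or"},
                })
        return conditions

    # 16 cells -> 32 conditions, all derived from the two limits above.
    alert_conditions = cell_conditions([chr(ord("A") + i) for i in range(16)])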

Any update on this? I feel like this feature is a must-have for larger deployments, especially since we'd like to have a single graph showing, for example, storage usage, with alerts at 70%, 80%, etc., which shouldn't require huge amounts of graphs.

I've just now stumbled upon this and I am very surprised there's no way to do this yet D:

I see here https://github.com/grafana/grafana/pull/20822#issuecomment-561047900 that this will not be implemented in the future, and it sounds like alerts will get pulled out of dashboards entirely.

How will this affect the dashboard json model? Can anyone speak to when there will be more news around this?

This was a much needed feature. Any update on the upcoming situation yet?

+1 for multiple alerts per panel

+1 for this feature.

This was a much needed feature. Any update on the upcoming situation yet?

Need this feature.

Three years later... Can someone tell us why this is not implemented (despite the number of requests)?
Is it due to a technical constraint? Was it rejected? Is it on the to-do list?
As said previously, this seems like a 'basic feature'.
Example: I have a dashboard and a series with 200 servers. If I add an alert:
One of the 200 servers is dead: cool, I receive the alert with its name.
Oops, a new server is dead: no alert (you need to refresh the dashboard or wait for the reminder 24h later...).
Would it not be possible to add a checkbox so that we can be alerted per row in the series (instead of for the 'full' series)?
If someone from the dev/Grafana team could answer with some feedback...

Would you mind trying Prometheus for alerting and leaving Grafana to do dashboards?

@beastea If you have to set up another tool just to get Grafana to work, there's no point in using Grafana. We're moving to Datadog because this functionality exists there and it's only one tool.

@anne-nelson you have to set up a metrics gatherer and metrics storage, and for a proper setup have a play with HA around it, to make Grafana work, right?
Datadog is not just one tool either; it just hides that from you and does a good job. Also, you can still use Grafana with Datadog: https://grafana.com/grafana/plugins/grafana-datadog-datasource

@beastea I'm not sure what those tools are, so I don't think we're using them. Our metrics are sent to Influx, we're just going to send them on to Datadog instead of Grafana. Why would I send things to Datadog via Grafana when I can just send them directly? I want to use the least number of tools possible.

@anne-nelson you can implement metrics push in your app, but sometimes it is very useful to also have some system metrics pushed, so that you know what's going on with your disks and other stuff. That is what I mean by a metrics gatherer: a local daemon doing such things, like Telegraf, collectd, or Fluentd.
Influx, in your setup, is the thing that stores metrics and gives you a rich ability to run searches, with Grafana as a web UI frontend to the raw data, letting you manipulate your data with Influx's internal query language.
With Datadog instead of Influx, it works in exactly the same way. Grafana here is a UI for accessing the data in a general setup. It doesn't do anything with your data; it just presents it in graphs. So you send them directly anyway.
Since, as you described, you're working with Influx, why have you not considered using Kapacitor or Flux to solve the issue you described? They provide much richer capabilities than Grafana can ever offer, they are from the same vendor, and they live in the same environment. Flux even ships as part of the Influx package.

It will be really helpful.

@beastea so it would probably be better to remove the 'alerts' feature from Grafana and migrate people to another tool (to avoid an over-complicated stack of multiple tools)?
I mean, OK, we can use Kapacitor, Prometheus, etc. But the alert feature already exists in Grafana, so that makes no sense in my case.

By the way, what prevents adding this checkbox to have an alert per row? An explanation would probably help us understand.

@beastea It seems really odd that you're trying to convince someone not to use Grafana.

As anthosz pointed out, as long as alerting is a feature in Grafana, it is reasonable to expect the ability to add multiple alerts to a graph. If you think we shouldn't be using Grafana for alerting, then Grafana shouldn't have alerting as a feature. It's clear that a lot of people want this feature, and that a lot of competing products already offer it. I honestly don't understand why there's so much push-back on this.

@anne-nelson I'm not trying to convince anyone not to do whatever they'd like to do. I'm trying to point at a different direction that could already offer you a solution today.
I'm not dictating what you should use for what; I'm offering alternatives that could give you a solution right now. I'm not pushing back, I'm giving advice. If you think my advice is not helpful, that's a pity, but that is it. I'm sorry that you feel I'm annoying you and that I'm too pushy with my advice.
Have a good time.

@beastea I'd assumed, due to your defensiveness, that you worked for Grafana. This feature is relevant to a lot of people, and suggesting alternative products on a feature request is unhelpful and derails this discussion. This isn't Stack Overflow.

Can everyone just knock it off? You're spamming potentially hundreds of people, this is not productive.

Sorry for the additional noise all.

@torkelo would you mind terribly giving us an update on this feature request? This topic has been open for a number of _years_ and as you can see still has interest. At the very least it may help to cut down on the squabbling and needless chatter on this topic to get some kind of "official" answer on whether this is included or not on the current road-map. Cheers.

This one and #6041 that is similar are completely ignored. I wonder why.

For us it makes sense, as our ops team registers new integrations in our platform. We automatically start sending metrics to Graphite, and only one panel in Grafana watches all of this.

When multiple systems go down, we only get the alert for the first one. And not very explanatory either.

When one is down, and a second one goes out also, the alert does not fire again.

The use case I have for this is defining multi-window, multi-burn-rate alerts via Prometheus and Grafana. This is a standard practice for monitoring SLOs, as defined in the Google SRE workbook at https://landing.google.com/sre/workbook/chapters/alerting-on-slos/
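For context, here is a small Python sketch of what a multiwindow, multi-burn-rate check boils down to. The factors and windows follow the workbook's 30-day example, and the error ratios would come from separate long- and short-window queries; today each severity needs its own duplicated panel, which is exactly the limitation this issue is about.

    SLO = 0.999
    ERROR_BUDGET = 1 - SLO  # fraction of requests allowed to fail over the SLO window

    def burn_rate_alert(error_ratio_long: float, error_ratio_short: float,
                        factor: float) -> bool:
        """True when both windows exceed `factor` times the error budget."""
        threshold = factor * ERROR_BUDGET
        return error_ratio_long > threshold and error_ratio_short > threshold

    # Example measured error ratios (placeholders).
    ratio_1h, ratio_5m = 0.020, 0.018
    ratio_3d, ratio_6h = 0.0012, 0.0015

    # Workbook-style example: a fast 14.4x burn (1h/5m windows) pages,
    # a slow 1x burn (3d/6h windows) only opens a ticket.
    should_page = burn_rate_alert(ratio_1h, ratio_5m, factor=14.4)
    should_ticket = burn_rate_alert(ratio_3d, ratio_6h, factor=1.0)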

An absolute must, please follow up on this..

I have also moved from Prometheus alerting to Grafana alerting and I am absolutely looking forward to this!

Can someone who has worked on Grafana before list down the known challenges with addressing this?

Hey @torkelo, maybe you can enlighten us on this matter!

Disappointing to see 7.x didn't have any improvement to alerting - the previous suggestion that alerting was to be removed entirely doesn't fill me with hope, but if this was the case surely removing it in 7.x would've been logical given the scale of the revamp?

It'd be great to get some kind of update about why this is so difficult to implement, just so we can understand _why_ this issue has been open for so long.

@torkelo hello.
I have the same need - multiple alerts for a single metric on a single graph, but with multiple servers being monitored.
I have ~100 servers with a defined metric of free space at the '/' partition (as one example - I have tens of such metrics). And I need to receive a single, unique alert notification for EACH server whose free space on '/' drops below 20%.
Currently that will not happen: if, for example, server2 throws an alert, and while people are working on solving the problem server4 hits the same condition, we won't get notified. Or am I missing some functionality?

Multiplying panels per server per metric is not the way.
Could someone please advise me on how to make this possible?
Should I upgrade my Grafana (current version is 6.3.5)? Add some extensions? Plugins? Anything else?

I thank and appreciate everyone who can advise or help.

This issue has been open since 2017 (and the answer from @torkelo is 🤡 "it makes more sense to have separate panels for the alerts" 🤡 (very nice to create a panel per server/alert when we have 600 servers) 🤡).

It seems the only way is to migrate from Grafana to another solution, or to build an over-complicated stack of multiple tools to maintain.

@anthosz - thanks a lot. The issue is that the environment is not ours but the customers', so it would be a very difficult task for me to insist on this with my lead, and in turn for him to overcome the customers' "we won't pay for this".
However, at least I now have some facts saying 'there is no possibility to organize such triggers/alarms this way'.

Thanks again.

_join(voice, choir)_
I have a current sensor on a circuit monitoring an air pump, 1.5 Amps nominal and an effluent pump 10Amps nominal. The air pump runs 24/7, the effluent pump runs on demand based on tank levels. When everything is ok the current (I) is either 1.5A when the effluent pump is off or 11.5A when the effluent pump is on.

The first common failure is the air pump burning out, which is alerted by (Imax < 0.5A or Iavg between 9A and 11A) - that detects either no current, or the effluent pump running while the air pump has died. This must be addressed within 48h to avoid system failure. Data is 1 point per minute; alert after 90 minutes.

The second desired alert on the same graph is (Imax > 14A or Iavg between 2A and 9A), which indicates the effluent pump is clogged or there is air in the line when it should be pumping. This is a much more urgent alert which may need to be addressed within 3 hours, so alerting after 5 minutes would be ideal.

Both alerts are from the same remote current sensor sending data over LoRa. Multiple alerts would keep me from having to duplicate a dashboard query for the same sensor.
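To make the two rules above concrete, here is the same logic written out as a small Python sketch (the thresholds are the ones from this comment; the example values are made up). In Grafana today each branch has to live on its own duplicated panel, each with its own evaluation window and notification channel.

    def classify(i_max_amps: float, i_avg_amps: float) -> str:
        """Evaluate the two independent alert conditions described above."""
        if i_max_amps < 0.5 or 9.0 <= i_avg_amps <= 11.0:
            return "air pump failure (evaluate over 90 min, fix within 48 h)"
        if i_max_amps > 14.0 or 2.0 <= i_avg_amps <= 9.0:
            return "effluent pump clogged / air in line (evaluate over 5 min)"
        return "ok"

    print(classify(11.5, 11.2))  # normal: both pumps running
    print(classify(10.8, 10.1))  # air pump likely dead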

@torkelo multiple graphs is simply not scalable for many users. This seems like such a simple thing to add and I am curious why you guys are not considering it?

maybe if there's huge demand for it :)

Hey @torkelo, what do you consider huge demand? Are 96 comments and 250 "likes" on your comment huge? It's the 8th most-commented open feature request, and only one closed feature request has more comments than that. It is also the 3rd open feature request by :+1: reactions. What is needed to get onto the roadmap?

@torkelo I got a very simple case scenario.

I need a different alert if the value goes below threshold, than the alert for when the value goes over a (different) threshold.

Here is a different scenario. When I monitor healthy server count, I need different alerts when I lose 1 server (a legitimate restart that is not an issue unless it takes over 10 minutes), versus losing 5 servers.

Here is yet another scenario. I would like to set a different alert if the rate of increase of a queue is over a threshold, and a different alert if the queue size itself goes over a threshold.

In terms of visualisation, I believe the community would be happy with any solution to begin with. e.g. only visualise the first alert (so no UI changes needed). Visualise all alerts with vertical lines that when hovered tell you which alert got triggered. Only show thresholds/alerts when you hover over a particular series etc.

Just my 2 cents.

Hello!

Wanted to chime in here we (Spotify) need this as well.

We currently run our own alerting engine sourcing alerts from Grafana, and alert per-timeseries. We currently push the per timeseries alert annotations back into grafana.

So, in terms of UI, the first timeseries to alert causes the panel/alert to go into "Alerting" state, and each subsequent alert just piles on (the state history will show multiple updates "to" alerting, and likewise, multiple changes back to "ok").

We "need" this as this is how we've always done alerting, so moving away from per-timeseries alerting would be a huge social change, for ~10K alerts. We'd very much like to use and adopt Grafana native alerting and update our datasource to support it.

Wanted to chime in here we (Spotify) need this as well.

Do you also use Grafana Enterprise? Maybe that can help/motivate the developers =)

We would also love to see this feature: the ability to trigger multiple alerts from the same graph, giving the ability to trigger on a "below" as well as an "above" state, and the possibility of what would effectively be an amber warning ahead of a more important threshold breach.

We currently run our own alerting engine sourcing alerts from Grafana, and alert per-timeseries. We currently push the per timeseries alert annotations back into grafana.

@sjoeboo a bit off-topic here, but is any of this publicly available?

@vbichov not yet; we do want to open-source the alerting engine, though the timeframe is in flux. I'm sure I could share a patch we have on our (hardly ideal) internal fork to enable per-timeseries tracking of alerts via annotations.

A note: the alerting engine, right now, is specific to our TSDB (https://github.com/spotify/heroic)

+1 for this feature. This is something like warning/critical: we want to get a warning before things get worse, and then critical alerts when we need to take immediate action.

I'm amazed that this hasn't been implemented after 3 years of requests by users.

Having to create multiple panels (one for each alert) ends up clogging a dashboard and makes adding new alerts way more complicated than it should be

I always wondered why there is a 1 shown on the alerts tab if you can't define more than one alert per panel. On the query tab this number shows the number of defined queries, so I always thought multiple alerts would be possible, and I am quite surprised that this is not yet available.

Interesting that this is still not implemented. I agree the "count" on the alert tab is misleading since it leads one to believe that there can be multiple. Also, having a panel per alert rule is a bit ridiculous, as that means I have a "useless" dashboard that is nothing but panels for alerting. It's a messy ass dashboard for sure, but it is the only way to implement this. Mainly, it's so that I can have different rules for names and/or notification endpoint combination. It's complicated to say the least.

Has this been done?
Grafana version = 4.x

Now Grafana is at 7.x and I still don't see this feature

Has this been done?
Grafana version = 4.x

Now Grafana is at 7.x and I still don't see this feature

So naive😁

+1 for this feature.
On a single metric I would like

  1. A warning alert to indicate that a component is not behaving as expected and need close monitoring by 2nd line support
  2. An error alert to indicate a component is failing and trigger callouts to 3rd line engineering.
    Duplicating the metric is clumsy and makes our dashboards confusing for monitoring.

So many simple features are constantly denied by this group; check the many other feature requests... this seems like something basic.

I'll give another example.

I run a synology and would like to alert on it. Raid Status has a normal value of 1. However it also has a Degraded value of 11, and a Crashed value of 12. Degraded means data is still accessible. Crashed means high chance of data loss.

I want to send out a warning if the Raid is Degraded, and a critical alarm if the Raid is Crashed.
I have multiple volumes and storage pools and requiring multiple graphs for each is not scalable.

This can also be applied to something as simple as disk space usage.
I want to send out a warning if disk usage reaches 80%, and a critical alarm if disk usage reaches 90%. Doing multiple graphs for EACH of my disks is not a reasonable ask.

And I don't understand the comment that this is difficult in the UI. You already have something similar which is a list of Dashboards. When you click the Alert tab, it should show a list of Alert Rules by name with a "Create new Alert" button at the bottom. Each alert rule should have a "edit", "disable", or "delete" option to the right of it. By clicking on the alert, or on the edit button, it should take you to the existing edit page that is shown but for that specific alert rule.

Doing multiple graphs for EACH of my disks is not a reasonable ask.

You can use the API to automate creating/updating dashboards and their alerts. If you want to, you can create a program that periodically queries Prometheus (or whatever source you have) to do service discovery for targets and automatically create alerts for them.
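A rough sketch of that approach in Python, assuming the stock dashboard HTTP API (POST /api/dashboards/db with an API key). The per-target panel JSON is left as a stub because its exact shape, including the legacy alert block, is best copied from an exported panel on your own Grafana version; the URL, key, and target names are placeholders.

    import requests

    GRAFANA_URL = "https://grafana.example.com"  # assumption: your instance
    API_KEY = "..."                              # assumption: an Editor API key

    def build_panel(target: str, panel_id: int) -> dict:
        # Stub: paste the JSON of an exported panel that already has an alert,
        # then substitute the target into its queries, title, and alert name.
        return {"id": panel_id, "title": f"Disk usage - {target}"}

    targets = ["server1", "server2", "server3"]  # e.g. discovered from Prometheus
    payload = {
        "dashboard": {
            "uid": "generated-disk-alerts",
            "title": "Generated disk alerts",
            "panels": [build_panel(t, i + 1) for i, t in enumerate(targets)],
        },
        "overwrite": True,
    }
    resp = requests.post(
        f"{GRAFANA_URL}/api/dashboards/db",
        json=payload,
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=30,
    )
    resp.raise_for_status()

This automates the per-target panels, but it is still one panel per alert rule rather than multiple rules on one graph.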

Incredible that this feature has not been implemented yet, with the huge feedback that this issue has.

I use Grafana as our visualization & alerts engine at the Magellan Telescopes. If I have several subsystems that share characteristics that merit them all being in one plot, then when an issue arises and one starts behaving badly, my users get a cryptic warning and have to dig into which one is failing.

Creating dummy plots is a workaround, not a solution. This seems basic!

+1 necessary feature

+1

Exactly the same situation as the OP. A basic feature that should've already been implemented.

Can people stop spamming this issue without adding anything of value?

Use the reactions on the top of the issue to signal interest.

https://github.com/grafana/grafana/issues?q=is%3Aissue+is%3Aopen+sort%3Areactions-%2B1-desc is infinitely more useful for a maintainer to assess which issues are "popular" than people spamming everyone's e-mail inboxes and GitHub notifications with information that's already clear just by looking at the issue description.

If it's so basic, maybe one of all the complainers who just expect other people to do work for them for free should implement this themselves and either make a pull request or maintain their own fork if the maintainers don't want it upstream.

@thomasf "Can people stop spamming this thread issue without adding anything of value?" - Just like you did?

why not both
If the maintainers are still in the thread, new comments at least remind them of it. At this point it does seem kind of useless; there's no way the maintainers are going to implement it after this long, and people should really move to better tools like Datadog where the maintainers actually care, but hundreds of comments (particularly when they have actual scenarios) have a lot more of an impact than just a thumbs up.

If the maintainers are still in the thread, new comments at least remind them of it. At this point it does seem kind of useless; there's no way the maintainers are going to implement it after this long, and people should really move to better tools like Datadog where the maintainers actually care, but hundreds of comments (particularly when they have actual scenarios) have a lot more of an impact than just a thumbs up.

Or maybe the maintainers have unsubscribed from notifications on this issue because of the spam; it's not the only one with a lot of +1 messages and no update. Please don't compare Grafana and Datadog (we were users of both; there is no way we'd go back to Datadog).

The best way to get this one is to contribute (or probably to pay for Grafana Enterprise).

You are very, very wrong. Free or not, you cannot put up a forum/Slack/GitHub/feedback channel and then ignore it. If you think that putting software under an open-source license means "no complaints" and "people will develop your features for free", you are again very, very wrong. In my case I explained to them that with this feature I could sell Grafana to ten customers of mine. They ignored me, which means they pissed off a customer. Great move; probably they make "enough" money and don't want more. I am happy for them...


The amount of money I am willing to spend on _any_ software is directly proportional to the level of customer service I can anticipate I will be provided for my investment. Whether that be an open source product offering "paid support" or commercial product it does not really matter.

Having this issue remain open so long without so much as a peep from the maintainers of the project unfortunately extrapolates to a reasonable feeling of doubt on whether anything would be any different by spending money. If you are trying to sell software, it's probably wise to consider this.

make a pull request or maintain their own fork

If there was even a hint from the developers on where to begin, I am sure I am not alone in saying I would consider this, regardless of whether I think I should have to or not, simply due to the sheer amount of value it would provide. Sadly, that does not seem to be the case and I have little interest in trying to reverse engineer the product for a feature that the maintainers appear to not really care about.

Lastly, unless the thread is closed/locked, I see no reason for one to not speak their mind. You are allowed to unsubscribe if that does not suit you. I actually enjoy reading people lamenting about the relative absurdity of this. 😁

Alerting NG (NextGen), the alerting planned for Grafana 8, will support multiple alert instances from a single alert definition. So, something like host=* with a system like Prometheus will create alerts per host.

Some general information about this in the context of single stats added to https://github.com/grafana/grafana/issues/6983#issuecomment-712915673

We are still designing and prototyping, but to respond with some initial thoughts on these things:

Multiple alerts per graph

Alert definitions will be their own entities, so they will not be tied to a panel. An alert definition can produce multiple alert instances. A panel can then subscribe to instances or definitions. I imagine we will still want a nice UX path from dashboard panel to creating an alert though, because that is a nice flow.

Also, tracking separate states of an alert is not always preferred (as the end-user would need to know the details behind the individual states) vs just knowing if an alert is triggered.

Once many alerts from one definition are allowed, how they should be grouped becomes an issue (as one can get too many alerts). I currently see two paths for how this would work with Alerting NG:

  1. Use Alerting NG with an IRM like PagerDuty or Alertmanager that can handle grouping of alert instances.
  2. Change your query to group by a larger scoping dimension. So, for example, query cluster=* instead of host=*,cluster=* (or use group by for SQL-like datasources); see the sketch after this list. Alternatively, I intend to add functionality to server-side expressions (coming with Alerting NG) to allow for group-by/pivot operations if the data source does not do this. This would be the case when not using an IRM and sending directly to services like email/Slack.
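To illustrate the second path with a toy Python example (label names and values are made up): collapsing per-host evaluation values to a per-cluster aggregate yields one alert instance per cluster instead of one per host, whether that collapse happens in the query itself or in a server-side expression.

    from collections import defaultdict

    # Hypothetical evaluation results labelled by (cluster, host).
    series = {
        ("eu-west", "host-1"): 0.92,
        ("eu-west", "host-2"): 0.41,
        ("us-east", "host-9"): 0.88,
    }

    per_cluster = defaultdict(float)
    for (cluster, _host), value in series.items():
        per_cluster[cluster] = max(per_cluster[cluster], value)

    THRESHOLD = 0.9
    firing = {c: v for c, v in per_cluster.items() if v > THRESHOLD}
    print(firing)  # {'eu-west': 0.92} -> one instance per cluster, not per host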

warning/critical

This one is more complicated. For the WIP design I have removed it as a feature (at least for an alert definition; we will maybe have a way to duplicate the alert definition, change it, and somehow label/tag it with severity).

This is tough, because in many cases, it is very useful:

  • To me, warning/critical have clear uses: approaching broken / broken, or degraded / broken.
  • Without them, many setups will end up repeating a fair amount of alerts for different severity levels.

So why decide not to have them? It adds quite a bit of non-obvious complexity:

  • Assuming you want to support your thresholds coming from another metric (or your thresholds to be different ranges of query time, not values), there are now two conditions to run.
  • For the states of alert instances, at minimum I want to support:

    • Unknown: An instance disappeared

    • Error: The query that would have determined whether there is a problem with the instances is broken

    • Alerting: The condition is true

    • Normal: The condition is not true

  • We also want to continue to have FOR-like expressions. When adding more states, designing it so that flapping does not result in either missed notifications or noise is complicated. In general, state machines over time are very prone to bugs and are difficult to get right (search for TLA / Temporal Logic of Actions to learn more, if you like that sort of thing). So adding severity levels increases the state space more than one would guess, which means we are more likely to have unintended behaviors, or behaviors that are harder to build a mental model for (see the sketch after this list).
  • When looking to integrate with other systems or IRMs, having specific notions of severity could complicate the integration.
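As a toy illustration of the FOR/flapping point above: even a single alert instance with only Normal/Pending/Alerting states needs a careful pending timer, and each extra firing severity (warning vs critical) multiplies the transitions that have to be reasoned about. A minimal Python sketch, not the actual Alerting NG implementation:

    import time
    from enum import Enum

    class State(Enum):
        NORMAL = "normal"
        PENDING = "pending"
        ALERTING = "alerting"

    class AlertInstance:
        """One series' state, with a FOR-style pending period."""
        def __init__(self, for_seconds: float):
            self.for_seconds = for_seconds
            self.state = State.NORMAL
            self.pending_since = None

        def evaluate(self, condition_true: bool, now: float = None) -> State:
            now = time.monotonic() if now is None else now
            if not condition_true:
                self.state, self.pending_since = State.NORMAL, None
            elif self.state is State.NORMAL:
                self.state, self.pending_since = State.PENDING, now
            elif self.state is State.PENDING and now - self.pending_since >= self.for_seconds:
                self.state = State.ALERTING
            return self.state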

(at least for an alert definition; we will maybe have a way to duplicate the alert definition, change it, and somehow label/tag it with severity

This is a perfectly acceptable workaround for the critical/warning differentiation. I am more than happy to maintain separate thresholds. Having a combined warning/critical threshold would be a nice to have, but is not a dealbreaker.

then how they should be grouped becomes an issue (as one can get too many alerts)

It is up to the user to manage their own ticket volume and alarm generation. If you're setting alarms, each one should be a separate email or notification. Think of it this way: if you create an automated system to generate tickets based on alarms triggering, grouping multiple alarms into one email, for instance, would make this either difficult or just obnoxious. Further, multiple alarms showing up in one email means each alarm cannot have its own email thread; it would need to be manually separated out by the users and new threads kicked off. Instead, each alarm that triggers should have its own notification so that threads can be contained to that specific alarm.

Hopefully this simplifies the alarming design as you shouldn't be worrying about grouping. That is up to the user to handle.

