Grafana: [Feature request] Multiple alerts per graph

Created on 14 Mar 2017  ·  126 comments  ·  Source: grafana/grafana

As per http://docs.grafana.org/alerting/rules/, Grafana plans to track state per series in future releases.

  • "If a query returns multiple series then the aggregation function and threshold check will be evaluated for each series. What Grafana does not do currently is track alert rule state per series." and
  • "To improve support for queries that return multiple series we plan to track state per series in a future release"

But it seems like there can be use cases where we have graphs containing a set of metrics for which different sets of alerts are required. This is slightly different from "Support per series state change" ( https://github.com/grafana/grafana/issues/6041 ) because

  1. The action (notifications) can be different.
  2. Also, tracking separate states of an alert is not always preferred (as the end-user would need to know the details behind the individual states) vs just knowing if an alert is triggered.

Grafana version = 4.x

Labels: area/alerting, type/feature-request

Most helpful comment

maybe if there's huge demand for it :)

All 126 comments

Concrete use case: I have instrumented my app to record a histogram in Prometheus for each major function (e.g. where an external HTTP call or disk I/O takes place) and would like to alert when any of these becomes slow.

Presently I have to define dummy graphs for this because of the 1:1 relationship between graph and alert. It would be much more logical to keep the alerts defined in the same place as the graph itself.

And you cannot define that in one query?

No; a chain of OR conditions is crude, and the single name of the alert cannot clearly identify the exact reason for the alert. I definitely don't want to send alerts along the lines of "Some part of service X is failing" - engineers on call would not be my friends...

then it makes more sense to have separate panels for the alerts, if you want separate alert rule names & messages etc.

Yep that's exactly what I'm doing at the moment. Is there any likelihood of implementing multiple alerts per graph in the near future so I can move away from this workaround?

it's very unlikely

maybe if there's huge demand for it :)

haha OK - I'll see if I can rustle up an angry mob ;) Seriously tho', thanks for the honesty.

Ok we have a mob of two :-) I'm graphing fuel levels in multiple tanks & wanted to set up a low fuel alert for each tank.

and each tank has different thresholds or notifications ?

Exactly. One is a 285 gal heating oil tank. I wanted to set up a "heating oil low" alert when that tank goes below 70 gal. The other is a 500 gal propane tank; for that I wanted a "propane low" alert when it goes under 100 gal. I set up singlestats for each, but alerts are not available in a singlestat.

[screenshot: fuel level graphs for both tanks]

I have a graph with a median and a 90th percentile metric. I'd like to get an alert on each. In order to do this, I have to create one graph for each. Then, if I want warnings and critical alerts for each, I have to create a second graph for each.

I have 30 or 40 services to monitor, each with 2 to 5 key metrics. I have graphs where I graph the same metric for multiple customers, and while I don't have to do alerts per customer (yet), it does add to the number of metrics I'd like to have alerts on. The amount of work to create dozens of graphs expands very quickly. It would be very useful in my current production environment (and in my previous production environments) to have warnings and critical alerts, and to display multiple metrics in a single graph and alert on them.

I'd also like to see this feature. A good example is one alert if a metric goes outside of a threshold and another alert if data fails to update, i.e., if a value goes too high or if values fail to report. This could be used to show that whatever is reporting the data has encountered an issue that is preventing communication with Grafana (or whatever backend).

Hi Torkelo!

I got several "likes" for the feature! Will it enter the next release =)?

@rmsys maybe at some point. Solving it from a UX and code-complexity (and UX-complexity) perspective will take time; it's not on any roadmap yet, but maybe next year as the alerting engine matures and a UX design for this is worked out

Another good use case for multiple alerts is to have different severity thresholds with different actions. If a server starts to exhibit slowdowns, an email might be sufficient, but if the slowdowns become extreme, it might be worth paging the administrator.

I have a graph that returns a metric with valid and invalid values. This would be useful to me because I could use a single graph containing two queries to create alerts that fire when valids are too low and invalids are too high.

Also, tracking separate states of an alert is not always preferred (as the end-user would need to know the details behind the individual states) vs just knowing if an alert is triggered.

Not sure I understand what you mean by this. Can you elaborate?

Can you describe how multiple alerts per graph would work and look? What would the annotations say, and what would the green/red heart beside the panel title show (if, say, 2/5 alert rules were firing)?

Would you want to share something between the alert rules or would they be completely isolated (besides living in the same graph panel and possibly referring to the same queries)?

How would you visualize thresholds when you have multiple alert rules? Would they show up as separate rules on the alert rules page & in the alert list panel? Then you need a way to navigate to a specific instance of a rule and not just to the alert tab.

Grafana is a visual tool and we have chosen to tie an alert rule to a graph so that the alert rule state can be visualized easily (via the metrics, thresholds & alert state history). I am afraid that having each graph be able to represent multiple alert rules will complicate this to a very large extent, and I am not sure about the need for this.

@rssalerno having support for alert rules in singlestat panel seems unrelated to this issue.

@alex-phillips your scenario sounds like it can be solved by making individual alert rules more flexible.

Does someone have some concrete examples where this would be good? I'm just not seeing a scenario where it would not end up as a confusing graph, with 2-5 thresholds where you do not know which metric they relate to, and alert-history annotations where you also do not know which alert rule they came from (without hovering).

Can you describe how multiple alerts per graph would work and look? What would the annotations say, and what would the green/red heart beside the panel title show (if, say, 2/5 alert rules were firing)?

I think multiple alert rules would be annotated individually. Hearts might be colour-coded. Rules would need to be named for differentiation in alerts/panels.

Would you want to share something between the alert rules or would they be completely isolated (besides living in the same graph panel and possibly referring to the same queries)?

Generally I would think not, though I suspect groups would need to have a shared threshold and name, if they were implemented (per https://github.com/grafana/grafana/issues/6557#issuecomment-324363795).

How would you visualize thresholds when you have multiple alert rules? Would they show up as separate rules on the alert rules page & in the alert list panel? Then you need a way to navigate to a specific instance of a rule and not just to the alert tab.

If rules take an additional colour param, thresholds can be rendered using that and differentiated as such; you'd probably want a tooltip also. Being able to toggle rules would be useful, and a param to render a specific rule takes care of the latter, I think?

@rssalerno having support for alert rules in singlestat panel seems unrelated to this issue.

I believe you'll find he was referring to the graph below that, though since he has separate panels for each tank, singlestat alerting may solve his problem for that specific dashboard.

Does someone have some concrete examples where this would be good? I'm just not seeing a scenario where it would not end up as a confusing graph, with 2-5 thresholds where you do not know which metric they relate to, and alert-history annotations where you also do not know which alert rule they came from (without hovering).

Primarily, I'd like this to support #6557 and #6553, and for multiple thresholds, similar to @alex-phillips. For example, one use-case we have for #6557 is to alert differently for different environments (production, beta, dev, etc), combined with multiple thresholds that would solve most of our problems. If there's a better way of doing that without multiple rules, it's not obvious to me.

@torkelo

Can you describe how multiple alerts per graph would work and look? What would the annotations say, and what would the green/red heart beside the panel title show (if, say, 2/5 alert rules were firing)?

I like the approach suggested by @pdf

Further, the approach to showing annotations would be the same as in the current case, where you have an alert rule with > 1 conditions (each having a different threshold). The green/red heart beside the panel title would be shown as red if there is at least one alert firing, similar to the current scenario where at least one condition of an alert rule evaluates to true. It could probably also show the number (2/5) along with the red heart in the title.

Would you want to share something between the alert rules or would they be completely isolated (besides living in the same graph panel and possibly referring to the same queries)?

In most of our use cases, these rules would not share anything between them, and the queries are also different.

How would you visualize thresholds when you have multiple alert rules? Would they show up as separate rules on the alert rules page & in the alert list panel? Then you need a way to navigate to a specific instance of a rule and not just to the alert tab.

They would show up as separate rules on the alerts page. The Alert tab would probably have a list of the alerts defined. True, we would need to highlight/expand the specific alert rule on this tab when the alert rule URL (which should capture the alert id or index) is accessed from the notification. That seems easily solvable.

In the alert list panel, there wouldn't be any change: it shows all of them separately. Semantically, each alert is separate; it has just been placed in the same panel.

Does someone have some concrete examples where this would be good? I'm just not seeing a scenario where it would not end up as a confusing graph, with 2-5 thresholds where you do not know which metric they relate to, and alert-history annotations where you also do not know which alert rule they came from (without hovering).

Considering that a lot of people have upvoted this feature, it would definitely be useful. If we had support for multiple alerts, I think it would be up to each user's perception whether it's confusing or not. IMHO, those who think it is confusing would go with the current approach of a separate panel for each alert, and those who think the utility/convenience of having the same panel used for visualization and alerting outweighs the perceived confusion would go the multiple-alerts way. Sure, it would change the UX somewhat.

In Splunk we have high/low alerts. If multiple alerts were available in Grafana, we'd just use the same search; they are just different thresholds against the same search.

+1 for this feature.

+1 for this. Our use case is as follows: We want to define one chart with, say, cpu usage for all of our servers. Then on that same chart we will make two hidden metrics, one for cpu usage on production servers and one for cpu usage on non-production servers. Each of those metrics would have its own alert, with different notification channels. We do not want to have to create multiple charts or panels or dashboards to accomplish this.

+1 for this feature.

Came here reading some of the other issues regarding categories and severities. I agree all alerts should be actionable. But there is a difference between a "fix this first thing in the morning" alert and a "call out the $400/hour consultant ASAP" alert.

As many have mentioned, this is most commonly solved by Warning and Critical thresholds.

Technically this could be implemented in a bunch of ways: labels, several alerts per panel, several thresholds per alert, etc.

Regarding confusion if the categorization gets too complex: a Warning/Critical setup can simply use Red/Yellow, with Red overriding Yellow.

For more complex setups, another option besides hover to locate the offending time-series could be a flashing line/area/whatever? That could draw attention to the correct time-series easily.

I think most users would be satisfied by a fairly simple Warn/Crit separation though.

This is an absolute must for an alerting software, especially for server monitoring. Disk space, memory, cpu usage, temperature, load avg.... all prime examples where one would want multiple alerts configured with different messages with different thresholds. Take disk space for example. Need one alert for disk usage over 70%, another for disk usage over 90%.

Bit of an edge case, but we are using the alerts to notify us if a product hasn't sold in a few days. We have each product as a metric, which in turn means we only get one alert when one of the metrics enters the alerting threshold. Ideally we would like to receive another alert whenever any additional metric enters the alerting threshold as well.

Also we are using templating vars to repeat a graph for each selected product with two metrics overlayed (volume and gross margin) on the left and right y axis. This kills any chance of using alerting as the alert query isn't picking up the $sku list variable for our IN ($sku).

To work around this I've tried having another query B which just runs the template query to look up all the SKUs we are interested in and puts that straight into the alert query: IN (SELECT skus from interested_product_table). However, this starts sending us alerts for each graph for all the metrics across every graph, meaning we get:

Email Alert 1 - metric1,metric2,metric3
Email Alert 2 - metric1,metric2,metric3
Email Alert 3 - metric1,metric2,metric3
Email Alert 4 - metric1,metric2,metric3

Email Alert 5 - metric4
Email Alert 6 - metric4
Email Alert 7 - metric4
Email Alert 8 - metric4

and so on, which is quite spammy.

Fully agree that the feature is a must, and totally disagree that ALL notifications should be actionable.

The simplest example is that you might have alerts which require action as soon as reasonably possible, like the next morning, while there are other types of alerts which should get you up even in the middle of the night to fix production servers.

Throwing in my two cents - I would love to have this feature.

I don't even need different hearts or different colored hearts (red for any alert on the graph is fine), it's the email notifications I want different names for.

Please add this feature, for a use case like this:
from a single graph,
if value > X --> Slack
if value > X+Y --> PD

We have a policy here of actionable alerts, where the alert should specify the action to take if possible. We have different actions to take based on metrics being too low, or too high.

For example: RDS CPU too low? check the other stack here for behavior. Too high? Scale up the instance.

As with others, we also like to have different types of alerting at different thresholds.

Similar to @jdblack I want to have a high water warning level and a high water emergency level. I know I can do it with two queries but it’s not as intuitive or slick.

I was thinking about using Grafana as a way to signal an autoscaling system. If the metric is too low, send a webhook with a message to scale down; if it is too high, send a webhook with a message to scale out. Without multiple alerts, I do not believe this is possible. I also agree with others in the thread that the use case for a "warning" then a "critical" threshold is common.
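For reference, a minimal sketch in Python of the workaround this implies today: two panels, each with its own legacy alert rule and a webhook notification channel pointing at a small receiver that decides the scale direction. The rule names ("queue-too-low", "queue-too-high") and the scale() hook are hypothetical; the payload fields (state, ruleName) are from the legacy-alerting webhook format and should be verified against your Grafana version.

    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    def scale(direction: str) -> None:
        # Hypothetical hook into whatever autoscaler you run.
        print(f"would ask the autoscaler to scale {direction}")

    class AlertWebhook(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length) or b"{}")
            # Legacy alerting sends "state" and "ruleName" in the webhook body.
            if payload.get("state") == "alerting":
                if payload.get("ruleName") == "queue-too-low":     # hypothetical rule name
                    scale("down")
                elif payload.get("ruleName") == "queue-too-high":  # hypothetical rule name
                    scale("out")
            self.send_response(200)
            self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("", 8080), AlertWebhook).serve_forever()

With multiple alerts per graph, the two rules (and their notification channels) could live on the single panel that already shows the metric.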

Perhaps the idea of coupling the alerts to a graph should be revisited? Maybe alerts should be created separately, with a nice preview graph when creating the alert. This decoupling might make it more work when changing a graph metric, but at least it would have more flexibility around making multiple alerts.

I've been trying to use Grafana + Influx for sensor networks. The dashboards work pretty well, except for alerts. I need to be alerted when Sensor123 exceeds a certain threshold. I don't need a chart for that, just an alert. Also, I need to potentially have 1000s of sensors. I can setup an alert if "any" sensor exceeds the threshold but I need to know which one(s) is/are alerting. I have dashboards setup with template variables to view a specific sensor, but I can't add an alert for a template variable. For testing, I just setup a handful of alerts for a handful of sensors in an extra dashboard that no one looks at, but moving forward I need a different solution for alerts.

@torkelo , Approaching a year since any official comment on this - just wondering if there are any updates that can be shared now that the alerting system has been in the wild for some time?

@MakoSDV you should consider using kapacitor for that use case.

+1 for this feature; it would be really useful also for two-level alerting (e.g: something > X = yellow alert, something > Y = red alert)

+1 for making the alerting more flexible

I monitor temperature graphs in a heating boiler; the low-temp threshold is a trivial one and needs to go to a non-critical notification channel, but the high temp is urgent and needs to buzz via the urgent channel. Multiple alert rules would make a lot of sense here.

It's a shame that this issue appears abandoned. Does anyone know how we can get developer attention to it ?

It seems like UI-wise, it would be comparatively easy to implement alerts the way overrides are implemented, to allow one or more alerts without much UI changes.

@Gaibhne wrote:

Does anyone know how we can get developer attention to it?

Pay for support perhaps? Seems like there haven't been any resources available for any of the serious alerting-related deficiencies, though they've remained the highest Github user-rated issues for years.

+1 for this request.

We have a counter set up in our app for when a request to an external service we integrate with times out, and we've created a graph for it in Grafana.

If there are a couple of timeouts we'd like to know, so we can chase the external service up about it later; if there are a lot of timeouts, it means most of our customers have likely been affected, so we need to respond and deal with it immediately.

+1 for this as well.

Currently trying to set up two separate alerts for a graph:

  1. Slack message for data reaching a _warning_ level
  2. Pager Duty alert for data reaching a _critical_ level

Currently to my understanding, I would have to create two separate graphs of the same data in order to accomplish this. It would make more sense to me to have multiple different alerts acting on the same graph.

@torkelo is there any update on plans for this going into 2019?

+1

We have dashboards that monitor the same microservices for multiple clients/environments using a variable to switch between the displayed environment.

Our current pain could be reduced if we were able to use variables in the alert title/text so that we can identify the client/environment, but longer term we would really like the ability to create separate alerts with differing thresholds using the same graph.

It would be great even if it required using a different query for each alert, and just setting query to not visible on graph.

What you are describing @itonlytakeswon also seems to relate to https://github.com/grafana/grafana/issues/6557 , so you might want to track that one as well :)

How is this not a feature already ?

@jsterling7 describes our desired use case perfectly.

@torkelo Any feature release planned for this?

Either multiple alerts or allowing tag values in the alert title/body somewhere would solve this for our usage. We have a single graph showing a tagged metric with several independent sources and want to know which one drops below the threshold. Right now I'm making the 10 separate graphs I'll need to accomplish this, but it feels like a missing feature and poor for long-term maintenance on my end.

It seems there is high demand for it; I'm one of those who need this type of feature. I almost love Grafana, and then suddenly this limitation turns me off.

My use case is similar to others referenced here and to issue #6557. We have multiple Elasticsearch clusters monitored in a single templated dashboard. I would like to trigger alerts for them individually, and as it is now I cannot do that from the templated graph; I have to create a graph for each cluster with the queries hardcoded in order to have these alerts working...

+1, this would greatly help our environment! Even just a yellow/red 'heart' two-alert-per-graph setup, where if red is triggered, it overrides yellow.

+1, this would be great. Wondering how trivial it would be to just allow each condition to have an optional configurable alert notification, falling back on a default notification message when none is set for a specific condition... quickest way to make it happen, I think?

+1 it would be really useful to us as well. We have lots of dashboards with templating on multiple variables, it would be great to have template substitution done on both the alert name and alert notification.

+1, IMO this should be present in every monitoring system...there are many situations when you need to identify the severity of the alert and react accordingly, which means multiple alerts with different thresholds in the same dashboard.

+1 from me too - surprised this doesn't exist already!

+1

I think this feature goes hand-in-hand with the limitation of template query support.

I've set up a few prometheus fed graphs with queries that have templates on instance, and type labels. I get around the template problem by creating invisible queries for the template values.

I would like separate alerts for each template value, but I'm limited to a single alert with a generic one-size-fits-all action+message. I can use a long OR list to alert on all my queries, but this feels crude.

An alternative is to make a separate dashboard with tons of panels that nobody looks at, just to serve as an alerting source.

Adding support for multiple alerts seems like it could potentially be first step at supporting template query alerts.

+1. This is a must have!

+1 This is extremely useful

@torkelo "then it makes more sense to have separate panels for the alerts, if you want separate alert rule names & messages etc."

This doesn't make any sense. Requiring users to visualize the same panel multiple times just so they can send useful non-generic alert messages isn't a solution. It's a hack for something that should be a feature, and it adds noise that degrades the usefulness of the product.

@torkelo "then it makes more sense to have separate panels for the alerts, if you want separate alert rule names & messages etc."

This doesn't make any sense. Requiring users to visualize the same panel multiple times just so they can send useful non-generic alert messages isn't a solution. It's a hack for something that should be a feature, and it adds noise that degrades the usefulness of the product.

Exactly. +1 for multiple alerts per panel

In our situation, we are measuring cell voltages in batteries (16 cells per battery). We graph the 16 series on a single panel for comparison and have a different panel for each battery.

A single alert for the panel (graph) isn't too helpful. We really need the ability to set up at least one alert per cell so that the alert e-mail indicates which cell(s) is/are out of range in terms of voltage.

Since, in our case, the acceptable voltage range is the same for each cell, it would be great to be able to define an upper and lower limit and relate individual cell ranges to those defined limits.

At the moment we have to program 16 x OR statements for the cell series, and (re)-define the limits for each cell in the process - painful to set up and a maintenance nightmare to modify.

Ideally we should also be programming warning and critical events for each of cells on the graph panel.

I think it's high time that the alert structure was modified to encompass the requirements that users have identified. These requirements are commonly implemented in SCADA systems, which also generate alerts. It's really just a logic engine, surely?
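One way to make the 16 OR conditions above less painful to maintain is to generate them with a script and push the dashboard JSON through the API instead of clicking them in, so the limits live in one place. A rough Python sketch follows; the condition structure mirrors what an exported legacy-alerting dashboard contains, so verify the field names against your own export, and the voltage limits here are placeholders.

    LOW_V, HIGH_V = 3.0, 4.2  # placeholder cell-voltage limits, defined once

    def cell_conditions(ref_ids):
        """Build one low and one high condition per cell query (A, B, C, ...)."""
        conditions = []
        for ref in ref_ids:
            for etype, limit in (("lt", LOW_V), ("gt", HIGH_V)):
                conditions.append({
                    "type": "query",
                    "query": {"params": [ref, "5m", "now"]},
                    "reducer": {"type": "last", "params": []},
                    "evaluator": {"type": etype, "params": [limit]},
                    "operator": {"type": "or"},
                })
        return conditions

    # 16 cells -> 32 conditions, all derived from the two limits above.
    alert_conditions = cell_conditions([chr(ord("A") + i) for i in range(16)])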

Any update on this? I feel like this feature is a must-have for larger deployments, especially since we'd like to have a single graph showing, for example, storage usage, with alerts at 70%, 80%, etc., which shouldn't require huge amounts of graphs.

I've just now stumbled upon this and I am very surprised there's no way to do this yet D:

I see here https://github.com/grafana/grafana/pull/20822#issuecomment-561047900 that this will not be implemented in the future, and it sounds like alerts will get pulled out of dashboards entirely.

How will this affect the dashboard json model? Can anyone speak to when there will be more news around this?

This was a much needed feature. Any update on the upcoming situation yet?

+1 for multiple alerts per panel

+1 for this feature.

This was a much needed feature. Any update on the upcoming situation yet?

Need this feature.

Three years later... Can someone tell us why this is not implemented (despite the number of requests)?
Is it due to a technical constraint? Was it rejected? Is it on the to-do list?
As said previously, this seems like a 'basic feature'.
Example: I have a dashboard and a series with 200 servers. If I add an alert:
One of the 200 servers is dead: cool, I receive the alert with its name.
Oops, a new server is dead: no alert (you need to refresh the dashboard or wait for the reminder 24h later...).
Would it not be possible to add a checkbox so that we can be alerted per row in the series (instead of for the 'full' series)?
If someone from the dev/Grafana team could answer with some feedback...

Would you mind trying Prometheus for alerting and leaving Grafana to do dashboards?

@beastea If you have to set up another tool just to get Grafana to work, there's no point in using Grafana. We're moving to Datadog because this functionality exists there and it's only one tool.

@anne-nelson you have to set up a metrics gatherer and metrics storage, and for a proper setup have a play with HA around it, to make Grafana work, right?
Datadog is not just one tool either; it just hides that from you and does a good job. Also, you can still use Grafana with Datadog: https://grafana.com/grafana/plugins/grafana-datadog-datasource

@beastea I'm not sure what those tools are, so I don't think we're using them. Our metrics are sent to Influx, we're just going to send them on to Datadog instead of Grafana. Why would I send things to Datadog via Grafana when I can just send them directly? I want to use the least number of tools possible.

@anne-nelson you can implement metrics push in your app, but sometimes it is very useful to also have some system metrics pushed, so that you know what's going on with your disks and other stuff. That is what I mean by a metrics gatherer: a local daemon doing such things, like Telegraf, collectd, or Fluentd.
Influx, in your setup, is the thing that stores metrics and gives you a rich ability to run searches, with Grafana as a web UI frontend to the raw data, letting you manipulate your data with Influx's internal query language.
With Datadog instead of Influx, it works in exactly the same way. Grafana here is a UI for accessing the data in a general setup. It doesn't do anything with your data; it just presents it in graphs. So you send them directly anyway.
Since, as you described, you're working with Influx, why have you not considered using Kapacitor or Flux to solve the issue you described? They provide much richer capabilities than Grafana can ever offer, they are from the same vendor, and they live in the same environment. Flux even ships as part of the Influx package.

It will be really helpful.

@beastea so it would probably be better to remove the 'alerts' feature from Grafana and migrate people to another tool (to avoid an over-complicated stack of multiple tools)?
I mean, OK, we can use Kapacitor, Prometheus, etc. But the alert feature already exists in Grafana, so that makes no sense in my case.

By the way, what prevents adding this checkbox to have an alert per row? An explanation would probably help us understand.

@beastea It seems really odd that you're trying to convince someone not to use Grafana.

As anthosz pointed out, as long as alerting is a feature in Grafana, it is reasonable to expect the ability to add multiple alerts to a graph. If you think we shouldn't be using Grafana for alerting, then Grafana shouldn't have alerting as a feature. It's clear that a lot of people want this feature, and that a lot of competing products already offer it. I honestly don't understand why there's so much push-back on this.

@anne-nelson I'm not trying to convince anyone not to do whatever they'd like to do. I'm trying to point at a different direction that could already offer you a solution today.
I'm not dictating what you should use for what; I'm offering alternatives that could give you a solution right now. I'm not pushing back, I'm giving advice. If you think my advice is not helpful, that's a pity, but that is it. I'm sorry that you feel I'm annoying you and that I'm too pushy with my advice.
Have a good time.

@beastea I'd assumed, due to your defensiveness, that you worked for Grafana. This feature is relevant to a lot of people, and suggesting alternative products on a feature request is unhelpful and derails this discussion. This isn't Stack Overflow.

Can everyone just knock it off? You're spamming potentially hundreds of people, this is not productive.

Sorry for the additional noise all.

@torkelo would you mind terribly giving us an update on this feature request? This topic has been open for a number of _years_ and as you can see still has interest. At the very least it may help to cut down on the squabbling and needless chatter on this topic to get some kind of "official" answer on whether this is included or not on the current road-map. Cheers.

This one and #6041 that is similar are completely ignored. I wonder why.

For us it makes sense, as our ops team registers new integrations in our platform. We automatically start sending metrics to Graphite, and only one panel in Grafana watches all of this.

When multiple systems go down, we only get the alert for the first one. And not very explanatory either.

When one is down, and a second one goes out also, the alert does not fire again.

The use case I have for this is defining multi-window, multi-burn-rate alerts via Prometheus and Grafana. This is a standard practice for monitoring SLOs, as defined in the Google SRE workbook at https://landing.google.com/sre/workbook/chapters/alerting-on-slos/
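For context, here is a small Python sketch of what a multiwindow, multi-burn-rate check boils down to. The factors and windows follow the workbook's 30-day example, and the error ratios would come from separate long- and short-window queries; today each severity needs its own duplicated panel, which is exactly the limitation this issue is about.

    SLO = 0.999
    ERROR_BUDGET = 1 - SLO  # fraction of requests allowed to fail over the SLO window

    def burn_rate_alert(error_ratio_long: float, error_ratio_short: float,
                        factor: float) -> bool:
        """True when both windows exceed `factor` times the error budget."""
        threshold = factor * ERROR_BUDGET
        return error_ratio_long > threshold and error_ratio_short > threshold

    # Example measured error ratios (placeholders).
    ratio_1h, ratio_5m = 0.020, 0.018
    ratio_3d, ratio_6h = 0.0012, 0.0015

    # Workbook-style example: a fast 14.4x burn (1h/5m windows) pages,
    # a slow 1x burn (3d/6h windows) only opens a ticket.
    should_page = burn_rate_alert(ratio_1h, ratio_5m, factor=14.4)
    should_ticket = burn_rate_alert(ratio_3d, ratio_6h, factor=1.0)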

An absolute must, please follow up on this..

I have also moved from Prometheus alerting to Grafana alerting and I am absolutely looking forward to this!

Can someone who has worked on Grafana before list down the known challenges with addressing this?

Hey @torkelo, maybe you can enlighten us on this matter!

Disappointing to see 7.x didn't have any improvement to alerting - the previous suggestion that alerting was to be removed entirely doesn't fill me with hope, but if this was the case surely removing it in 7.x would've been logical given the scale of the revamp?

It'd be great to get some kind of update about why this is so difficult to implement, just so we can understand _why_ this issue has been open for so long.

@torkelo hello.
I have the same need - multiple alerts for a single metric on a single graph, but with multiple servers being monitored.
I have ~100 servers with a defined metric of free space at the '/' partition (as one example - I have tens of such metrics). And I need to receive a single, unique alert notification for EACH server whose free space on '/' drops below 20%.
Currently that will not happen: if, for example, server2 throws an alert, and while people are working on solving the problem server4 hits the same condition, we won't get notified. Or am I missing some functionality?

Multiplying panels per server per metric is not the way.
Could someone please advise me on how to make this possible?
Should I upgrade my Grafana (current version is 6.3.5)? Add some extensions? Plugins? Anything else?

I thank and appreciate everyone who can advise or help.

This issue has been open since 2017 (and the answer from @torkelo is 🤡 "it makes more sense to have separate panels for the alerts" 🤡 (very nice to create a panel per server/alert when we have 600 servers) 🤡).

It seems the only way is to migrate from Grafana to another solution, or to build an over-complicated stack of multiple tools to maintain.

@anthosz - thanks a lot. The issue is that the environment is not ours but the customers', so it would be a very difficult task for me to insist on this with my lead, and in turn for him to overcome the customers' "we won't pay for this".
However, at least I now have some facts saying 'there is no possibility to organize such triggers/alarms this way'.

Thanks again.

_join(voice, choir)_
I have a current sensor on a circuit monitoring an air pump, 1.5 Amps nominal and an effluent pump 10Amps nominal. The air pump runs 24/7, the effluent pump runs on demand based on tank levels. When everything is ok the current (I) is either 1.5A when the effluent pump is off or 11.5A when the effluent pump is on.

The first common failure is the air pump burning out, which is alerted by (Imax < 0.5A or Iavg between 9A and 11A) - that detects either no current, or the effluent pump running while the air pump has died. This must be addressed within 48h to avoid system failure. Data is 1 point per minute; alert after 90 minutes.

The second desired alert on the same graph is (Imax > 14A or Iavg between 2A and 9A), which indicates the effluent pump is clogged or there is air in the line when it should be pumping. This is a much more urgent alert which may need to be addressed within 3 hours, so alerting after 5 minutes would be ideal.

Both alerts are from the same remote current sensor sending data over LoRa. Multiple alerts would keep me from having to duplicate a dashboard query for the same sensor.
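To make the two rules above concrete, here is the same logic written out as a small Python sketch (the thresholds are the ones from this comment; the example values are made up). In Grafana today each branch has to live on its own duplicated panel, each with its own evaluation window and notification channel.

    def classify(i_max_amps: float, i_avg_amps: float) -> str:
        """Evaluate the two independent alert conditions described above."""
        if i_max_amps < 0.5 or 9.0 <= i_avg_amps <= 11.0:
            return "air pump failure (evaluate over 90 min, fix within 48 h)"
        if i_max_amps > 14.0 or 2.0 <= i_avg_amps <= 9.0:
            return "effluent pump clogged / air in line (evaluate over 5 min)"
        return "ok"

    print(classify(11.5, 11.2))  # normal: both pumps running
    print(classify(10.8, 10.1))  # air pump likely dead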

@torkelo multiple graphs is simply not scalable for many users. This seems like such a simple thing to add and I am curious why you guys are not considering it?

maybe if there's huge demand for it :)

Hey @torkelo, what do you consider huge demand? Are 96 comments and 250 "likes" on your comment huge? It's the 8th most-commented open feature request, and only one closed feature request has more comments than that. It is also the 3rd open feature request by :+1: reactions. What is needed to get onto the roadmap?

@torkelo I got a very simple case scenario.

I need a different alert if the value goes below threshold, than the alert for when the value goes over a (different) threshold.

Here is a different scenario. When I monitor healthy server count, I need different alerts when I lose 1 server (a legitimate restart that is not an issue unless it takes over 10 minutes), versus losing 5 servers.

Here is yet another scenario. I would like to set a different alert if the rate of increase of a queue is over a threshold, and a different alert if the queue size itself goes over a threshold.

In terms of visualisation, I believe the community would be happy with any solution to begin with. e.g. only visualise the first alert (so no UI changes needed). Visualise all alerts with vertical lines that when hovered tell you which alert got triggered. Only show thresholds/alerts when you hover over a particular series etc.

Just my 2 cents.

Hello!

Wanted to chime in here we (Spotify) need this as well.

We currently run our own alerting engine sourcing alerts from Grafana, and alert per-timeseries. We currently push the per timeseries alert annotations back into grafana.

So, in terms of UI, the first timeseries to alert causes the panel/alert to go into "Alerting" state, and each subsequent alert just piles on (the state history will show multiple updates "to" alerting, and likewise, multiple changes back to "ok").

We "need" this as this is how we've always done alerting, so moving away from per-timeseries alerting would be a huge social change, for ~10K alerts. We'd very much like to use and adopt Grafana native alerting and update our datasource to support it.

Wanted to chime in here we (Spotify) need this as well.

Do you also use Grafana Enterprise? Maybe that can help/motivate the developers =)

We would also love to see this feature: the ability to trigger multiple alerts from the same graph, giving the ability to trigger on a "below" as well as an "above" state, and the possibility of what would effectively be an amber warning ahead of a more important threshold breach.

We currently run our own alerting engine sourcing alerts from Grafana, and alert per-timeseries. We currently push the per timeseries alert annotations back into grafana.

@sjoeboo a bit off-topic here, but is any of this publicly available?

@vbichov not yet; we do want to open-source the alerting engine, though the timeframe is in flux. I'm sure I could share a patch we have on our (hardly ideal) internal fork to enable per-timeseries tracking of alerts via annotations.

A note: the alerting engine, right now, is specific to our TSDB (https://github.com/spotify/heroic)

+1 for this feature. This is something like warning/critical: we want to get a warning before things get worse, and then critical alerts when we need to take immediate action.

I'm amazed that this hasn't been implemented after 3 years of requests by users.

Having to create multiple panels (one for each alert) ends up clogging a dashboard and makes adding new alerts way more complicated than it should be

I always wondered why there is a 1 shown on the alerts tab if you can't define more than one alert per panel. On the query tab this number shows the number of defined queries, so I always thought multiple alerts would be possible, and I am quite surprised that this is not yet available.

Interesting that this is still not implemented. I agree the "count" on the alert tab is misleading since it leads one to believe that there can be multiple. Also, having a panel per alert rule is a bit ridiculous, as that means I have a "useless" dashboard that is nothing but panels for alerting. It's a messy ass dashboard for sure, but it is the only way to implement this. Mainly, it's so that I can have different rules for names and/or notification endpoint combination. It's complicated to say the least.

Has this been done?
Grafana version = 4.x

Now Grafana is at 7.x and I still don't see this feature

Has this been done?
Grafana version = 4.x

Now Grafana is at 7.x and I still don't see this feature

So naive😁

+1 for this feature.
On a single metric I would like

  1. A warning alert to indicate that a component is not behaving as expected and need close monitoring by 2nd line support
  2. An error alert to indicate a component is failing and trigger callouts to 3rd line engineering.
    Duplicating the metric is clumsy and makes our dashboards confusing for monitoring.

So many simple features are constantly denied by this group; check the many other feature requests... this seems like something basic.

I'll give another example.

I run a synology and would like to alert on it. Raid Status has a normal value of 1. However it also has a Degraded value of 11, and a Crashed value of 12. Degraded means data is still accessible. Crashed means high chance of data loss.

I want to send out a warning if the Raid is Degraded, and a critical alarm if the Raid is Crashed.
I have multiple volumes and storage pools and requiring multiple graphs for each is not scalable.

This can also be applied to something as simple as disk space usage.
I want to send out a warning if disk usage reaches 80%, and a critical alarm if disk usage reaches 90%. Doing multiple graphs for EACH of my disks is not a reasonable ask.

And I don't understand the comment that this is difficult in the UI. You already have something similar which is a list of Dashboards. When you click the Alert tab, it should show a list of Alert Rules by name with a "Create new Alert" button at the bottom. Each alert rule should have a "edit", "disable", or "delete" option to the right of it. By clicking on the alert, or on the edit button, it should take you to the existing edit page that is shown but for that specific alert rule.

Doing multiple graphs for EACH of my disks is not a reasonable ask.

You can use the API to automate creating/updating dashboards and their alerts. If you want to, you can create a program that periodically queries Prometheus (or whatever source you have) to do service discovery for targets and automatically create alerts for them.
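A rough sketch of that approach in Python, assuming the stock dashboard HTTP API (POST /api/dashboards/db with an API key). The per-target panel JSON is left as a stub because its exact shape, including the legacy alert block, is best copied from an exported panel on your own Grafana version; the URL, key, and target names are placeholders.

    import requests

    GRAFANA_URL = "https://grafana.example.com"  # assumption: your instance
    API_KEY = "..."                              # assumption: an Editor API key

    def build_panel(target: str, panel_id: int) -> dict:
        # Stub: paste the JSON of an exported panel that already has an alert,
        # then substitute the target into its queries, title, and alert name.
        return {"id": panel_id, "title": f"Disk usage - {target}"}

    targets = ["server1", "server2", "server3"]  # e.g. discovered from Prometheus
    payload = {
        "dashboard": {
            "uid": "generated-disk-alerts",
            "title": "Generated disk alerts",
            "panels": [build_panel(t, i + 1) for i, t in enumerate(targets)],
        },
        "overwrite": True,
    }
    resp = requests.post(
        f"{GRAFANA_URL}/api/dashboards/db",
        json=payload,
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=30,
    )
    resp.raise_for_status()

This automates the per-target panels, but it is still one panel per alert rule rather than multiple rules on one graph.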

Incredible that this feature has not been implemented yet, with the huge feedback that this issue has.

I use Grafana as our visualization & alerts engine at the Magellan Telescopes. If I have several subsystems that share characteristics that merit them all being in one plot, then when an issue arises and one starts behaving badly, my users get a cryptic warning and have to dig into which one is failing.

Creating dummy plots is a workaround, not a solution. This seems basic!

+1 necessary feature

+1

Exactly the same situation as the OP. A basic feature that should've already been implemented.

Can people stop spamming this issue without adding anything of value?

Use the reactions on the top of the issue to signal interest.

https://github.com/grafana/grafana/issues?q=is%3Aissue+is%3Aopen+sort%3Areactions-%2B1-desc is infinitely more useful for a maintainer to assess which issues are "popular" than people spamming everyone's e-mail inboxes and GitHub notifications with information that's already clear just by looking at the issue description.

If it's so basic, maybe one of all the complainers who just expect other people to do work for them for free should implement this themselves and either make a pull request or maintain their own fork if the maintainers don't want it upstream.

@thomasf "Can people stop spamming this thread issue without adding anything of value?" - Just like you did?

why not both
If the maintainers are still in the thread, new comments at least remind them of it. At this point it does seem kind of useless; there's no way the maintainers are going to implement it after this long, and people should really move to better tools like Datadog where the maintainers actually care, but hundreds of comments (particularly when they have actual scenarios) have a lot more of an impact than just a thumbs up.

If the maintainers are still in the thread, new comments at least remind them of it. At this point it does seem kind of useless; there's no way the maintainers are going to implement it after this long, and people should really move to better tools like Datadog where the maintainers actually care, but hundreds of comments (particularly when they have actual scenarios) have a lot more of an impact than just a thumbs up.

Or maybe the maintainers have unsubscribed from notifications on this issue because of the spam; it's not the only one with a lot of +1 messages and no update. Please don't compare Grafana and Datadog (we were users of both; there is no way we'd go back to Datadog).

The best way to get this one is to contribute (or probably to pay for Grafana Enterprise).

You are very, very wrong. Free or not, you cannot put up a forum/Slack/GitHub/feedback channel and then ignore it. If you think that putting software under an open-source license means "no complaints" and "people will develop your features for free", you are again very, very wrong. In my case I explained to them that with this feature I could sell Grafana to ten customers of mine. They ignored me, which means they pissed off a customer. Great move; probably they make "enough" money and don't want more. I am happy for them...


The amount of money I am willing to spend on _any_ software is directly proportional to the level of customer service I can anticipate I will be provided for my investment. Whether that be an open source product offering "paid support" or commercial product it does not really matter.

Having this issue remain open so long without so much as a peep from the maintainers of the project unfortunately extrapolates to a reasonable feeling of doubt on whether anything would be any different by spending money. If you are trying to sell software, it's probably wise to consider this.

make a pull request or maintain their own fork

If there was even a hint from the developers on where to begin, I am sure I am not alone in saying I would consider this, regardless of whether I think I should have to or not, simply due to the sheer amount of value it would provide. Sadly, that does not seem to be the case and I have little interest in trying to reverse engineer the product for a feature that the maintainers appear to not really care about.

Lastly, unless the thread is closed/locked, I see no reason for one to not speak their mind. You are allowed to unsubscribe if that does not suit you. I actually enjoy reading people lamenting about the relative absurdity of this. 😁

Alerting NG (NextGen), the alerting planned for Grafana 8, will support multiple alert instances from a single alert definition. So, something like host=* with a system like Prometheus will create alerts per host.

Some general information about this in the context of single stats added to https://github.com/grafana/grafana/issues/6983#issuecomment-712915673

We are still designing and prototyping, but to respond with some initial thoughts on these things:

Multiple alerts per graph

Alert definitions will be their own entities, so they will not be tied to a panel. An alert definition can produce multiple alert instances. A panel can then subscribe to instances or definitions. I imagine we will still want a nice UX path from dashboard panel to creating an alert though, because that is a nice flow.

Also, tracking separate states of an alert is not always preferred (as the end-user would need to know the details behind the individual states) vs just knowing if an alert is triggered.

Once many alerts from one definition are allowed, how they should be grouped becomes an issue (as one can get too many alerts). I currently see two paths for how this would work with Alerting NG:

  1. Use Alerting NG with an IRM like PagerDuty or Alertmanager that can handle grouping of alert instances.
  2. Change your query to group by a larger scoping dimension. So, for example, query cluster=* instead of host=*,cluster=* (or use group by for SQL-like datasources); see the sketch after this list. Alternatively, I intend to add functionality to server-side expressions (coming with Alerting NG) to allow for group-by/pivot operations if the data source does not do this. This would be the case when not using an IRM and sending directly to services like email/Slack.
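To illustrate the second path with a toy Python example (label names and values are made up): collapsing per-host evaluation values to a per-cluster aggregate yields one alert instance per cluster instead of one per host, whether that collapse happens in the query itself or in a server-side expression.

    from collections import defaultdict

    # Hypothetical evaluation results labelled by (cluster, host).
    series = {
        ("eu-west", "host-1"): 0.92,
        ("eu-west", "host-2"): 0.41,
        ("us-east", "host-9"): 0.88,
    }

    per_cluster = defaultdict(float)
    for (cluster, _host), value in series.items():
        per_cluster[cluster] = max(per_cluster[cluster], value)

    THRESHOLD = 0.9
    firing = {c: v for c, v in per_cluster.items() if v > THRESHOLD}
    print(firing)  # {'eu-west': 0.92} -> one instance per cluster, not per host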

warning/critical

This one is more complicated. For the WIP design I have removed it as a feature (at least for an alert definition; we will maybe have a way to duplicate the alert definition, change it, and somehow label/tag it with severity).

This is tough, because in many cases, it is very useful:

  • To me, warning/critical have clear uses: approaching broken / broken, or degraded / broken.
  • Without them, many setups will end up repeating a fair amount of alerts for different severity levels.

So why decide not to have them? It adds quite a bit of non-obvious complexity:

  • Assuming you want to support your thresholds coming from another metric (or your thresholds to be different ranges of query time, not values), there are now two conditions to run.
  • For the states of alert instances, at minimum I want to support:

    • Unknown: An instance disappeared

    • Error: The query that would have determined whether there is a problem with the instances is broken

    • Alerting: The condition is true

    • Normal: The condition is not true

  • We also want to continue to have FOR-like expressions. When adding more states, designing it so that flapping does not result in either missed notifications or noise is complicated. In general, state machines over time are very prone to bugs and are difficult to get right (search for TLA / Temporal Logic of Actions to learn more, if you like that sort of thing). So adding severity levels increases the state space more than one would guess, which means we are more likely to have unintended behaviors, or behaviors that are harder to build a mental model for (see the sketch after this list).
  • When looking to integrate with other systems or IRMs, having specific notions of severity could complicate the integration.
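As a toy illustration of the FOR/flapping point above: even a single alert instance with only Normal/Pending/Alerting states needs a careful pending timer, and each extra firing severity (warning vs critical) multiplies the transitions that have to be reasoned about. A minimal Python sketch, not the actual Alerting NG implementation:

    import time
    from enum import Enum

    class State(Enum):
        NORMAL = "normal"
        PENDING = "pending"
        ALERTING = "alerting"

    class AlertInstance:
        """One series' state, with a FOR-style pending period."""
        def __init__(self, for_seconds: float):
            self.for_seconds = for_seconds
            self.state = State.NORMAL
            self.pending_since = None

        def evaluate(self, condition_true: bool, now: float = None) -> State:
            now = time.monotonic() if now is None else now
            if not condition_true:
                self.state, self.pending_since = State.NORMAL, None
            elif self.state is State.NORMAL:
                self.state, self.pending_since = State.PENDING, now
            elif self.state is State.PENDING and now - self.pending_since >= self.for_seconds:
                self.state = State.ALERTING
            return self.state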

(at least for an alert definition; we will maybe have a way to duplicate the alert definition, change it, and somehow label/tag it with severity

This is a perfectly acceptable workaround for the critical/warning differentiation. I am more than happy to maintain separate thresholds. Having a combined warning/critical threshold would be a nice to have, but is not a dealbreaker.

then how they should be grouped becomes an issue (as one can get too many alerts)

It is up to the user to manage their own ticket volume and alarm generation. If you're setting alarms, each one should be a separate email or notification. Think of it this way: if you create an automated system to generate tickets based on alarms triggering, grouping multiple alarms into one email, for instance, would make this either difficult or just obnoxious. Further, multiple alarms showing up in one email means each alarm cannot have its own email thread; it would need to be manually separated out by the users and new threads kicked off. Instead, each alarm that triggers should have its own notification so that threads can be contained to that specific alarm.

Hopefully this simplifies the alarming design as you shouldn't be worrying about grouping. That is up to the user to handle.

