Know When to Alert on What Matters
Automated alerts are essential to monitoring. Their biggest benefit is that they let you spot problems anywhere in your infrastructure, so that you can quickly determine the cause and minimize service degradation and disruption. But alerts are not always as reliable as they should be: real problems can get lost in a sea of noisy alarms. This article explains a simple yet effective approach to alerting.
In simple terms, an alert should communicate something specific about your system. Automated alerting lets you respond quickly to issues and provide better service to your customers. It also saves time by freeing you from manually inspecting metrics.
Alerts come in varying degrees of urgency. Some require immediate human intervention, while others point to areas that may need attention in the future. It is advisable to manage all alerts in a single place, because that makes it easier to correlate them with other metrics and events.
Many alerts are not associated with a major problem. For example, when a data store starts serving queries much more slowly than usual, it should generate a low-urgency alert. This alert is recorded in the monitoring system for future reference or investigation. If a larger problem develops later, the recorded alert data will provide invaluable context for your investigation.
Notification Alerts of Moderate Severity
Alerts of moderate severity require intervention, but not immediately. For example, if the data store is running low on disk space, it should be scaled out within the next several days. Sending an email or posting a message in the service owner’s chat room is a good way to deliver these alerts.
High Severity Alerts
High-severity alerts get special treatment. They require immediate human attention, because any delay can make the problem worse. For example, if the response time of your web application exceeds your internal SLA, someone should act on it immediately.
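The three urgency levels described above map onto three delivery paths. The following is a minimal Python sketch of that mapping, not a reference implementation. The Severity enum, the Alert class, and the channel names are all hypothetical and stand in for whatever your monitoring system provides.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    LOW = "low"            # record only
    MODERATE = "moderate"  # notify by email or chat
    HIGH = "high"          # page a human immediately


@dataclass
class Alert:
    name: str
    severity: Severity
    message: str


# Hypothetical channel names; substitute whatever your tooling offers.
ROUTES = {
    Severity.LOW: ["record"],
    Severity.MODERATE: ["record", "email", "chat"],
    Severity.HIGH: ["record", "page"],
}


def route(alert: Alert) -> list:
    """Return the delivery channels for an alert, based on its severity."""
    return ROUTES[alert.severity]


if __name__ == "__main__":
    slow_queries = Alert("slow-queries", Severity.LOW,
                         "Data store queries are slower than usual")
    print(route(slow_queries))
```

Every route includes "record", so even an alert that pages someone is still stored in the monitoring system for later correlation.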
When to create alerts?
Before you set up an alert, ask yourself a few questions to determine its urgency level. The most common ones are given below:
Is it real?
First, find out whether the issue is real. Always generate an alert for a real issue. Whether to send a notification is up to you, but always record the alert in your monitoring system for later correlation and analysis.
Does it require attention?
Next, ask whether the issue requires attention. Calling someone away from work or sleep is disruptive, so do it only when it is warranted. If the issue is real and requires attention, generate an alert that notifies someone who can investigate and fix the problem. The notification should be sent via email, chat, or a ticketing system so that recipients can prioritize their response.
Is it urgent?
Not all issues are urgent, so decide whether this one needs to be handled right away. If a key system stops performing, you should page an engineer immediately.
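The three questions above can be read as a small decision procedure. Here is one possible sketch in Python. The function name triage and the action labels are hypothetical, and returning "none" for an issue that is not real is an assumption, since the text only says what to do when the issue is real.

```python
def triage(is_real: bool, needs_attention: bool, is_urgent: bool) -> str:
    """Map the three questions to an action: 'none', 'record', 'notify', or 'page'."""
    if not is_real:
        return "none"    # assumption: no real issue means no alert
    if not needs_attention:
        return "record"  # real, but nobody has to act: keep it for later analysis
    if not is_urgent:
        return "notify"  # real and needs attention: email, chat, or ticket
    return "page"        # real, needs attention, and urgent: page an engineer


if __name__ == "__main__":
    # A key system has stopped performing: real, needs attention, urgent.
    print(triage(is_real=True, needs_attention=True, is_urgent=True))
```

The questions are checked in order, so a later answer only matters when every earlier answer was yes.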
Pages on Symptoms
Pages are an effective way to deliver important information, but they can be disruptive if overused. When a page reports an error or symptom, you should know immediately what to do about it. The fact that your system has stopped working is itself a symptom, and it can have a number of causes. For example, if your website is loading slowly, that is a symptom. Its possible causes include high database latency, high load, failed application servers, and so on.
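To make the symptom-versus-cause distinction concrete, here is a rough Python sketch of a check that pages on slow responses without looking at any of the possible causes. The function names, the choice of the 95th percentile, and the sample numbers are assumptions made for illustration only.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty sequence of numbers."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def should_page(response_times_ms, sla_ms, pct=95):
    """Page when the chosen percentile of response time exceeds the SLA."""
    if not response_times_ms:
        return False  # no samples; this sketch leaves missing data to a separate check
    return percentile(response_times_ms, pct) > sla_ms


if __name__ == "__main__":
    recent = [120] * 90 + [900] * 10  # made-up samples, in milliseconds
    print(should_page(recent, sla_ms=500))
```

The check never inspects the database, the load, or the application servers. It fires on what the user experiences, and finding the cause is left to the investigation that follows the page.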
A page on a symptom points to a real problem rather than a hypothetical or purely internal one. And because symptoms are defined independently of their underlying causes, you will still get an appropriate page when the system architecture changes, without having to update your alert definitions. There is one exception: it is sometimes necessary to call human attention to a handful of metrics even when the system is performing adequately. These early-warning metrics reflect an unacceptably high probability that serious symptoms will soon develop and require immediate intervention.
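Disk space, mentioned earlier, is one plausible candidate for an early-warning metric. As a rough illustration, the Python sketch below estimates how many days remain before a disk fills by extrapolating linearly from daily usage samples. The function name, the linear-growth assumption, and the sample figures are all hypothetical.

```python
from typing import List, Optional


def days_until_full(used_gb_by_day: List[float], capacity_gb: float) -> Optional[float]:
    """Estimate days until the disk fills, from daily usage samples (oldest first)."""
    if len(used_gb_by_day) < 2:
        return None  # not enough data to estimate a trend
    # Average daily growth between the first and last sample.
    growth_per_day = (used_gb_by_day[-1] - used_gb_by_day[0]) / (len(used_gb_by_day) - 1)
    if growth_per_day <= 0:
        return None  # usage is flat or shrinking, so no fill date to report
    return (capacity_gb - used_gb_by_day[-1]) / growth_per_day


if __name__ == "__main__":
    # Made-up samples: usage grows by 20 GB a day on a 1000 GB disk.
    print(days_until_full([700, 720, 740, 760], capacity_gb=1000))  # prints 12.0
```

How many remaining days should trigger a notification or a page is a judgment call that depends on how quickly you can add capacity.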
The Verdict
That covers the essentials of alerting: the levels of alert urgency and how to decide when an alert is warranted. Every problem shows up as one or more symptoms, and paying attention to those symptoms helps ensure that a problem does not degrade the overall performance of your system.