Are you seeing Website Monitoring 'false positive' alerts?

Last Updated January 7, 2014

The term 'false positive' comes from statistics, where its meaning is well-defined. In website monitoring, people generally mean 'false alarm' when they say 'false positive.' For the purposes of this article, let's keep things simple and call an alert 'false' when it occurs in an unexpected situation. The most common (and irritating) example is an alert that says your website is down when in fact it is up and running.

You can choose to be notified on health, uptime, response time, and returned status code. These conditions let you define the set of custom alerts that is right for your business, and you can create and edit as many alerts as you need. To get you started, when you first install CopperEgg, a number of alert conditions are already defined and enabled, including four out-of-the-box alerts for website monitoring:

  • Poor Health (Health < 50% for 3 minutes) - will trigger when the average health across all probing stations is less than 50% for the configured duration. For more info on how health is calculated, see this article.
  • Slow Response Time (Response Time > 6000ms for 1 minute) - will trigger when the average response time across all probing stations is more than 6 seconds for the configured duration.
  • Error Status Code (Status Code =~ ^[45]) - will trigger when any single GET or POST returns a status code between 400 and 599.
  • Site Down (Uptime = 0 for 3 minutes) - will trigger when any probing stations are reporting problems reaching your website for the configured duration, in this case 3 minutes. Problems include failure to connect, no response for at least 10 seconds, or a mismatch in expected content.
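
The Error Status Code condition is a regular-expression match on the returned status code: ^[45] matches any code whose first digit is 4 or 5. As a rough illustration only, the same test can be sketched in POSIX shell. The function name is_error_status is hypothetical and is not part of CopperEgg.

```shell
# Hypothetical helper (not part of CopperEgg): succeeds when the status
# code starts with 4 or 5, which is what the pattern ^[45] matches.
is_error_status() {
  case "$1" in
    [45]*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Try a few codes to see which ones would match the pattern.
for code in 200 301 404 503; do
  if is_error_status "$code"; then
    echo "$code: matches ^[45]"
  else
    echo "$code: no match"
  fi
done
```

Redirects (3xx) and successes (2xx) do not match, so they would not trigger this alert condition.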

By default, these alerts apply to all of the probes that you create (Match Anything), and will generate email notifications to the email address you used to sign up. We encourage you to configure these alerts for your own environment.

 

These are the most common scenarios in which people see false alerts:

  • Monitoring from a single station - In this scenario, transient issues with a single probing station can trigger alerts. We recommend monitoring from more than one location.
  • Setting the alert duration too low - If you are monitoring a website every 60 seconds and alerting when a condition exists for 1 minute, a single bad result can cause an alert. We recommend using an alert duration of at least three times the probe frequency.
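
To see why a short alert duration is fragile, here is a minimal sketch in POSIX shell. It assumes a simplified model in which an alert fires after a run of consecutive bad probe results, with the run length equal to the alert duration divided by the probe frequency. CopperEgg's actual evaluation may differ, and would_alert is a hypothetical name used only for illustration.

```shell
# Hypothetical model (an assumption, not CopperEgg's implementation):
# the alert fires once NEEDED consecutive probe results are bad.
# NEEDED = alert duration / probe frequency, e.g. 180s / 60s = 3.
# Usage: would_alert NEEDED RESULT...   (each RESULT is "ok" or "fail")
would_alert() {
  needed="$1"
  shift
  streak=0
  for result in "$@"; do
    if [ "$result" = "fail" ]; then
      streak=$((streak + 1))
    else
      streak=0   # a good result resets the run of failures
    fi
    if [ "$streak" -ge "$needed" ]; then
      echo "alert"
      return 0
    fi
  done
  echo "no alert"
}

# One transient failure among otherwise good results:
would_alert 1 ok fail ok ok   # duration equal to the probe frequency
would_alert 3 ok fail ok ok   # duration of three times the probe frequency
```

Under this model, the first call prints "alert" and the second prints "no alert": the single bad result is enough when the duration equals the probe frequency, but not when three consecutive bad results are required.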

 

If the above does not help, you may wish to run this command on a Linux system (the URL shown is only an example; substitute the address of the site you are monitoring):

  • while true; do echo "$(date): $(curl -s -m 10 -o /tmp/o.txt -w ' code %{http_code};  time %{time_total}s' 'http://www.google.com')"; sleep 1; done

Leave it running for a few minutes. Each line shows a timestamp, the HTTP response code, and the time taken to complete the HTTP request. Because curl is limited to 10 seconds (-m 10), a time of about 10 seconds means the request timed out.
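
If you save the loop's output to a file, a short script can summarize it instead of reading every line by eye. The following is a minimal sketch, assuming the line format printed by the command above. The function name summarize_probe_log and the sample lines are made up for illustration.

```shell
# Hypothetical helper: summarize saved output of the curl loop.
# Expects lines on stdin in the format the loop prints, for example:
#   Tue Jan  7 10:00:00 UTC 2014:  code 200;  time 0.153s
summarize_probe_log() {
  awk '
    {
      code = ""; secs = ""
      for (i = 1; i < NF; i++) {
        if ($i == "code") { code = $(i + 1); sub(/;$/, "", code) }
        if ($i == "time") { secs = $(i + 1); sub(/s$/, "", secs) }
      }
      if (code == "") next                  # skip lines that are not probe results
      total++
      if (secs + 0 >= 10) timeouts++        # hit the 10-second limit set by -m 10
      else if (code !~ /^[23]/) errors++    # anything other than 2xx or 3xx
    }
    END { printf "total=%d timeouts=%d errors=%d\n", total, timeouts, errors }
  '
}

# Made-up sample lines, for illustration only. curl reports code 000
# when it received no HTTP response, as happens on a timeout.
printf '%s\n' \
  'Tue Jan  7 10:00:00 UTC 2014:  code 200;  time 0.153s' \
  'Tue Jan  7 10:00:12 UTC 2014:  code 000;  time 10.001s' \
  'Tue Jan  7 10:00:14 UTC 2014:  code 503;  time 0.420s' |
  summarize_probe_log
```

For real data, redirect the loop's output to a file (the name probe.log here is arbitrary) and run summarize_probe_log < probe.log. A nonzero timeout or error count from your own machine suggests the alerts may reflect real intermittent problems rather than false alarms.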

 

Please let us know if you are having issues with alerts of any kind! 
