Maintaining sanity on the Cloud: Testing NodeQuery Linux server monitoring in public beta phase

I have been trying out the NodeQuery monitoring service for Linux servers for the past few months, and I find it fairly intuitive and simple to use. The web interface is well laid out and easy to navigate. However, I am not sure the uptime metrics it reports are accurate. The reason is that I got an email alert this morning saying that a particular server was not reachable:

**********

From: NodeQuery <hello@nodequery.com>
Date: Sat, Aug 2, 2014 at 10:13 AM
Subject: [ALERT] server A is not responding
To: <user@example.com>

Hello user,

it seems one of your servers is not responding anymore.

Server: server A
Last Update: 2014-08-02 10:12:12
Alert Trigger: 2014-08-02 10:12:01

If you don't want to receive alerts anymore, log into your account and edit the
notification settings for your server.

Feel free to reply to this message if you are experiencing problems with our
services.

Thanks,
NodeQuery.com
**********

I logged into the server and verified that it had in fact been up during that interval. Even so, the NodeQuery dashboard reported an uptime of 99.88%. Over a 30-day month that percentage corresponds to roughly 52 minutes of downtime, or about 10.5 hours over a year. I am pretty certain that this server was not down for that long. So it looks like even temporary network glitches can be the bane of agent-based monitoring systems: when the agent fails to reach the monitoring service, the server is counted as down and the uptime report gets skewed.
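The conversion from an uptime percentage to the downtime it implies is simple arithmetic, and can be sketched roughly as follows. The function name `downtime_minutes` and its arguments are my own for illustration; this is not something NodeQuery provides, and it assumes the percentage applies evenly across the whole window.

```shell
# Hypothetical helper (not part of NodeQuery): convert an uptime percentage
# into the downtime, in minutes, that it implies over a window of N days.
downtime_minutes() {
    uptime_percent="$1"   # e.g. 99.88
    window_days="$2"      # e.g. 30 for a month, 365 for a year
    # LC_ALL=C keeps "." as the decimal separator regardless of locale.
    LC_ALL=C awk -v up="$uptime_percent" -v days="$window_days" \
        'BEGIN { printf "%.2f\n", (100 - up) / 100 * days * 24 * 60 }'
}

downtime_minutes 99.88 30    # implied downtime over a 30-day month
downtime_minutes 99.88 365   # implied downtime over a 365-day year
```

For 99.88% this works out to about 52 minutes over 30 days and about 631 minutes (roughly 10.5 hours) over 365 days.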

On the plus side, I like that uninstalling their agent takes a single command:

$ sudo rm -R /etc/nodequery && (crontab -u nodequery -l | grep -v "/etc/nodequery/nq-agent.sh") | crontab -u nodequery - && userdel nodequery

Saturday, August 2, 2014