Check your team's work for errors automatically!

Checking a system that isn't your own

One of the neat things about being a programmer is that I can write scripts and other tools that check my work. Anything that checks my own errors can be applied to a team as well.

I have been working with a client whose software has a very intensive process for setting up a customer. If you miss a setting or leave something blank, it can really slow down the onboarding process for the customer.

Did I also mention that the source code of the system used for the setup is not actually available to my client? They are not able to add validations, checks and balances to the setup system without a complete rewrite.

The company that created the setup software doesn't have a lot of documentation, and it is very difficult to get straight answers to a lot of my questions. When our team does figure things out, we document it as best we can (we copy a list of Trello cards, each with a checklist describing the setup for a module the customer needs).

The problems our team faces

I wanted to create something that would fix our problems. The problems with the setup system are:

  • No checks for values and settings that should be turned on or off for all clients
  • No checks for settings that are based on another setting being turned on or off
  • No validation to let our team know how well we are doing setting up the customers
  • No mechanism to run the checks and display the results to the team
  • No mechanism for an individual to run the checks and see how well they did for a setup

Beyond fixing those problems, I wanted the solution to:

  • Find our issues early in the setup process so they don't cost our team time and the company money
  • Provide a great experience for the customers by setting them up correctly the first time
  • Check the settings of all customers to make sure a setting didn't accidentally get changed
  • Make it easy to add new checks that provide value

Rewriting the entire setup system is just not feasible at this time because of all the modules and components that are in the system. To this end, I wrote a program called System Monitor.

What is System Monitor?

The idea behind the software is very simple. I want to run all the checks on all the customers so that any known errors are found and our team can take care of each error using a process.

What do I mean by taking care of an error using a process? It comes down to asking a few questions.

  • Did we miss a step in the Trello card that deals with this module?
  • Did the person setting up that module for the client not read the Trello card?
  • Is this a newly discovered issue that is affecting a customer that was already set up?
  • Did this issue come about because some other settings were changed that made this error happen?

The team is doing a great job finding errors and then documenting them in the Trello cards. In fact, checking the Trello cards and making sure that the correct settings and steps have been documented is one of the first things we check. It is also one of the best ways we have found to make sure each install goes in correctly.

The second item on the list is a much more difficult one to fix. If we have a person on the team who isn't using the documentation and checklists on the Trello cards, then we have a bigger issue. In this case I would ask the person what could be improved so that they will use the cards. That will usually get us the buy-in we need.

The third item will crop up more and more as our team writes other checks. This is actually a good sign because we are finding errors and fixing them for all clients, no matter when they were set up.

Finally, the last item is rare in our setup system. It has happened when different module engines are changed. Luckily, System Monitor has been able to find these issues and we are able to correct them.

What does System Monitor do at a high level?

The architecture of System Monitor is very simple. It has two parts: a checks component, which is basically just a list of all the checks it knows about, and a notification component, which tells either an individual user or the whole team about the errors that were found.

Sample check file

A check file is stored in a checks folder that is at the same level as the main.js file. A check file looks something like this:

var sql = require('../our-sql-module.js');

module.exports = {
    // Reports every client whose logging path is empty.
    check: function(options) {
        var query = "SELECT name FROM client WHERE logging_path = ''";
        var callback = function(results) {
            if (results.error) {
                console.log('error: ' + results.error);
            } else if (results.rows.length > 0) {
                // Assumes each row exposes the selected column as row.name.
                var names = results.rows.map(function(row) {
                    return row.name;
                });
                var outputString = 'Logging path is not set for: ' + names.join(', ');

                options.notify.notify(outputString);
            }
        };
        sql.query(query, callback);
    }
};

The example check is very simple. It makes sure the logging path is set for each client. If the logging path is not set, it tells our team the name of each client that is missing it.
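A check for a setting that depends on another setting can follow the same shape. The sketch below is only an illustration: the columns email_enabled and smtp_host, the buildDependentCheck helper, and the fakeSql stand-in are all hypothetical, and the SQL module is passed in as an argument so the example can run without a database.

```javascript
// Hypothetical check: clients with email enabled must also have an SMTP host.
// The column names (email_enabled, smtp_host) are made up for illustration.
function buildDependentCheck(sql) {
    return {
        check: function(options) {
            var query = "SELECT name FROM client " +
                "WHERE email_enabled = 1 AND (smtp_host IS NULL OR smtp_host = '')";
            sql.query(query, function(results) {
                if (results.error) {
                    console.log('error: ' + results.error);
                    return;
                }
                if (results.rows.length > 0) {
                    var names = results.rows.map(function(row) {
                        return row.name;
                    });
                    options.notify.notify(
                        'Email is enabled but no SMTP host is set for: ' + names.join(', ')
                    );
                }
            });
        }
    };
}

// Stand-in for the real SQL module so the sketch runs on its own.
var fakeSql = {
    query: function(query, callback) {
        callback({ rows: [{ name: 'Acme' }, { name: 'Globex' }] });
    }
};

// Collect notifications in an array instead of sending them anywhere.
var sent = [];
buildDependentCheck(fakeSql).check({
    notify: { notify: function(text) { sent.push(text); } }
});
console.log(sent[0]);
```

Because the dependency lives entirely in the WHERE clause, the rest of the check stays identical to the simpler one.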

Sample notification file

Next, I will show you the Slack notification code we use to let our team know when a check has failed. This file is stored in the notifications folder, which is at the same level as the main.js file.

var SlackNode = require('slack-node');

module.exports = {
    notify: function(text, options) {
        if (!text) {
            return;
        }
        options = options || {};
        var webHookUrl = options.webHookUrl || "https://hooks.slack.com/services/your-web-hook-url";
        // Declared with var so it does not leak into the global scope.
        var slack = new SlackNode();
        slack.setWebhook(webHookUrl);
        slack.webhook({
            channel: options.channel || "#monitoring",
            username: options.username || "System Monitoring",
            text: text
        }, function(error, response) {
            // Only log when Slack actually reported a problem.
            if (error) {
                console.log(error);
            }
        });
    }
};

The only required parameter for this function is the text, which it sends to the monitoring channel on Slack. If you want to change the channel, webhook URL, or username of the bot sending the message, pass that information in the options parameter.

If you are interested, the notification file that doesn't notify the whole team looks just like this one, except that after the check to make sure text has been provided it only calls console.log(text);.
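Based on that description, the console notifier might look roughly like the sketch below. The variable name consoleNotify is chosen here for illustration, and the unused options parameter is kept only so both notifiers can be called the same way.

```javascript
// notifications/console.js (sketch): same interface as the Slack notifier,
// but the text only goes to the terminal of whoever ran the checks.
var consoleNotify = {
    notify: function(text, options) {
        // Nothing to report, so stay quiet.
        if (!text) {
            return;
        }
        console.log(text);
    }
};

module.exports = consoleNotify;
```

Keeping the same notify(text, options) signature is what lets a check stay unaware of where its message ends up.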

Bringing it all together

Finally, the main.js file brings the checks and notification modules together. It picks the appropriate notification system and then loads all the checks that are going to be run. Then it loops over each check and passes in the notification system to be used.

Here is the main.js file:

var fs = require('fs');
var path = require('path');
var slackNotify = require(__dirname + '/notifications/slack');
var consoleNotify = require(__dirname + '/notifications/console');

// Default to console output; pass "slack" as the first argument to notify the team.
var notifyType = consoleNotify;
if (process.argv.length > 2) {
    var environment = process.argv[2].trim().toLowerCase();
    if (environment === 'slack') {
        notifyType = slackNotify;
    }
}

// Load every .js file in the checks folder next to this file.
var checksDir = path.join(__dirname, 'checks');
var allChecks = {};
for (var fileNameWithExtension of fs.readdirSync(checksDir)) {
    if (path.extname(fileNameWithExtension) !== '.js') {
        continue;
    }
    var fileName = path.basename(fileNameWithExtension, '.js');
    allChecks[fileName] = require(path.join(checksDir, fileNameWithExtension));
}

for (var key in allChecks) {
    allChecks[key].check({notify: notifyType});
}

The conclusion

There is a drawback to this system. As we gain a lot of clients and checks, this solution could become very slow. At that point, I think System Monitor will change so that it can be run for a single client or even a subset of checks. I am not worried about this at this time.
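One way running a subset of checks could work is to filter the loaded checks by name before looping over them. The sketch below is not part of System Monitor: selectChecks and the example check names are hypothetical, and it assumes the names would arrive as extra command-line arguments after the notification type.

```javascript
// Hypothetical helper: keep only the checks whose names were requested.
// An empty list of names means "run everything", matching current behavior.
function selectChecks(allChecks, requestedNames) {
    if (!requestedNames || requestedNames.length === 0) {
        return allChecks;
    }
    var selected = {};
    requestedNames.forEach(function(name) {
        if (Object.prototype.hasOwnProperty.call(allChecks, name)) {
            selected[name] = allChecks[name];
        } else {
            console.log('unknown check: ' + name);
        }
    });
    return selected;
}

// Stand-in checks that only record that they ran.
var ran = [];
var exampleChecks = {
    'logging-path': { check: function() { ran.push('logging-path'); } },
    'email-settings': { check: function() { ran.push('email-settings'); } }
};

var toRun = selectChecks(exampleChecks, ['logging-path']);
Object.keys(toRun).forEach(function(key) {
    toRun[key].check({ notify: { notify: console.log } });
});
console.log(ran);
```

In main.js this could be wired in as selectChecks(allChecks, process.argv.slice(3)), so that node main.js slack logging-path would run just that one check.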

So far, this system has fixed all the problems we have been having. Our team is getting very fast and accurate at setting up a customer in the setup system. We are able to detect errors and improve the process quickly.

It also has an added benefit: we will be able to run these checks against the new setup system that we will one day create. That should speed up the development of the new setup system and make sure we are creating something that has value from day one.

I can't wait to see where this system goes from here.

Happy coding!