A major part of an engineer's life is troubleshooting: a network is down or an application isn't working, and we have to figure out what is wrong. For those who have to deal with this, here are the basics of troubleshooting once more, because people seem to keep forgetting them along the way.

Recently a customer asked us to help make it easier to download security updates to the business networks on their ships. When we thought the project was finished and tested, the client unfortunately reported that it did not work. Looking into the problem, we found that downloads had already been failing for some time, even before we started the project. Building a new system could not fix those existing problems; they had to be handled first.

This is a good example of how to troubleshoot a problem. Before you change or update a system, first check whether it was working properly, and document how it was working. It sounds logical, but in many of the cases we see, this step is skipped. Only if you know what the situation was before the change can you examine the errors that appear after the update and work out how to fix them.

In addition, if you want to change something on your network or update an application, do it one step at a time! Some customers whose server racks we are migrating from the former Telecity 1 to the new Amsterdam Data Tower ask if we can do a software update during the move, or change an IP address on the equipment at the same time. "Because we are down anyway."

Changing multiple settings in one go makes it very hard to troubleshoot when something isn't working, because it is unclear what caused the error. Take the server migration: is the problem caused by the move itself (has some hardware been damaged?), or is there an issue with the software update or the IP address change? The more variables you have to check, the longer troubleshooting takes, and therefore the more downtime you can expect.

Troubleshooting is a particular skill, but its principles are very logical. Make only one change at a time, so that if a problem appears you only have to focus on one thing. And make sure you know what the situation was before the change, so you have something to compare against. We wouldn't want troubleshooting to become a lost art, would we?