We need to be online all the time. To ensure our continued online presence, a lot of the services on the Internet are built redundantly. If there is a problem, traffic to that service will be rerouted and the service will still be online. A good thing, but… just don’t overdo it.
From the fibre cables in the streets to the backbone routers in the server rooms and everything in between, some feel you need to make every layer redundant. Sure, everything is then ‘backed up’ by another system. But if you make all layers redundant, the combined system becomes so complex that it will collapse in on itself.
Looking at the network, an Internet service basically relies on three layers of technology to function properly: a fibre cable (layer 1), the data link layer (layer 2, usually Ethernet in Europe) that runs over this fibre, and IP subnets (layer 3) that are routed over the Ethernet. Some feel that all three layers need to be redundant to ensure that services will always be online.
First off, it is extremely complex to make a fibre optic cable redundant. You need a near-exact copy of the cable, you need to make sure it is about the same length and that the light receivers are configured the same way, and all of this while it is connected to the original cable but does not run in the same location as the original one. Why not just make sure there is a good service road available, should things go wrong?
Most people have had horrible experiences building redundancy on Ethernet. In the Internet Service Provider world, a redundant layer 2 service is therefore usually considered impossible and an absolute no-go. Creating a big, redundant layer 2 domain is a recipe for downtime, which is exactly the thing you want to prevent with redundancy.
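To make the risk concrete, here is a minimal sketch of what a loop does to a single broadcast frame. It is a toy model under stated assumptions, not a description of any real switch: the three-switch triangle, the `flood` function and the `max_forwards` cap are all hypothetical names chosen for illustration. It relies on two well-known facts: an Ethernet frame carries no hop counter, so without a loop-prevention protocol such as spanning tree a flooded frame can circulate forever, whereas an IP packet carries a TTL that is decremented at every hop. The `hop_limit` variant only shows the effect of such a counter; it does not model how routers actually forward packets.

```python
from collections import deque

# Hypothetical topology: three switches wired in a triangle (a redundant loop).
TRIANGLE = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}


def flood(topology, start, hop_limit=None, max_forwards=1000):
    """Count how often one broadcast frame injected at `start` is forwarded.

    Every node re-sends the frame to all neighbours except the one it came from.
    hop_limit=None mimics plain Ethernet (no hop counter in the frame);
    an integer mimics a TTL-style counter that is decremented at every hop.
    Returns (forwards, stopped_by_cap).
    """
    # Each queue entry is (sender, receiver, remaining hop budget).
    queue = deque((start, nbr, hop_limit) for nbr in topology[start])
    forwards = 0
    while queue:
        if forwards >= max_forwards:
            return forwards, True  # still circulating: we gave up, it did not
        sender, node, ttl = queue.popleft()
        forwards += 1
        if ttl is not None:
            ttl -= 1
            if ttl <= 0:
                continue  # hop budget used up, the frame is dropped here
        for nbr in topology[node]:
            if nbr != sender:
                queue.append((node, nbr, ttl))
    return forwards, False


if __name__ == "__main__":
    print("no hop counter:", flood(TRIANGLE, "A"))
    print("hop limit of 4:", flood(TRIANGLE, "A", hop_limit=4))
```

In this toy triangle the frame without a hop counter only stops because the simulation caps it, while the version with a hop limit dies out on its own. With more switches and more links, each forward spawns several copies instead of one, which is how such a loop can saturate a network.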
So all in all, our advice is to have your redundancy on the third layer, where all traffic is routed to IP addresses, and that should be enough. Routers are specifically designed for this: there will be redundancy and no loops. But how about the redundancy of the router itself?
In ISP hardware it is always possible to have redundant control engines installed. But why have a two-engine router where both routing engines are constantly talking to each other, asking which one is master, and synchronizing data, and which therefore has a much higher chance of making mistakes and becoming less reliable? You can instead have two single-engine routers that will be much more reliable in the end, if you look at the system as a whole.
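The argument can be illustrated with some back-of-the-envelope availability arithmetic. The sketch below is only that: the availability figures are hypothetical, and it assumes that the two single-engine routers fail independently, while the dual-engine router additionally depends on its mastership and synchronization logic working correctly. The names `ENGINE`, `SYNC_LOGIC`, `parallel` and `series` are ours, introduced for illustration.

```python
def parallel(*availabilities):
    """Availability of a group where one working component is enough."""
    all_fail = 1.0
    for a in availabilities:
        all_fail *= 1.0 - a
    return 1.0 - all_fail


def series(*availabilities):
    """Availability of a chain where every component has to work."""
    total = 1.0
    for a in availabilities:
        total *= a
    return total


# Hypothetical figures, chosen only to illustrate the reasoning.
ENGINE = 0.999       # one routing engine, or one single-engine router
SYNC_LOGIC = 0.9995  # mastership election and state sync between two engines

MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes(availability):
    """Expected downtime per year for a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


# Two independent single-engine routers: either one can carry the traffic.
two_routers = parallel(ENGINE, ENGINE)

# One dual-engine router: either engine is enough, but only as long as the
# shared synchronization logic between the engines behaves.
dual_engine = series(parallel(ENGINE, ENGINE), SYNC_LOGIC)

if __name__ == "__main__":
    for name, a in (("two routers", two_routers), ("dual engine", dual_engine)):
        print(f"{name}: {a:.7f} available, "
              f"about {downtime_minutes(a):.1f} minutes down per year")
```

Under these assumptions the shared synchronization logic dominates the result: the dual-engine router can never be more available than that one component, however reliable the individual engines are. Different figures shift the numbers, but not that limit.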
The more redundancy you add, the more complex the system becomes and the higher the chance of problems. Our philosophy is to have as little redundancy as possible, and to have it on the correct layers only. Making the system over-redundant will definitely lead to longer downtimes. And that’s something nobody wants, right?