
Using Infrastructure to Hide All Sins

There’s a secret that the infrastructure guys don’t want to share with the developers. The secret is that for almost any sin you can commit as a developer there’s a way to resolve it as an infrastructure guy — given money. They don’t want you to know that though — because they actually like getting sleep and don’t like alert emails from the system at two in the morning. I’m hoping that I don’t lose my infrastructure person card (IPC) for telling everyone this but I had to in order to make a point.

Recently a customer of mine with a three-server SharePoint farm had two of the servers stop functioning correctly. (I think it’s really a switch issue, but I’m not close enough to know for sure.) As a result, they took two of the three servers out of the load-balancing cluster and … the remaining server kept up quite happily with the load being applied. The infrastructure manager I was working with was surprised. I explained that the architecture, in addition to being poised for growth, was also designed around the idea that the developer code would be very bad, and in this case it wasn’t.

You see, I architected the system to account for a fairly high probability that the developer code would randomly and inexplicably cause a server to crash, run out of memory, blue screen, or just generally go dark from time to time. With that in mind, we put in two servers that should be able to cope with the load from everyone. The third server in the farm was just there to be the token server that was in the process of crashing and coming back.
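The sizing rule above can be sketched as simple arithmetic. This is only an illustration, not the actual capacity plan for that farm: the function names and the request rates are hypothetical.

```python
import math


def servers_needed(peak_load, per_server_capacity, spares=1):
    """Servers required to carry peak_load, plus spares expected to be down."""
    if per_server_capacity <= 0:
        raise ValueError("per_server_capacity must be positive")
    base = math.ceil(peak_load / per_server_capacity)
    return base + spares


def survives(total_servers, failed, peak_load, per_server_capacity):
    """True if the servers still running can carry the load."""
    healthy = total_servers - failed
    return healthy * per_server_capacity >= peak_load


# Hypothetical numbers: 900 requests/s at peak, 500 requests/s per server.
farm_size = servers_needed(900, 500)          # two to carry the load, one spare
one_down = survives(farm_size, 1, 900, 500)   # the designed-for case
two_down = survives(farm_size, 2, 900, 500)   # only works if real load is lower
```

With these assumed figures the farm tolerates one failure at full design load. Losing two servers only works when the real load sits well below the design load, which is what happens when the code turns out better than the architect feared.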

Load balancing can hide almost any server stability sin that you can come up with. The simple Network Load Balancing (NLB) included in Microsoft Windows Server operating systems can hide problems. Tools like F5’s BIG-IP can hide them better.
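To make the idea concrete, here is a minimal sketch of a round-robin balancer that skips servers marked as down. It is a generic illustration under my own assumptions, not how NLB or BIG-IP work internally; the class name and server names are hypothetical.

```python
class RoundRobinBalancer:
    """Hand out servers in rotation, skipping any that are marked down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._index = 0

    def mark_down(self, server):
        # A real balancer would decide this from health checks.
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        # Try each server at most once per call.
        for _ in range(len(self.servers)):
            server = self.servers[self._index % len(self.servers)]
            self._index += 1
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


lb = RoundRobinBalancer(["web1", "web2", "web3"])
lb.mark_down("web2")                    # web2 is busy crashing
picks = [lb.pick() for _ in range(4)]   # callers never see web2
```

Clients keep getting answers from the surviving servers, so the crash stays invisible as long as the remaining capacity is enough.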

In fact, infrastructure is built on the idea that some packets are going to get dropped, lost, or delayed, and thus the TCP (Transmission Control Protocol) of TCP/IP manages the resending, reorganizing, and reassembly of packets across the network into a stream. It’s a core part of what infrastructure does — it recovers from failures — it hides all sins.
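The receiving side of that recovery can be sketched with a toy model. This is not real TCP (no windows, timers, or checksums); it only shows how sequence numbers let a receiver discard duplicates, reorder segments, and know what to ask for again. The function name and the sample data are hypothetical.

```python
def reassemble(segments):
    """Rebuild a byte stream from (seq, data) segments.

    Segments may arrive out of order or more than once. Returns the
    contiguous stream plus the next byte offset still missing.
    """
    received = {}
    for seq, data in segments:
        if not data:
            continue  # ignore empty segments
        received.setdefault(seq, data)  # a retransmitted duplicate is dropped

    stream = b""
    expected = 0
    while expected in received:
        chunk = received[expected]
        stream += chunk
        expected += len(chunk)
    return stream, expected


# Out of order, with one duplicate caused by a retransmission.
complete = reassemble([(5, b" world"), (0, b"hello"), (5, b" world")])

# The segment starting at offset 5 was lost, so the stream stops there.
gap = reassemble([(0, b"hello"), (11, b"!")])
```

In the second case the receiver can deliver only the first five bytes, and the returned offset is where a sender would need to resend from.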

I’m not suggesting that this is always a good thing. Adding more memory to a server, or switching it to 64-bit, just hides memory leaks. Eventually the cost of hiding the problem will be worse than owning up to it, finding it, and resolving it.

While load balancing is the first trick in my bag, it’s far from the only one. Things like Link Aggregation Control Protocol (LACP) can be used to address a choke point to a server: the network interface. It’s called network adapter teaming on the server side and LACP on the switch side. In either case the effect is aggregating two network cards into a faster virtual connection. Suddenly you roughly double the aggregate traffic you can get to and from your server, although a single connection typically still rides on one physical link.
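A rough sketch of how an aggregated link spreads traffic: each flow is hashed onto one physical link, so many flows share the combined capacity. Real switches and teaming drivers use their own vendor-specific hash policies; the XOR hash below is a hypothetical stand-in, and the addresses are made up.

```python
def pick_link(src_last_octet, dst_last_octet, link_count):
    """Choose a physical link for a flow using a simplified XOR hash."""
    if link_count < 1:
        raise ValueError("link_count must be at least 1")
    return (src_last_octet ^ dst_last_octet) % link_count


def spread(flows, link_count):
    """Count how many flows land on each physical link."""
    counts = [0] * link_count
    for src, dst in flows:
        counts[pick_link(src, dst, link_count)] += 1
    return counts


# Four hypothetical clients (last address octets 1 to 4) talking to server .16
flows = [(1, 16), (2, 16), (3, 16), (4, 16)]
per_link = spread(flows, 2)   # flows split across both links
```

The same flow always hashes to the same link, which keeps its packets in order and is also why one connection alone does not get faster.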

Simple solutions to problems with bad query designs generally involve more RAM in a SQL Server box and smaller, faster, better-organized disks. SQL Server is REALLY good at caching data. If you put RAM in a box, it will reduce the load on the IO subsystem. If you put smaller, faster disks in, you’ll be able to get more IO operations to and from the disk. No matter how bad the query, if you throw enough hardware at it you can make it work.
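The trade-off can be put into back-of-the-envelope numbers. This is a sketch with hypothetical figures (page reads per second, cache hit ratios, and IOPS per disk are all assumptions), not a measurement from any real SQL Server.

```python
import math


def physical_reads(logical_reads, cache_hit_ratio):
    """Reads per second that miss the cache and go to disk."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return logical_reads * (1.0 - cache_hit_ratio)


def disks_needed(physical_reads_per_sec, iops_per_disk):
    """Spindles required to absorb the reads that reach the disks."""
    if iops_per_disk <= 0:
        raise ValueError("iops_per_disk must be positive")
    return math.ceil(physical_reads_per_sec / iops_per_disk)


# Assumed workload: a bad query plan issuing 10,000 page reads per second,
# on disks that each sustain about 150 random IOPS.
small_ram = disks_needed(physical_reads(10000, 0.5), 150)  # half the reads hit disk
big_ram = disks_needed(physical_reads(10000, 0.9), 150)    # more RAM, fewer misses
```

Under these assumptions, raising the hit ratio with more RAM cuts the number of disks needed several times over, without anyone touching the query.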

For instance, I’ve made SQL Server faster at serving queries that were originally serviced by a Pervasive SQL system that was used to making ISAM file calls. I’m not talking faster by a few percent; I’m talking more than 30% faster. Sure, I bought faster disks, configured them as RAID 10, threw a ton of memory in the box, and ran Enterprise Edition, but we did get the goal accomplished.

That’s just the standard bag of tricks. Third parties get in on the act too: the tools from Strangeloop Networks hide problems with view state in ASP.NET applications, and WAN acceleration devices from Riverbed, Cisco, and Certeon can all reduce the amount of bandwidth your application uses over the WAN.

Why am I telling you this? Well, if you’re an infrastructure person, just pretend it’s so you can buy cool new toys. If you’re a developer, I want you to know that if you make a mistake or overlook something, your infrastructure colleagues have your back. Of course, some of the solutions don’t come cheap, so you may have to forgo your next bonus or your next raise.

So I’m pleading with you, do your development right so that your infrastructure brethren don’t have to hide your sins. Appropriately optimize your code. Make sure everyone on the development team understands the fundamentals of good design. Do peer software reviews. Do whatever it takes to make great code.

3 replies
  1. Jeremy Thake says:

    I’ve seen sites where they have 16-CPU, 32 GB RAM SQL Servers to hide inefficient SQL stored procs, or better still, throwing 8 GB of RAM at SharePoint WFE servers to get around memory leaks in code in a Web Application 😉
