The Monster Server Project – VMWare, iSCSI, and learning

So I’ve been working on a project for quite some time, and I’m finally seeing the light at the end of the tunnel.  Doing work with SharePoint means getting to know virtual machines and, relatively speaking, throwing a lot of hardware at the problem.  I, like many other SharePoint consultants, have ended up with a high-end laptop.  However, that’s earned me a reputation as the guy with the heaviest bag.  Not only do I have to carry a big laptop, but I’m also carrying two or three external hard drives at any one time.  That may make my chiropractor happy, but it’s not something that I’m really fond of.

So my idea is that I’d get a big virtual server up and running and I’d just connect to the images running on that server – instead of trying to run them locally.  The idea sounds good, particularly when I’m in my office with reliable high-speed connectivity.  However, there’s some question as to how this will work at client sites and on the road.  I’ll let everyone know how that goes when I get there.  It is, however, the impetus for this project.

I had purchased a server to do a conversion project.  The short story is I needed a server with fast disks and a fast processor to handle some conversion work.  Yes, the project was big enough to just bundle in the cost of the hardware.  I ended up with a Dell 1950 1U server with a quad-core 2.66 GHz processor and 4GB of RAM.  I also configured it with 4x 146GB SAS 10K drives.  I didn’t need that much space; I was concerned about the speed of the drive arms.  While I was at it, I added a Dell Remote Access card so I could get to the server even if something bad happened.  The server is better than my core network infrastructure by a wide margin – it’s the first truly server-class machine I have.

The conversion project came to an end and I was left with a server that by all accounts could make a good virtual host.  However, there wasn’t enough RAM to run multiple machines (to do things like test farm configurations for SharePoint).  So I went to buy RAM for the machine.  I thought it would hold 16GB of RAM and was prepared to make that purchase.  However, I realized upon further review that it would take 32GB of RAM.  So I maxed it out.  My wife will tell you this is a standard response from me.  I rarely go halfway.  (I think if she understood, she’d be confused by the fact that it only has one quad-core processor in it.)

I had configured the disks on the machine in a RAID 1+0 for performance … I wanted to keep that because, in addition to my development machines, I wanted to be able to run my web site off of the machine.  When I started to look at the RAM-to-disk ratio I could tell it was out of whack.  By this time I had decided to use VMWare Infrastructure (ESX) to host my virtual machines.  My friends at Bluelock had already convinced me that the performance was in some cases better than on physical hardware.  (Watch Windows boot and you still won’t believe how fast it is.)  So that meant I had to figure out storage options that would work with VMWare.

Well, VMWare doesn’t support USB drives, and their support for storage controllers is VERY limited.  Basically, I couldn’t find a direct attach solution for SATA drives that was supported.  Why SATA drives?  Well, because they’re cheap for large capacities.  Since I had fast storage, I needed some Tier 2 slow storage.  Honestly, there’s not much you can do to beat $250 for a 1TB drive.

What is supported in VMWare is iSCSI.  iSCSI is the poor man’s SAN technology.  I call it the poor man’s SAN because it’s much less expensive than a Fibre Channel solution.  The performance can be relatively close to Fibre Channel performance if correctly configured.  For what I needed, iSCSI was going to be more than adequate.

Well, that’s fine except that iSCSI cabinets are on the somewhat expensive side – even those that take SATA drives.  After a relatively exhaustive search I found a 2U unit sold by Enhance Technologies called RS8IP.  It can hold a total of 8 SATA drives.  For my initial bite at the apple I settled on 4x 1TB Seagate drives.  The MSRP for the RS8IP is slightly over $3K and the 4 drives cost about another $1K.  Yes, I am talking about spending $4K on storage – but it’s 4TB of unprotected storage.  I opted for RAID5, which left me with 3TB of usable space.  (Actually about 2,793 GB as the operating system reports it, since a 1TB drive is sold as a trillion bytes and the number shrinks when you do your division by 1024 instead of 1000.)
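For anyone who wants to check that arithmetic, here is a minimal sketch in Python.  It assumes each drive holds exactly a trillion bytes and that RAID5 costs exactly one drive’s worth of space for parity; the function and constant names are mine, not anything from the RS8IP.

```python
# Sketch: usable RAID 5 capacity for drives sold in decimal terabytes.
# Assumption: each "1TB" drive holds exactly 10**12 bytes.
DRIVE_BYTES = 10**12
GIB = 1024**3  # the binary unit most operating systems label "GB"


def raid5_usable_bytes(drive_count, drive_bytes=DRIVE_BYTES):
    """RAID 5 spends one drive's worth of space on parity."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (drive_count - 1) * drive_bytes


usable = raid5_usable_bytes(4)
print(usable // 10**12, "TB when dividing by 1000")  # 3
print(usable // GIB, "GB when dividing by 1024")     # 2793
```

The real array may report slightly less than this, since the controller can reserve some space for its own use.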

I’ve had VMWare up and running for a while on the system, but honestly I’ve not had the time to really get to know it too well.  I do know that it’s really quick with the virtual machines I’ve been running on it.  I hadn’t fully converted to using it for my development systems for space reasons.  It also hadn’t been moved to its permanent home; it’s been sitting here at the house, so I couldn’t really run multiple instances and get to them from the outside.  Anyway, I was impressed by Windows boot times and general performance.

Configuring the RS8IP was pretty easy, except for some mistakes I made.  There’s a quick install that can be accessed from the front of the unit.  I did that, and when I got asked about LBA64, I said Yes.  That, I would learn later, would be a problem.

I started out trying to figure out how to get VMWare to talk to the unit.  I got an error message about VMotion and iSCSI having not been licensed… Interesting since the licensing screen said I was licensed for it.

A quick call to a buddy and I was told that I had to create a VMKernel port for iSCSI on my virtual switch in VMWare.  Once I created that, I could create the iSCSI storage adapter.  Once I had the adapter, I scanned for the iSCSI targets (LUNs) I exposed on the RS8IP.  No dice.

It was about this time I was really not feeling good.  It’s not too often that I wander off into areas that I don’t have a way to test and troubleshoot, but I was concerned this might be one of those times.  However, I saw instructions for the MS iSCSI initiator.  I downloaded it and installed it into a virtual machine with Windows XP on it.  The MS iSCSI initiator loaded up and saw the RS8IP, but it didn’t show any volumes.  As a verification, I went over to a Vista machine I have and set it up there.  Presto… I had a 2793 GB drive showing up.

It was about this time that I started thinking about what might be different between Vista and XP… and I realized that LBA64 support was a likely candidate.  I thought that perhaps VMWare would have the same limitations, so … I rebuilt the drive array without LBA64 support.

And … presto, after a few short hours I could see the drive in XP.  (By the way, had I paid better attention I could have started the test while the array was rebuilding, but I forgot to set up the LUN and, well, I didn’t remember until after the array was rebuilt.)

So it’s visible in XP and in Vista… but still not visible in VMWare.  A few emails and a little while later I had a buddy at Bluelock respond to ask me if I had enabled iSCSI through the VMWare firewall… Doh!  Once I enabled it through the VMWare firewall and rescanned, I had a LUN show up.  Success … or so I thought.
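In hindsight, a quick reachability check would have narrowed this down faster.  The sketch below is not something I ran at the time; it is a minimal Python example that only tests whether a TCP connection to the target’s iSCSI port succeeds.  It assumes the cabinet listens on the standard iSCSI port, 3260, and the function name is made up for illustration.

```python
import socket

ISCSI_PORT = 3260  # standard iSCSI target port; assumed to be the cabinet's setting


def iscsi_port_reachable(host, port=ISCSI_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

You would call it with the cabinet’s address, for example iscsi_port_reachable("192.0.2.10"), where 192.0.2.10 is only a placeholder.  If the check succeeds from another machine on the same network but one initiator still finds nothing, that initiator’s own firewall is a reasonable next suspect.  A successful connection says nothing about whether any LUNs are visible.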

I went to add the storage to VMWare and I was told that there wasn’t a partition table, it was corrupt, the world was coming to an end… well, you get the point.  I briefly played around with the fdisk utility from the command prompt and decided that VMWare was having enough problems that I should probably turn LBA64 back on and see what happened.

And I didn’t see the drive.  So I thought, well, maybe it has to fully rebuild.  A few hours later … after the rebuild was completed, I tried again.  No dice.  By this time I’m getting pretty frustrated.  I’m feeling like I’m trying to guess the combination to a lock.

After some more research it became apparent that the LBA64 question comes up right at 2TB.  So once again I deleted the array and this time created two volumes and two LUNs.  One volume was 1396 GB and the other was 1397 GB.  Neither had LBA64 turned on.  So I rescanned from VMWare and found both LUNs.  I went in and got VMWare to add them to the storage and even copied files to them almost immediately.  Sure, the array was still building in the background, but I copied a non-trivial amount of data over pretty quickly.
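My best understanding of where the 2TB line comes from: a 32-bit block address can name 2^32 sectors, and at 512 bytes per sector that is exactly 2TB (counting by 1024).  The Python sketch below works that out and splits a capacity into the fewest near-equal volumes that stay under the line.  The 512-byte sector size is an assumption on my part, and the helper name is made up for illustration.

```python
# Sketch: why 32-bit block addressing tops out at 2TB, and how to split under it.
SECTOR_BYTES = 512                  # assumed sector size
LBA32_LIMIT = 2**32 * SECTOR_BYTES  # largest volume a 32-bit LBA can address
GIB = 1024**3


def split_under_limit(total_gib, limit_bytes=LBA32_LIMIT):
    """Split total_gib into the fewest near-equal volumes, each below the limit."""
    max_gib = limit_bytes // GIB - 1           # stay strictly under the limit
    count = -(-total_gib // max_gib)           # ceiling division
    base, extra = divmod(total_gib, count)
    # 'extra' volumes get one more GB so the sizes add back up to total_gib
    return [base] * (count - extra) + [base + 1] * extra


print(LBA32_LIMIT // GIB)        # 2048
print(split_under_limit(2793))   # [1396, 1397]
```

That second line matches the two volume sizes I ended up with.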

Sunday morning while my family was sleeping I decided to have some fun.  So I simultaneously installed Windows XP, Windows Vista, Windows Server 2003 R2, Windows Server 2003 R2 x64, Windows Server 2008, and Windows Server 2008 x64 on the environment.  In less than 2 hours I had six operating systems installed.  OK, that’s what I’m talking about!  I figure most of the reason it took 2 hours was that I had to remember to check on the installations and press keys.  Oh, yeah, I patched all of those operating systems in the same 2 hours.  If you’ve ever patched a new installation of XP, you know that can take 2 hours on its own.

I know I’ve still got a few gremlins in the system, but they’re minor at this point.  First, I don’t think VMWare has the adapters teamed correctly.  I was looking at the switch diagnostics and they showed 50% utilization on one of the ports the NICs were attached to and almost none on the other.  Second, the drives are being recognized by the RS8IP as SATA 1.5 Gb/s instead of SATA 3.0 Gb/s.  (It’s got a cool screen that shows me that information though.)  Third and finally, I don’t know enough about VMWare yet to figure out how to convert my new operating systems into templates that I can use to create new systems.

So what did I learn from this exercise?

  1. It’s good to have friends.  OK, I already knew that but it’s worth repeating.  Seriously – thanks Ben and Andy for your help.
  2. Don’t create volumes that are larger than 2TB no matter how tempted you are.  It’s just going to be painful.
  3. VMWare is a powerful tool but one that requires a lot of learning.  (Oh, and finding information on it or the problems you’re facing is pretty difficult.)
  4. iSCSI is cool.  It performs pretty well and can be set up by mortals (notwithstanding the 2TB volume size issue).
  5. To set up iSCSI on VMWare: a) Make sure you have a VMKernel port, b) Make sure you let VMWare make outbound iSCSI connections through its firewall, c) Set up the iSCSI cabinet in the dynamic discovery tab of the iSCSI adapter, d) Rescan the iSCSI adapter (takes 60+ seconds), e) Don’t create LUNs that are greater than 2TB.