
Agile Development through Lunch

As I’ve mentioned in previous posts, I’ve been doing a ton of personal reengineering, trying to get caught up on my personal and leadership practices for software development.

One of the things that struck me in my reading is that there are Agile techniques I’ve been using for a while.  One of those is trying to gather the team for lunch.  Why?  Well, we all have to eat… and it gives us more time to interact.  We rarely spend the whole lunch talking about the project; we talk about jumping out of planes, how crazy I think it is to jump out of a WORKING plane, gadgets, etc.  However, we often slip into project conversations for a few minutes to ask a random question or clarify something.  It’s part of the randomness of the human condition.

I never gave it much thought; it was just something I did to make sure that the team was staying connected.  It turns out that one of the foundations of agile development is improved, frequent communication.  Alistair Cockburn recommends seating the team close together in the office.  While that wasn’t always possible, it was generally possible to gather the team up to go to lunch together.

… hmmm, come to think of it, I frequently used IM to “gather the troops”… I wonder if that’s a supporting technology for agile development ;)


Book Review-Balancing Agility and Discipline: A Guide for the Perplexed

I mentioned briefly in my last post that I was reading Balancing Agility and Discipline: A Guide for the Perplexed.  I managed to make my way through to the end of the book and wanted to record my initial thoughts about the book and about my growing perspective on Agile software development in general.

On the book…

The fundamental question that the book answers is how to take the best of Agile development and integrate it into your current software development methodologies.  It neatly disguises this question under the auspices of exploring the places where each method “belongs” — their home ground.  It does what appears to be a very thorough job of reviewing the strengths and weaknesses of both traditional (plan-based) development and agile development.

My current thinking about Agile development (and software development in general) is this…

1) Agile software development makes the same statement that traditional development does — better programmers make better projects.  I was struck by how much emphasis good software development — whether traditional or agile — places on good developers.  (It would be more accurate to say good participants.)

2) Agile software development is like Voodoo — at least by the movie Dogma’s definition: “Do you know about voodoo? No constitution of faith, more an arrangement of superstitions.”  In other words, it’s a collection of techniques — some shared between different agile variants, some shared with more traditional plan-based development, and some shared with project management practices.  The percentage of people following any one specific technique is very low.

3) Agile is akin to “Management by Wandering Around,” one of the practices made popular by In Search of Excellence.  It focuses on people actually talking to each other.  Given the number of large-scale projects I’ve worked on where that doesn’t happen — I’m all for people talking to each other. ;)  More seriously, the focus is on individual and personalized conversations with people — answering their specific needs and coaching them into effective behaviors, one by one.

4) The arguments being used by both sides of the aisle represent a fundamental lack of understanding.  It reminded me of a recent set of posts on the Dilbert Blog about Intelligent Design vs. Evolution.  [Warning: The preceding link may consume a huge amount of time as you try to figure out which of the arguments are the dumbest.]

5) We’re all missing the point… We start good development by getting good people involved.  We get good people involved by DEVELOPING / CREATING / BUILDING good people.  If we’re not all willing to go out of our way to help other people become better developers — every day, whether it makes sense for the particular project or not — then we may be doomed to struggle with poor software developers forever.

The good news is, however, I’m more agile than I thought.

Load Balancing and Clusters

I’m in the middle of reading Balancing Agility and Discipline and I keep getting distracted by this thought… hopefully I can get it out of my head…

I was speaking yesterday with a hosting company that hosts web sites for organizations, largely in the mid-market niche.  We were discussing their movement into load-balanced web servers and database clusters.  One of the comments the CTO made was that they had never had a need to move to a load-balanced environment to handle the traffic on their web sites.  They do host one very large web site, so his statement carries weight — they’re seeing solid volume.

However, what struck me is that volume and capacity rarely drive the decision to move to load balanced front end web servers and a clustered database.  Nearly universally this decision is made for reliability.

This was made clear to me by one of my enterprise clients (and I do mean enterprise — their consumer brands are something I can virtually guarantee you have in your home if you’re in the US).  We were talking about the two-server load-balanced web farm we put together, and he mentioned that the servers are running like champs.  They host nearly 50 sites.  I don’t know the total number of hits per day, but it’s not a trivial number.

So why are they on a load-balanced web farm?  Uptime.  Most of it is reliability, with a small percentage of the decision coming from the ability to deploy new versions of the web site and roll back if necessary.  Sure, someone likes to think that it’s performance, but it isn’t.

Today’s hardware can handle a lot of web hits, particularly with a well-written application.  So if you’re thinking about whether you need load balancing — and back-end clustering of your database — evaluate the decision from the perspective of reliability, not performance.

The sound of a million crickets longing to be free

If you’ve been wondering why my blog has been so quiet over the last 45 days or so, the answer is a bit complex (but you can continue listening to the crickets chirping in the absence of content here).  I need to take a break from my reading to give my brain a rest… so now’s as good a time as any to explain the pieces of the puzzle that have led to the relatively low number of posts.
First, and foremost, I’ve entered into a “phase” of reinvestment.  That means many things but the short answer is I’m learning new things and relearning old things that I’ve forgotten – or I’m at risk of forgetting.
One of the pieces of this puzzle is the relatively large amount of software that Microsoft’s dropped at my door.  I hadn’t really had much of a chance to work with .NET 2.0 or SQL Server 2005 while they were in beta – a side effect of a busy life.  So I’m in the process of getting up to speed on them.
Just in case you’re wondering – for me getting up to speed means doing things that are a bit beyond the envelope.  I’ve been playing with configuration classes (to support non-file based configuration sources), factory classes, and generics.  I’m not done yet; however, this will be the foundation of the new web site that I’m building.  (Don’t ask when it will be done, I don’t know yet.)  I’ve also been playing around with master pages and web parts … I’m trying to mesh those into a foundation layer that allows me to control what’s on the page from a database.  … and don’t even get me started about the HTTP handler that allows Google to walk around the foundation without getting aggravated by query strings.
Couple that with the “Office 12” and Commerce Server betas.  Office 12 contains SharePoint, which is one of the things driving a great deal of my work these days.  Commerce Server is near and dear to my heart for many reasons, not the least of which are that it’s at the core of the new web site I’m putting together and that I’m a Commerce Server MVP.
Because I’m preparing to embark on a relatively large software development project, I’m reminding myself how to do things right.  That means rereading Steve McConnell’s Rapid Development, Fred Brooks’ The Mythical Man-Month, and a few other software development practice books.  I’ve read them all before, but having read them about 10 years ago made it time to reread them.  (I highly recommend both of the above books – both are “dated” but they illuminate some core concepts and problems that don’t seem to change.)
I’m also reading Karl Wiegers’ Software Requirements, Second Edition.  I’ve had a few projects recently that somehow managed to skip the requirements phase of development (in whatever lifecycle model you want to call the projects).  So I thought reading it would remind me of what a project with requirements is like.  The book is solid, though somewhat repetitive.  It’s much like a buffet where you take what you want – though I miss the carefully coordinated entrée and sides.  My personal task for the end of the reading process is to distill the information into a one-page cheat sheet I can laminate to remind me of the core concepts.  I find that requirements is the one area where continuous reminders are helpful for me.
When I get done there I get a chance to read some books on Agile development … I’m looking forward to getting a firmer foundation for what Agile is supposed to be – presuming that the two books on the topic that I have can do that.
The second factor in the lack of posts is that I’m doing some infrastructure cleanup tasks.  I run my own network infrastructure – I suppose that my MCSE and my former MVP award for Windows Server – Networking will die hard.  There have been nagging little problems with the infrastructure for a while – errors that weren’t causing any real problems but were still getting thrown to the logs.  Well, in an effort to get my Pocket PC Phone Edition working with direct server synchronization, I had to clean up more than a few of those errors.  (By the way, my QTek 9100 Windows Mobile 5 [Pocket PC Phone Edition] rocks.  The over-the-air sync to Exchange is amazing.)
The third factor is that I’m building an arcade-style kiosk for my son.  It will house his PC.  It will be like the old “Pac-Man” arcade games but slightly wider to accommodate the keyboard, mouse, and joystick.  It will have a real coin door which will eventually allow you to turn the PC on and off as well as an overhead light.  The coin door itself has been installed; I just haven’t finished the electrical for the switches.  Because my son is only 4 (as of January 5th) I’ve also built a seat arrangement that slides in and adjusts up and down so we can get the seat set up at a reasonable height for him.
The net effect of this project is that it’s taking a ton of my time.  I’ve begun referring to it as the albatross around my neck.  However, I saw the light at the end of the tunnel today as all of the major components have been assembled – I just need to get it in the house and hooked up and I’m going to take a break from the project for a while.
The fourth factor is that my writing schedule has been slowed WAY down.  Basically, I’ve temporarily discontinued my regular writing.  I expect that I’ll start writing about software development issues again very soon – however, in the meantime that means that even the regular posts linking to my articles have been missing.
The fifth factor is that I’ve been working a ton of hours at the church.  I took on leadership of the technical ministry for worship services, which means I’m responsible for coordinating sound, lighting, and media.  Of course, I’m trying to get everything documented and get a better fundamental understanding of each part so that I can feel effective…  Doing this in the middle of the Christmas production, on top of my other duties taking care of the IT needs, has been overwhelming.
So that’s it… add in the usual family commitments for Christmas and New Years, a few days of illness, and you end up with some pretty sparse blog postings.  However, I am seeing the light at the end of the tunnel on most of these things, so hopefully I’ll have more to blog about real soon.

Blog Spammers

Well, blog spammers have finally found my blog.  The net result is that I’ve purposefully broken comments for the time being.  This will just accelerate my move to Community Server…

Article: Top 4 Things Project Managers Do To Destroy Software Quality

Project managers often use techniques that are successful at reducing cost and the development time without impacting quality. However, it is possible for them to push their techniques too far. Here is how.

Do I have your attention yet? Most people in software development instinctively know that the project manager’s drive to make sure the project is on time is at odds with the desire to have high-quality software. Not that project managers don’t want high-quality software too; it’s just that they want on-time delivery and costs at or below the estimate in addition to quality. Their efforts are often successful at reducing cost and development time without impacting quality. However, it is possible for them to push their techniques too far.

Although all of the following project management techniques are at least well meaning, and in some cases even time honored, they do have the potential for disaster.

Time boxing

Getting top honors in the list of things which can destroy software quality is the practice of time boxing.  This is the practice of telling someone how long they are allowed to work on a task before it must be turned over.  I say turned over and not completed because, used at its extreme, it often means that the code isn’t complete; it’s merely pushed along the process.

http://www.techrepublic.com/article/the-top-4-things-project-managers-do-to-reduce-software-quality/

How to Backup Windows SharePoint Services

Backing up SharePoint Services isn’t as easy as making a copy of a file on the file system – but it’s not as difficult as trying to back up many of the organization’s legacy systems.  The key to backing up SharePoint is to control what you’re backing up based on what information is changing.  Backing up SharePoint can be as simple as using the STSADM tool, or, depending upon your needs, perhaps the SMIGRATE tool.  In either case you need to understand precisely what you want to back up and how you may want to do your restore.

Backup Options

There are essentially four ways to back up SharePoint Services: STSADM, SMIGRATE, a SQL database backup, and 3rd party tools.  We’ll examine the strengths and weaknesses of each before reviewing the kinds of restores that most organizations face.

STSADM

The STSADM tool, which is a part of Windows SharePoint Services, is a Swiss army knife which is used to add web parts, change properties, and perform a variety of other actions, including backing up a site and all of its sub-sites.  The STSADM tool is designed to back up one site with complete fidelity – that is, to back up absolutely everything about a site.  It does this task quite well.  However, there are a few challenges.
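
As a minimal sketch of the basic usage – the site URL and file paths below are placeholders, not anything from the article – a full-fidelity backup and restore with STSADM look something like this:

    REM Back up a site and all of its sub-sites to a single file (complete fidelity)
    stsadm.exe -o backup -url http://intranet/sites/team -filename D:\Backups\team.dat -overwrite

    REM Restore that file over an existing site (or into a newly extended virtual server)
    stsadm.exe -o restore -url http://intranet/sites/team -filename D:\Backups\team.dat -overwrite

STSADM.EXE isn’t on the path by default; it lives in the SharePoint BIN directory under the “web server extensions” folder of Common Files.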

First, most organizations find that they will be confronted with the need to do single-file restores of files stored in SharePoint.  This is because users will invariably delete a file that they didn’t intend to.  As a result a file, or a small group of files, will need to be restored to SharePoint.  This is where STSADM’s story is less than stellar.  In order to restore a single file, a sequence of steps must be followed.

The first step in restoring a single file is to create a new virtual web server and extend it for SharePoint Services.  Then the STSADM tool is used to restore the existing site backup into that new virtual server.  From there the files are copied out of the new instance of SharePoint Services and finally put back into the original site.  Through this process the version history for the file is lost – unless each version is extracted and placed back into the site one at a time.  In general this is a fairly onerous process simply to restore a single file.

Second, the actual date/time stamps of the versions and the person that checked them in will be lost.  This can be an issue if version histories are a part of your information management audit trail.

Despite these two limitations for single-file restoration, STSADM is the tool of choice for backing up SharePoint Services sites when it’s important that absolutely every aspect of the site is backed up and can be restored without question – it’s the gold standard for backing up Windows SharePoint Services.

SMIGRATE

SMIGRATE is another tool provided with Windows SharePoint Services which can be used to back up a WSS site.  However, it takes a radically different approach than STSADM.  First, it doesn’t do a full-fidelity backup – there are certain items which it doesn’t process.  Neither permissions nor personalization is backed up in an SMIGRATE file.  However, the fact that these are not backed up is countered by the simplicity with which a single file can be restored.
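
As a rough sketch – the URLs and file names are placeholders, and the -r switch for restore is my recollection of the tool rather than anything stated here – backing up and restoring a site with SMIGRATE looks something like this:

    REM Back up a single site (sub-sites are not included) to an FWP file
    smigrate.exe -w http://intranet/sites/team -f D:\Backups\team.fwp

    REM Restore the FWP file into a new, empty site
    smigrate.exe -r -w http://intranet/sites/teamcopy -f D:\Backups\team.fwp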

The FWP file that the SMIGRATE tool creates is a renamed CAB file which contains a manifest.xml file and each of the individual files in the single site specified.  This includes all of the pages which make up the site.  So any file within the site, including the files that make up the site itself, can be extracted and restored quickly.

The process for restoring a single file with SMIGRATE is as simple as opening the manifest.xml file, searching for the file to restore, extracting it, renaming it, and uploading it back to the portal.  The time to restore is greatly reduced as is the complexity in completing the restoration.
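
Concretely, from a command prompt the process looks roughly like the following.  The internal file name (00000012.doc) is purely hypothetical – manifest.xml is what tells you which entry in the CAB maps to the document you’re after:

    REM The FWP file is just a renamed CAB file
    copy D:\Backups\team.fwp D:\Restore\team.cab

    REM List the contents, then pull out the manifest to find the entry you need
    expand -D D:\Restore\team.cab
    expand D:\Restore\team.cab -F:manifest.xml D:\Restore

    REM Extract the entry that manifest.xml maps to your document, then rename it
    expand D:\Restore\team.cab -F:00000012.doc D:\Restore
    ren D:\Restore\00000012.doc "Quarterly Report.doc"

From there the file can be uploaded back into the document library through the browser.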

One of the other differences between STSADM and SMIGRATE is that SMIGRATE only backs up one site at a time.  It doesn’t back up sub-sites, so it requires a potentially large number of individual backup commands to complete the backup of a large set of sites.

Despite the obvious limitations of not providing a complete backup, the SMIGRATE backup solution can be very valuable for organizations that need to be able to restore a single file with relative ease.

SQL Database Backup

Both of the options above are new and unique to SharePoint.  They are not the tried, tested, and true solutions that administrators have been used to for a long time.  For most organizations, backing up SQL Server databases is a core skill that has been developed over the last several years to the point of rote perfection, and it is supported by existing processes and techniques.  Because of that, backing up the SQL Server database directly is often a good strategy for an organization seeking to back up SharePoint Services.

Most everything that SharePoint does is stored in the associated content database.  Because of this, most of what is necessary for reconstructing a site after a disaster will be found in the database.  In fact, a backup of the system plus a backup of the SQL Server database will restore the site.  In most cases, however, it’s easier to use STSADM to make a baseline backup of the site and then use SQL Server backups to restore from catastrophic events.

The challenge with SQL backups is essentially the same as with STSADM.  It’s an all-or-nothing proposition if you lose something in a site.  The one further disadvantage SQL has compared to STSADM is that STSADM can back up sites individually, while a SQL backup must back up all of the sites in the content database at the same time.

When backing up SQL Server databases for SharePoint there is no special care that needs to be taken, beyond getting an original backup of the system which includes the additional files necessary for a complete site.
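
For example, a straightforward full backup of a content database with the tools that ship with SQL Server might look like the line below.  The database name STS_SERVER01_1 is only an illustration of the default WSS content database naming – substitute your own content database name and backup path:

    REM Full backup of a WSS content database using osql and a trusted connection
    osql -E -Q "BACKUP DATABASE [STS_SERVER01_1] TO DISK = 'D:\Backups\STS_SERVER01_1.bak' WITH INIT"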

3rd Party Tools

The preceding backup solutions are available for free to every organization using SharePoint.  However, these options may – or may not – fit into the organization’s existing backup strategy.  In addition, each of the preceding backup solutions has its own set of limitations.  If it can do single-file restores, it doesn’t do a complete backup.  Conversely, the complete backup method via STSADM makes restoring a single file difficult.

There are at least three third parties who have SharePoint backup agents:

  • AvePoint
  • Veritas NetBackup
  • CommVault

These solutions are commercial solutions which compete on their ability to deliver a complete backup solution.

How to Approach Performance Monitoring in Windows

Performance monitoring and tuning are topics which most professionals know or care little about – until performance becomes a problem.  It’s one of the topics that doesn’t come up frequently enough to drive a lot of interest in understanding how it works, how it fits together, and what to do about it.  However, there is value in understanding the fundamentals of performance monitoring on a Windows-based system and what to do based on what you find.

Searching Out the Bottlenecks

The primary activity in performance monitoring is seeking to understand the bottlenecks in the system which either are already causing performance issues or have the potential to cause them in the future.  Because we’re seeking out bottlenecks, we’re looking – primarily – for metrics and counters which can tell us the relative amount of capacity that has been used and how much is remaining.

Because of this the performance metrics that we’re going to gravitate to are those which are expressed as a percentage.  The percent of disk time, percent of CPU time, and percent of network usage are good examples of the kinds of metrics that we’ll want to focus on when evaluating performance at a macro level.  They are not, however, an exhaustive list of metrics.  They are only the metrics that are easiest to understand and extract value from quickly.

Spikes and Sustained

Even with counters that report status on a percentage of available resources there are still challenges to face.  The first challenge is determining when there’s a problem because of a sustained lack of available resources and when it’s a momentary blip on the radar.

The primary consideration in performance monitoring is the interval of time over which you can accept performance challenges.  What level of performance is acceptable and what is not?  Is it important that the CPU have some availability every second?  In most cases the answer to that question is no.  However, the question becomes more difficult when asked over a one-minute interval.  Most users tolerate the occasional slowdown that is over within a minute.  However, hours of performance problems are a different story.

So when evaluating what is and isn’t a performance problem, consider how long your users would be willing to accept a slowdown, and then ignore or temper your response to momentary spikes in whatever counter you’re looking at.  Momentary spikes are a normal occurrence and simply mean that the system is pouring all of its resources into fulfilling the requests that the users have made.

Objects, Counters, and Instances

Performance monitoring on a Windows system requires an understanding of the way that Windows breaks down counters.  On a Windows system performance monitoring starts with an object.  An object is a broad representation of something, such as memory.  This broad topic groups a set of related counters. Each counter is an individual measure in that category.  For the memory object, page faults/sec, pages/sec, and committed bytes are all examples of counters.  Each counter may measure the object in a different way but all of them relate to the object to which the counter belongs.

For each counter there may be multiple instances.  An instance is a copy of the counter for a different part of the system.  For instance, if a system has two processors, the counter for % processor time will have three instances: one for each processor and one for the total (or average) across the two processors.  In most cases each instance needs to be viewed separately from the others to identify specific areas where problems may occur.
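
This object/counter/instance breakdown shows up directly in the counter path syntax used by Performance Monitor and by the TYPEPERF command-line tool.  A minimal sketch, assuming a local two-processor machine and an arbitrary sample interval and output path:

    REM Counter paths follow the \Object(Instance)\Counter pattern
    REM Sample every 15 seconds, take 8 samples, and write the results to a CSV file
    typeperf "\Processor(_Total)\% Processor Time" "\Processor(0)\% Processor Time" "\Processor(1)\% Processor Time" -si 15 -sc 8 -o D:\Perf\cpu.csv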

Key Subsystems

You’ll find that for most purposes there are only four areas of performance monitoring that you care about: memory, disk, processor, and network.  These are the key subsystems because they are the core system components that are most likely to be the source of bottlenecks.

One of the challenges in performance monitoring is the interdependence of these key subsystems on one another.  A bottleneck in one area can quickly become a bottleneck in another area.  Thus the order in which you evaluate the performance of these subsystems is important to reaching the right conclusion.

Memory

The first subsystem to evaluate is memory, because it has the greatest potential to impact the other metrics.  A memory shortage will, in fact, often show up as a disk performance problem.  Sometimes the disk problem becomes apparent before the memory issue is fully understood.

In today’s operating systems, when memory is exhausted the hard disk is used as a substitute.  This is a great idea, since hard drives are substantially larger than memory on a server.  However, it has the disadvantage that hard drives are orders of magnitude slower than memory.  As a result, what might be a relatively light load on memory will quickly tax a hard disk and bring both the disk and the system to their knees.

One way to mitigate this is to minimize, or eliminate, the virtual memory settings in Windows to prevent Windows from using the hard drive as if it were memory.  This setting can prevent a memory bottleneck from impacting the hard drives – but it raises the potential for the programs running on the server to not be able to get the memory that they need.  This is generally an acceptable trade-off for making sure that you’re aware of the true root cause of an issue.

The memory counter to watch is the pages per second (Pages/sec) counter.  This counter tracks the number of times per second that data was requested from memory but had to actually be read from disk.  This counter, above all others, helps to identify when the available memory doesn’t meet the demands of the system.  A small number of these, say less than 100, is a normal consequence of a running system; however, sustained numbers larger than 100 may indicate a need to add more memory.  If you’re in a situation where you need more memory, you cannot evaluate disk performance reliably, since the system will be using the disk to accommodate the shortage of memory.
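
To watch this over time rather than as a single snapshot, something like the following works; the 15-second interval, one-hour duration, and output path are arbitrary choices, and Available MBytes is included simply as a useful companion counter:

    REM Log Pages/sec alongside Available MBytes every 15 seconds for an hour (240 samples)
    typeperf "\Memory\Pages/sec" "\Memory\Available MBytes" -si 15 -sc 240 -o D:\Perf\memory.csv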

Disk

The primary counter for monitoring disk activity is the ‘% Disk Time’ counter.  This counter represents the average number of pending disk requests to a disk over the interval, multiplied by 100 (to get a percentage).  This calculation method leads to some confusion when the disk can accept multiple concurrent requests, as SCSI and Fibre Channel disks can.  It is possible for the instances measuring these types of disks to have a % disk time above 100%.

One of the choices to be made when selecting disk counters is whether to select Logical disk counters or Physical disk counters.  Logical disk counters measure the performance relative to the partition or logical volume rather than by the physical disks involved.  In other words, Logical disk counters are based on drive letter mappings rather than on the disks involved.  The physical disk option shows instances for each of the hard drives that the operating system sees.  These may either be physical drives or in the case of RAID controllers and SAN devices, the logical representation of the physical drive.

In general, the best approach for measuring disk performance is to use the physical disk counters.  This will allow you to see which hard disk drives are busier and which ones are not.  Of course, if there’s a one-to-one relationship between your logical drives (partitions) and the physical drives (that the operating system sees), then either logical or physical disk counters are fine.  However, only the physical disk counters are turned on by default.  If you decide to use logical disk counters, you’ll need to run the DISKPERF command to enable them and reboot the system.
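
On Windows 2000 and Windows Server 2003 that looks roughly like the following, run from a command prompt with administrative rights and followed by a reboot:

    REM Enable the logical disk counters (the physical disk counters are already on by default)
    diskperf -yv

    REM Or enable both the physical and logical disk counters
    diskperf -y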

The % disk time counter should be evaluated from the perspective of how long a performance slowdown you can tolerate.  In most cases, disk performance is the easiest to fix – by adding additional drives.  So it’s an easy target if you’re seeing sustained % disk times above 100%.  If you’re on a RAID array or a SAN, consider evaluating the % disk time against 100% times the number of effective drives in the array.  For RAID 1 and RAID 1+0, that’s one half the number of disks; for RAID 5, it’s the number of disks minus one.  For example, a six-disk RAID 5 array has five effective drives, so sustained % disk times approaching 500% would indicate the array is saturated.

Processor

Since the dawn of computing, people have been watching processing time and the processes which are consuming it.  We’ve all seen the performance graphs that are created by task manager and watched in amazement at the jagged mountain range that it creates.  Despite the emphasis on processor time for overall performance it’s one of the last indicators to review for performance bottlenecks.  This is because it’s rarely the core problem facing computers today.  In some scientific applications and others with intense processing requirements it may truly be the bottleneck – however, everyone seems to know what applications those are.  For most applications processor speed just isn’t the key issue.

The most useful measure of a processor’s availability is the % processor time counter.  This indicates the percentage of time that the processor (or processors) was busy.  This is useful because, taken over a period of time, it indicates the average amount of capacity that is left.

Improving processing speed isn’t an option for most servers.  The application will need to be split up, optimized, or a new server installed to replace the existing one.  It is for this reason that when processing bottlenecks occur they are some of the most expensive to address.

Network

Until recently not much thought was given to the network as a potential bottleneck but with the advent of super-sized servers with four or more processors and terabytes of disk space it has to be a consideration.  Network performance monitoring is a measure of how much of the bandwidth available on the networks is actually being consumed.

This is a tricky proposition, since the connected network speed may not be the total effective speed.  For instance, suppose a super-server is connected through a 1 Gbps connection to a switch which has eight 100 Mbps connections.  The server will assume that 1 Gbps of data can flow through the network that it is connected to.  However, in reality only 800 Mbps at most is truly available to be consumed.

Another consideration is that many network drivers, even today, are less than stellar at reporting performance information.  More than a few network card drivers have failed to properly report what they’re doing.

In general, network performance monitoring should be done from the perspective of understanding whether the network is a possible bottleneck: evaluate what the maximum effective throughput of the network is likely to be, and determine what percentage of that theoretical limit you’re actually consuming.  It is reasonable to assume that a 60% utilization rate is all that is really achievable on Ethernet.
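
A minimal sketch of collecting the raw numbers is below; the interval, sample count, and output path are arbitrary.  Remember that Bytes Total/sec is measured in bytes while link speeds are quoted in bits, so for the 1 Gbps example above a 60% ceiling works out to roughly 600 Mbps, or about 75 MB per second:

    REM The Network Interface object has one instance per adapter; Current Bandwidth reports the link speed in bits per second
    typeperf "\Network Interface(*)\Bytes Total/sec" "\Network Interface(*)\Current Bandwidth" -si 15 -sc 240 -o D:\Perf\network.csv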

Resolving the Details

The guidelines here may not always be enough to completely diagnose a performance problem and identify a specific course of action to resolve it; however, in many cases they will be.  In those cases where the problem isn’t clear enough to be resolved by looking at the high-level indicators mentioned here, you’ll have to dive through the other counters and identify which ones will help you isolate the problem and illuminate potential solutions.