Backing up SharePoint Services isn’t as easy as making a copy of a file on the file system – but it’s not as difficult as trying to back up many of the organization’s legacy systems. The key to backing up SharePoint is to control what you’re backing up based on what information is changing. Backing up SharePoint can be as simple as using the STSADM tool or, depending upon your needs, the SMIGRATE tool. In either case you need to understand precisely what you want to back up and how you may want to do your restore.
There are essentially four ways to back up SharePoint Services: STSADM, SMIGRATE, SQL Database, and 3rd Party tools. We’ll examine each on its strengths and weaknesses before reviewing the kinds of restores that most organizations face.
The STSADM tool, which is a part of Windows SharePoint Services, is a Swiss army knife which is used to add web parts, change properties, and perform a variety of other actions, including backing up a site and all of its sub-sites. The STSADM tool is designed to back up one site with complete fidelity – that is, to back up absolutely everything about a site. It does this task quite well. However, there are a few challenges.
First, most organizations find that they will be confronted with the need to do single-file level restores of files stored in SharePoint. This is because users will invariably accidentally delete a file that they didn’t intend to. As a result a file or a small group of files will need to be restored back to SharePoint. This is where STSADM’s story is less than stellar. In order to restore a single file a sequence of steps must be followed.
The first step in restoring a single file is to create a new virtual web server and extend it for SharePoint Services. Then the STSADM tool is used to restore the existing site backup to that new server. From there the files are copied from the new instance of SharePoint Services and finally put back in the original site. Through this process the version history for the file is lost – unless each file is extracted and placed into SharePoint Services one at a time. This is in general a fairly onerous task to simply restore a single file.
Second, the actual date/time stamps of the versions and the person that checked them in will be lost. This can be an issue if version histories are a part of your information management audit trail.
Despite these two limitations for single-file restoration STSADM is the tool of choice for backing up SharePoint Services sites when it’s important that absolutely every aspect of the site is backed up and can be restored without question – it’s the gold standard for backing up Windows SharePoint Services.
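As a rough illustration of the STSADM approach, here is a minimal Python sketch that assembles the backup and restore command lines. The `-o backup`, `-o restore`, `-url`, `-filename`, and `-overwrite` switches are the ones STSADM documents for site backups; the site URL, the file path, and the helper function names are made up for the example, and the sketch assumes stsadm.exe is on the PATH of the server where it runs.

```python
import subprocess
from typing import List


def build_stsadm_backup(site_url: str, backup_file: str, overwrite: bool = False) -> List[str]:
    """Build the argument list for a full-fidelity STSADM site backup."""
    args = ["stsadm.exe", "-o", "backup", "-url", site_url, "-filename", backup_file]
    if overwrite:
        # Replace an existing backup file instead of failing.
        args.append("-overwrite")
    return args


def build_stsadm_restore(site_url: str, backup_file: str) -> List[str]:
    """Build the argument list for restoring a backup to a (new) site URL."""
    return ["stsadm.exe", "-o", "restore", "-url", site_url, "-filename", backup_file]


def run(args: List[str]) -> None:
    """Run the command; only meaningful on a server where stsadm.exe exists."""
    subprocess.run(args, check=True)


# Hypothetical site URL and backup path, for illustration only.
example = build_stsadm_backup("http://intranet/sites/team", r"D:\Backups\team.dat", overwrite=True)
print(" ".join(example))
```

Keeping the command construction separate from the call that runs it makes it easy to log or review the exact command before a scheduled job executes it.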
SMIGRATE is another tool provided with Windows SharePoint Services which can be used to back up a WSS site. However, it takes a radically different approach than STSADM. First, it doesn’t do a full fidelity backup – there are certain items which it doesn’t process. Neither permissions nor personalization is backed up in an SMIGRATE file. However, the fact that these are not backed up is countered by the simplicity with which a single file can be restored.
The FWP file that the SMIGRATE tool creates is a renamed CAB file which contains a manifest.xml file and each of the individual files in the single site specified. This includes all of the pages which represent the site. So any file within the site, including the files that make up the site itself can be easily extracted and restored quickly.
The process for restoring a single file with SMIGRATE is as simple as opening the manifest.xml file, searching for the file to restore, extracting it, renaming it, and uploading it back to the portal. The time to restore is greatly reduced as is the complexity in completing the restoration.
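The "search the manifest" step can be sketched in Python. The layout of manifest.xml isn’t described here, so the sample below is entirely made up (the `File` element and the `Url` and `Stored` attributes are hypothetical), and the search deliberately makes no assumptions about element names: it simply reports every element that has an attribute mentioning the file you’re looking for. It also assumes the FWP file has already been unpacked with a CAB tool.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def find_manifest_entries(manifest_xml: str, file_name: str) -> List[Dict[str, str]]:
    """Return the attributes of every element that mentions file_name.

    The match is case-insensitive and schema-agnostic: any attribute value
    containing the file name counts as a hit.
    """
    root = ET.fromstring(manifest_xml)
    needle = file_name.lower()
    hits = []
    for element in root.iter():
        if any(needle in value.lower() for value in element.attrib.values()):
            hits.append(dict(element.attrib))
    return hits


# Made-up manifest content, NOT the real SMIGRATE schema.
SAMPLE_MANIFEST = """<Manifest>
  <File Url="Shared Documents/budget.xls" Stored="00000012.dat" />
  <File Url="Shared Documents/plan.doc" Stored="00000013.dat" />
</Manifest>"""

for entry in find_manifest_entries(SAMPLE_MANIFEST, "budget.xls"):
    print(entry)
```

Whatever the real manifest looks like, the idea is the same: the manifest tells you which stored file inside the archive corresponds to the document the user deleted, which is why the extracted file has to be renamed before it is uploaded.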
One of the other differences between STSADM and SMIGRATE is that SMIGRATE only backs up one site at a time. It doesn’t back up sub-sites, so it requires a potentially large number of individual backup commands to complete the backup of a large set of sites.
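Because every site needs its own command, it helps to generate the commands from a list of site URLs. The following Python sketch does that. It assumes the `-w` (site URL) and `-f` (backup file) switches of smigrate.exe, which you should verify against the tool’s own help output; the site URLs, the backup folder, and the file naming scheme are invented for the example.

```python
from typing import List
from urllib.parse import urlparse


def fwp_name(site_url: str) -> str:
    """Derive a backup file name from the site's URL path (naming scheme is arbitrary)."""
    path = urlparse(site_url).path.strip("/")
    return (path.replace("/", "_") or "root") + ".fwp"


def build_smigrate_commands(site_urls: List[str], backup_dir: str) -> List[List[str]]:
    """Build one smigrate command per site, since each site is backed up separately."""
    return [
        ["smigrate.exe", "-w", url, "-f", backup_dir + "\\" + fwp_name(url)]
        for url in site_urls
    ]


# Hypothetical sites; in practice this list would come from your own inventory.
sites = ["http://intranet", "http://intranet/sites/hr", "http://intranet/sites/hr/benefits"]
for command in build_smigrate_commands(sites, r"D:\Backups"):
    print(" ".join(command))
```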
Despite the obvious limitations of not providing a complete backup, the SMIGRATE backup solution can be very valuable for organizations that need to be able to restore a single file with relative ease.
SQL Database Backup
Both of the options above are new and unique to SharePoint. They are not the tried, tested, and true solutions that administrators have been used to for a long time. For most organizations backing up SQL Server databases is a core skill that has been developed over the last several years to the point of rote perfection. Backing up SQL Server databases is supported by existing processes and techniques, and because of that backing up the SQL Server database directly is often a good strategy for an organization seeking to back up SharePoint Services.
Most everything that SharePoint does is stored in the associated content database. Because of this most of the things that are necessary for reconstructing a site after a disaster will be found in the database. In fact a backup of the system plus a backup of the SQL Server database will restore the site. In most cases, however, it’s easier to use STSADM to make a baseline backup of the site and then use SQL Server to restore from catastrophic events.
The challenge with SQL backups is essentially the same as with STSADM: it’s an all-or-nothing proposition if you lose something in a site. The one further disadvantage SQL has compared with STSADM is that STSADM can back up sites individually, whereas a SQL backup must back up all sites at the same time.
When backing up SQL Server databases for SharePoint there is no special care that needs to be taken, beyond getting an original backup of the system which includes the additional files necessary for a complete site.
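For completeness, here is a small Python sketch that builds the T-SQL statement for a full database backup. `BACKUP DATABASE ... TO DISK ... WITH INIT` is standard SQL Server syntax; the database name and backup path are placeholders, since your content database will be named whatever was chosen when the virtual server was extended. How the statement gets executed (a scheduled job, a query tool, or a script) is left to your existing backup process.

```python
def build_backup_statement(database: str, backup_path: str) -> str:
    """Build a full-backup T-SQL statement for one database.

    WITH INIT overwrites any backup sets already in the target file.
    """
    safe_db = database.replace("]", "]]")        # escape the bracketed identifier
    safe_path = backup_path.replace("'", "''")   # escape the string literal
    return f"BACKUP DATABASE [{safe_db}] TO DISK = N'{safe_path}' WITH INIT;"


# Placeholder database name and path, for illustration only.
print(build_backup_statement("WSS_Content", r"D:\Backups\WSS_Content.bak"))
```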
3rd Party Tools
The preceding backup solutions are available for free to every organization using SharePoint. However, these options may – or may not – fit into the organization’s existing backup strategy. In addition, each of the preceding backup solutions has its own set of limitations. If a tool can do single-file restores, it doesn’t do a complete backup. Conversely, the complete backup method via STSADM makes restoring a single file difficult.
There are at least three third parties who have SharePoint backup agents:
- Veritas NetBackup
These solutions are commercial solutions which compete on their ability to deliver a complete backup solution.
Performance monitoring and tuning are topics which most professionals know or care little about – until performance becomes a problem. It’s one of the topics that doesn’t come up frequently enough to drive a lot of interest in understanding how it works, how it fits together, and what to do about it. However, there is value in understanding the fundamentals of performance monitoring on Windows-based systems and what to do based on what you find.
Searching Out the Bottlenecks
The primary activity in performance monitoring is seeking to understand the bottlenecks in the system which are either already causing performance issues or have the potential to cause performance issues in the future. Because we’re seeking out bottlenecks we’re looking – primarily – for metrics and counters which are able to tell us the relative amount of capacity that has been used and how much is remaining.
Because of this the performance metrics that we’re going to gravitate to are those which are expressed as a percentage. The percent of disk time, percent of CPU time, and percent of network usage are good examples of the kinds of metrics that we’ll want to focus on when evaluating performance at a macro level. They are not, however, an exhaustive list of metrics. They are only the metrics that are easiest to understand and extract value from quickly.
Spikes and Sustained
Even with counters that report status on a percentage of available resources there are still challenges to face. The first challenge is determining when there’s a problem because of a sustained lack of available resources and when it’s a momentary blip on the radar.
The primary consideration in performance monitoring is over what interval of time can you accept performance challenges? What level of performance is acceptable and which is not? Is it important that the CPU have some availability every second? In most cases the answer to that question is no. However, the question becomes more difficult as you ask the question over a one minute interval. Most users tolerate the occasional slow down that is over within a minute. However, hours of performance problems are a different story.
So when evaluating what is a performance problem and what isn’t a performance problem consider how long your users would be willing to accept a slow down and then ignore or temper your response to momentary spikes in whatever counter you’re looking at. Momentary spikes are a normal occurrence and simply mean that the system is pouring all of its resources into fulfilling the requests that the users have made.
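One way to make the spike versus sustained distinction concrete is to look for the longest unbroken run of samples above a threshold and compare its duration to what your users will tolerate. The Python sketch below does exactly that. The 90% threshold, the 15 second sampling interval, and the 60 second tolerance are illustrative assumptions, not recommendations.

```python
from typing import Sequence


def longest_run_above(samples: Sequence[float], threshold: float) -> int:
    """Return the length of the longest consecutive run of samples above threshold."""
    longest = current = 0
    for value in samples:
        if value > threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def is_sustained(samples: Sequence[float], threshold: float,
                 interval_seconds: float, tolerated_seconds: float) -> bool:
    """True when the counter stayed above threshold longer than users will tolerate."""
    return longest_run_above(samples, threshold) * interval_seconds > tolerated_seconds


# Illustrative samples taken every 15 seconds.
spiky = [20, 95, 30, 25, 97, 20]
busy = [95, 96, 99, 98, 97, 95]
print(is_sustained(spiky, 90, 15, 60))  # momentary spikes only
print(is_sustained(busy, 90, 15, 60))   # above the threshold for 90 seconds
```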
Objects, Counters, and Instances
Performance monitoring on a Windows system requires an understanding of the way that Windows breaks down counters. On a Windows system performance monitoring starts with an object. An object is a broad representation of something, such as memory. This broad topic groups a set of related counters. Each counter is an individual measure in that category. For the memory object, page faults/sec, pages/sec, and committed bytes are all examples of counters. Each counter may measure the object in a different way but all of them relate to the object to which the counter belongs.
For each counter there may be multiple instances. An instance is a copy of the counter for a different part of the system. For instance, if a system has two processors, the counter for % processor time will have three instances; one for each processor and one for a total (or average) between the two processors. In most cases each instance needs to be viewed separately from the others to identify specific areas where problems may occur.
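The object, counter, and instance breakdown shows up directly in the counter paths that Windows tools use, which take the form `\Object(Instance)\Counter`. The short Python sketch below builds such paths. The helper names are made up for the example; the `_Total` instance name and the zero-based processor numbering follow the way Windows names processor instances.

```python
from typing import List, Optional


def counter_path(obj: str, counter: str, instance: Optional[str] = None) -> str:
    """Build a counter path such as \\Processor(_Total)\\% Processor Time."""
    if instance is None:
        return f"\\{obj}\\{counter}"
    return f"\\{obj}({instance})\\{counter}"


def processor_instances(processor_count: int) -> List[str]:
    """One instance per processor plus the _Total instance."""
    return [str(index) for index in range(processor_count)] + ["_Total"]


# A two-processor system yields three instances of % Processor Time.
for name in processor_instances(2):
    print(counter_path("Processor", "% Processor Time", name))

# Counters without instances, such as the memory counters, omit the parentheses.
print(counter_path("Memory", "Pages/sec"))
```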
You’ll find that for most purposes there are only four areas of performance monitoring that you care about. They are: memory, disk, processor, and network. These are the key metrics because they are the core system components that are most likely to be the source of the bottlenecks.
One of the challenges in performance monitoring is the interdependence of these key subsystems on one another. A bottleneck in one area can quickly become a bottleneck in another area. Thus the order in which you evaluate the performance of these subsystems is important to reaching the right conclusion.
The first characteristic to evaluate is memory, because it has the greatest potential to impact the other metrics. A memory shortage will, in fact, often show up as a disk performance problem. This disk problem will often become apparent before the memory issue is fully understood.
In today’s operating systems when memory is exhausted the hard disk is used as a substitute. This is a great idea since hard drives are substantially larger than memory on a server. However, it has the disadvantage that hard drives are orders of magnitude slower than memory. As a result what might be a relatively light load on memory will quickly tax a hard disk and bring both the disk and the system to their knees.
One way to mitigate this is to minimize or eliminate the virtual memory settings in Windows to prevent Windows from using the hard drive as if it were memory. This setting can prevent a memory bottleneck from impacting the hard drives – but raises the potential for the programs running on the server to not be able to get the memory that they need. This is generally an acceptable trade-off for making sure that you’re aware of the true root cause of an issue.
The memory counter to watch is the pages per second (Pages/sec) counter. This counter tracks the number of times that data was requested from memory but actually had to be read from disk. This counter, above all others, helps to identify when the available memory doesn’t meet the demands of the system. A small number of these, say fewer than 100, is a normal consequence of a running system. However, sustained numbers larger than 100 may indicate a need to add more memory. If you’re seeing a situation where you need more memory, you cannot evaluate disk performance reliably, since the system will be using the disk to accommodate the shortage of memory.
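The "memory first" rule can be expressed as a simple gate: check the paging rate before trusting any disk numbers. In the Python sketch below the 100 pages per second figure comes from the guideline above, while using the average over the sample window as a stand-in for "sustained" is a simplifying assumption of the example.

```python
from statistics import mean
from typing import Sequence

PAGES_PER_SEC_LIMIT = 100  # rule of thumb from the text, not a hard limit


def memory_is_constrained(pages_per_sec: Sequence[float]) -> bool:
    """Treat an average paging rate above the limit as sustained memory pressure."""
    return mean(pages_per_sec) > PAGES_PER_SEC_LIMIT


def next_step(pages_per_sec: Sequence[float]) -> str:
    """Decide whether disk counters are worth evaluating yet."""
    if memory_is_constrained(pages_per_sec):
        return "Address memory first; disk counters are skewed by paging."
    return "Memory looks adequate; disk counters can be evaluated."


# Illustrative samples of the Pages/sec counter.
print(next_step([12, 40, 8, 25]))
print(next_step([180, 240, 150, 300]))
```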
The primary counter for monitoring disk time is the ‘% Disk Time’ counter. This counter represents the average number of pending disk requests to a disk for the interval, multiplied by 100 to get a percentage. This calculation method leads to some confusion when the disk driver can accept multiple concurrent requests, as is the case with SCSI and Fibre Channel disks. It is possible for the instances measuring these types of disks to have a % Disk Time above 100%.
One of the choices to be made when selecting disk counters is whether to select Logical disk counters or Physical disk counters. Logical disk counters measure the performance relative to the partition or logical volume rather than by the physical disks involved. In other words, Logical disk counters are based on drive letter mappings rather than on the disks involved. The physical disk option shows instances for each of the hard drives that the operating system sees. These may either be physical drives or in the case of RAID controllers and SAN devices, the logical representation of the physical drive.
In general, the best approach to measuring disk drive performance is to use physical disk counters. This will allow you to see which hard disk drives are busier and which ones are not. Of course, if there’s a one-to-one relationship between your logical drives (partitions) and the physical drives (that the operating system sees) then either logical or physical disk counters are fine. However, only the physical disk counters are turned on by default. If you decide to use logical disk counters, you’ll need to run the DISKPERF command to enable them and then reboot the system.
The % Disk Time counter should be evaluated from the perspective of how long a performance slow down you can tolerate. In most cases, disk performance is the easiest to fix – by adding additional drives. So it’s an easy target if you’re seeing sustained % Disk Time values above 100%. If you’re on a RAID array or a SAN, consider that you may want to evaluate % Disk Time against 100% times the number of effective drives in the array. For RAID 1 and RAID 1+0, that’s one half the number of disks. For RAID 5, it’s the number of disks minus one.
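The effective drive arithmetic is easy to get wrong under pressure, so here it is as a small Python sketch. It covers only the RAID levels mentioned above, and the function names and the string labels used for the RAID levels are simply conventions of this example.

```python
def effective_drives(raid_level: str, disk_count: int) -> int:
    """Number of drives' worth of capacity for the RAID levels discussed in the text."""
    if raid_level in ("1", "1+0"):
        return disk_count // 2   # mirrored: half the disks
    if raid_level == "5":
        return disk_count - 1    # one disk's worth goes to parity
    raise ValueError(f"RAID level not covered by this sketch: {raid_level}")


def disk_time_ceiling(raid_level: str, disk_count: int) -> int:
    """The % Disk Time value that corresponds to a fully busy array."""
    return 100 * effective_drives(raid_level, disk_count)


# A six-disk RAID 5 array is not saturated until roughly 500% disk time.
print(disk_time_ceiling("5", 6))
# An eight-disk RAID 1+0 array tops out around 400%.
print(disk_time_ceiling("1+0", 8))
```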
Since the dawn of computing, people have been watching processing time and the processes which are consuming it. We’ve all seen the performance graphs that are created by task manager and watched in amazement at the jagged mountain range that it creates. Despite the emphasis on processor time for overall performance it’s one of the last indicators to review for performance bottlenecks. This is because it’s rarely the core problem facing computers today. In some scientific applications and others with intense processing requirements it may truly be the bottleneck – however, everyone seems to know what applications those are. For most applications processor speed just isn’t the key issue.
The most useful measure of a processor’s availability is the % Processor Time counter. This indicates the percentage of time that the processor (or processors) was in use. This is useful because, taken over a period of time, it indicates the average amount of capacity that is left.
Improving processing speed isn’t an option for most servers. The application will need to be split up, optimized, or a new server installed to replace the existing one. It is for this reason that when processing bottlenecks occur they are some of the most expensive to address.
Until recently not much thought was given to the network as a potential bottleneck but with the advent of super-sized servers with four or more processors and terabytes of disk space it has to be a consideration. Network performance monitoring is a measure of how much of the bandwidth available on the networks is actually being consumed.
This is a tricky proposition since the connected network speed may not be the total effective speed. For instance, suppose a super-server is connected through a 1 Gbps connection to a switch which has eight 100 Mbps connections. The server will assume that 1 Gbps of data can flow through the network that it is connected to. However, in reality only 800 Mbps at most is truly available to be consumed.
Another consideration is that many network drivers, even today, are less than stellar at reporting performance information. More than a few network card drivers have failed to properly report what they’re doing.
In general, network performance monitoring should be done from the perspective of understanding whether the network is a possible bottleneck: estimate what the maximum effective throughput of the network is likely to be, then determine what percentage of that limit is being consumed. It is reasonable to assume that a 60% utilization rate is all that is really possible for Ethernet.
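The network arithmetic can be sketched in a few lines of Python, using the gigabit uplink and eight 100 megabit ports from the example above. Speeds are expressed in megabits per second, the 60% figure is the rule of thumb just mentioned, and the observed traffic figure of 240 is invented purely for illustration.

```python
from typing import Sequence


def effective_link_mbps(server_link_mbps: float, downstream_links_mbps: Sequence[float]) -> float:
    """The server can never push more than the slower side of the switch allows."""
    return min(server_link_mbps, sum(downstream_links_mbps))


def practical_ceiling_mbps(effective_mbps: float, utilization: float = 0.60) -> float:
    """Apply the rule-of-thumb Ethernet utilization rate to the effective speed."""
    return effective_mbps * utilization


def percent_of_ceiling(observed_mbps: float, effective_mbps: float, utilization: float = 0.60) -> float:
    """How much of the practically available bandwidth is being consumed."""
    return 100 * observed_mbps / practical_ceiling_mbps(effective_mbps, utilization)


effective = effective_link_mbps(1000, [100] * 8)
print(effective)                            # limited by the eight downstream ports
print(practical_ceiling_mbps(effective))    # what Ethernet is likely to deliver
print(percent_of_ceiling(240, effective))   # hypothetical observed traffic
```

Measured this way, a server that looks only a quarter busy against its nominal link speed may already be consuming half of what the network can realistically deliver.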
Resolving the Details
The guidelines here may not be enough to completely diagnose a performance problem and identify a specific course of action to resolve it; however, in many cases they will be. In those cases where the problem can’t be resolved by looking at the high level indicators mentioned here, you’ll have to dive through the other counters and identify which ones will help you isolate the problem and illuminate potential solutions.
Back in 2001, I wrote an eBook, A Beginner’s Guide to Successful Technical Publishing, which was available briefly on MightyWords.com. They closed a few months after I put the book together so the sales were not that great. I’ve had the material in my archive and recently ran across it again. Since I think it might be valuable to the community and I have limited interest in selling eBooks to make money these days, I’m making it available as a PDF for free. You can download it from https://thorprojects.com/2023/wp-content/uploads/2015/07/successfulpub.pdf.
I’d appreciate your feedback on the book — particularly from those who’ve written an article, a book, or other content.