Monday, September 24, 2007

SQL Server 2000 Clustering Tips

WARNING: Windows 2000 and SQL Server 2000 clustering is an advanced topic, and you should have a good understanding of clustering before using any of these tips.

Don't implement any of these tips into a production environment until you have successfully tested them in a test environment. Note that most of these tips are specific to SQL Server 2000.

*****

Tip: Don't turn off or on the mssqlserver, sqlserveragent, or the msdtc services off or on using any other tool other than Cluster Administrator, Enterprise Manager or Service Manager when SQL Server has been clustered. That means don't use Services or other methods to turn these services off or on.

Explanation: If you don't follow this rule, you could corrupt your SQL Server cluster installation and have to reinstall it.

In addition, be sure that "Autostart SQL Server," "Autostart SQL Server Agent," and "Autostart MSDTC" are all unselected in the "General" tab of the SQL Server "Properties" screen, and that "Auto restart SQL Server if it stops unexpectedly" and "Auto restart SQL Server Agent if it stops unexpectedly" are both unselected in the "Advanced" tab of the SQL Server Agent "Properties" screen.

Version: 2000 (Not SQL Server 7.0. See 7.0 cluster tips page for more info.)

Date Updated: 7-30-2002

*****

Tip: For optimum performance when clustering SQL Server 2000, configure an active/passive failover configuration, not an active/active failover configuration.

Explanation: In an active/passive failover configuration, there is only one instance of SQL Server running on the cluster. So if a failover should occur, the secondary server will take over once the primary server becomes unavailable. But in an active/active failover configuration, there are two instances of SQL Server running, one on each node. If one of the nodes should fail, then the remaining node will have to run two instances of SQL Server, not just one. As you can imagine, a SQL Server running two instances of SQL Server, instead just one, will experience performance degradation.

Of course, if you are running an active/active failover configuration, and each instance of SQL Server is not overly taxed, and if any single physical SQL Server can handle both instances of SQL Server, then an active/active failover configuration may be OK. But I doubt that you will see many real world examples of very expensive active/active failover configurations running little used SQL Server applications.

Version: 2000

Date Added: 5-31-2001

*****

Tip: If you are clustering SQL Server 2000, and are using the Windows 2000 Address Windowing Extensions (AWE) in order to support 3GB or more of RAM in each node of the cluster, then each node of the cluster must have the exact same amount or RAM.

Explanation: If each node in the cluster does not have the exact amount of RAM, during a failover, SQL Server may fail to start on the failover server.

Version: 2000

Date Added: 5-31-2001

*****

Tip: If you are running SQL Server in an active/active cluster, you may want to manually configure how much memory each instance of SQL Server can use, so that should a failover occur, both instances of SQL Server will have the ability to grab the amount of RAM it needs to run efficiently.

Explanation: Here's why. If node A and node B both have 2GB of RAM each, and each instance of SQL Server is allowed to use all the RAM in the server (dynamic memory is used), then if node A should fail, and that instance of SQL Server fails over to node B, the failed over instance may not have access to all the RAM it needs to run adequately because the instance of SQL Server running on node B is using all of it. The instance on SQL Server running on node B will not automatically give up RAM to the other instance in a timely manner. It will give it up eventually, but not quickly, which means that the failed over instance may run very slow for awhile.

Also, if you have manually specified in each instance that SQL Server has access to all 2GB of RAM (not dynamic memory allocation), should a failover occur, then the failed over instance will not be able to get any of the RAM it needs, and the failed over instance won't event start.

To prevent either one of these scenarios, the ideal solution in this case would be to manually allocate 1GB to each server instance. This way, should a failover occur, then both instances will have access to 1GB of RAM to run, and performance immediately after failover won't suffer much.

But you say, don't these options be waste an entire 1GB of RAM in each server? And the answer is yes. This extra available 1GB in each server will not be used, except under failover conditions. This is just one of the drawbacks of using an active/active SQL Server cluster.

Version: 7.0, 2000

Date Added: 12-12-2003

*****

Tip: If you are running multiple instances of SQL Server 2000 on a SQL Server cluster, be sure that the total amount of memory allocated to each instance of SQL Server is less than the total amount of physical RAM in the server.

Explanation: Typically, when running multiple instances of SQL Server 2000 on clustered servers, you use the Total Max Server Memory server setting to tell each instance of SQL Server how much RAM it can use. The total of this setting, for each instance of SQL Server, must be less than the total amount of physical RAM in the server. If it is not, at the very worst, one or more of the instances of SQL Server will not start, and at the very best, the server will experience heavy paging and performance will degrade significantly.

Version: 2000

Date Added: 5-31-2001




Tip: If you are going to implement replication from a SQL Server 7.0 or 2000 clustered server, and the SQL Server cluster will participate as a Publisher and a Distributor, use a file share located on the cluster disk resource as the snapshot folder. This way, replication will failover when SQL Server fails over.

Explanation: When you configure a clustered SQL Server as a Distributor, SQL Server needs to have access to a snapshot folder as a temporary holding place during the replication process. In order to ensure that replication still works when failover occurs, this folder must be located on a shared folder on the cluster's shared disk resource. If it is not, then when failover occurs, replication may stop working.

You will have to manually create the required folder on the shared disk resource, and additionally create the necessary share with appropriate permissions. In addition, you will have to configure the shared folder using Cluster administrator as a clustered shared folder.

If you don't want to do the above, one option is to not make the clustered SQL Server a Distributor, but only a Publisher, and locate the Distributor on a non-clustered SQL Server. This way, the snapshot folder will exit on a non-clustered SQL Server, negating the above advice.

Version: 7.0, 2000

Date Added: 5-31-2001

*****

Tip: You have heard this advice before, but it is so important I am repeating it here, be sure all of your clustering hardware is on the Windows 2000 Advanced Server Clustering Hardware Compatibility List (HCL).

Explanation: I can't overemphasize the importance of ensuring your hardware is supported by Windows 2000 Advanced Server clustering. If it isn't, not only will you have trouble getting support from Microsoft, the nature of the problems you probably will have will be excruciating painful and hard to troubleshoot. I know, I have been there.

What makes it difficult to ensure that all of your hardware is on the HCL is Microsoft's poor organization of the HCL itself. It is often very difficult, and time consuming, to locate all the hardware on the list to see if it has been certified.

A further complication is that Microsoft certifies hardware two different ways. First, it certifies some hardware by specific part. Second, in other cases, hardware is certified as a total system (a specific combination of parts). And in some cases, hardware certified one way is not certified another way. I recommend that you try to find hardware that is certified as a total system, that way, you can ensure that the total system you are purchasing has been tested as a total system by Microsoft.

Other problems with the HCL is that it is often out of date, sometimes parts that are still currently available (but are older models) are not listed in the HCL, and sometimes one variation of a part is certified, but a slightly different variation of the same part is not certified. So be very careful when comparing part numbers to the HCL to be sure you and the HCL are talking about the exact same part, not some obscure variation.

Version: 7.0, 2000

Date Added: 6-4-2001

*****

Tip: To ensure failover will work when needed, it is important to schedule manual failovers periodically in order to test if failover is really working.

Explanation: Just because your cluster has not had any problems you are aware of doesn't mean that there are no problems. Periodically, at least every 60 days, you need to manually failover your cluster, and fail it back, to ensure failover is working as expected. This helps to ensure failover is working should it ever be needed for a real server failure.

Version: 7.0, 2000

Date Added: 6-6-2001

*****

Tip: Don't run the regrebld utility, which are used to rebuild SQL Server's registry, on a clustered SQL Server.

Explanation: The SQL Server clustering wizard alters the registry entries of SQL Server when it is run. If you try to rebuild the registry while SQL Server is clustered, it will remove important registry entries, and most likely, SQL Server will stop functioning.

Version: 7.0, 2000

Date Added: 6-6-2001

*****

Tip: Although SQL Server 2000 supports up to 16 instances of SQL Server, running more than one instance on any one node in a cluster will degrade performance.

Explanation: Installing, configuring, managing and performance tuning just one instance of SQL Server 2000 in a cluster is challenging enough. Don't temp fate by running more than one instance of SQL Server 2000 per node in a cluster. If you do, not only will administrative headaches go up, but performance will suffer. Each instance of SQL Server has to share common hardware resources, increasing the likelihood of performance bottlenecks.

Version: 2000

Date Added: 6-26-2001




Tip: Maximizing the hardware performance of a SQL Server cluster is very similar to maximizing the hardware performance of a non-clustered SQL Server.

Explanation: When specing out new hardware for a SQL Server cluster, assuming you want maximum performance, you may want to consider purchasing the following "minimal" hardware configuration:

Minimum four CPUs, each with a 2MB L2 cache.
4GB RAM (ECC).
Fibre Channel/RAID controller with enough channels to maximize I/O.
The SQL Server executables (and operating system) should be placed on mirrored drives (RAID 1) and have its own controller channel.
Logs should be placed on mirrored drives (RAID 1) and have its own controller channel.
Tempdb should be placed on mirrored drives (RAID 1) and have its own controller channel.
Quorum disk should be placed on mirrored drives (RAID 1) and have its own controller channel.
SQL Server data files should be placed on drives configured for RAID 0+1 with multiple controller channels (as dictated by maximum of number of physical drives supported per channel).
Disk drives should be 10,000 RPM or higher. Consider a SAN option.
Two 100MBs network cards per node, minimum. Should consider gigabit network cards for public network connection.
Of course, you may want to purchase a system even bigger than the "minimum" one described above.

Version: 7.0, 2000

Date Added: 6-26-2001

*****

Tip: If the SCSI or Fibre Channel controllers you use to access a shared disk array (or SAN) resides on the cluster nodes, then ensure than writeback caching has been turned off on them. If the controller resides on the disk array or SAN, then writeback caching can be left on.

Explanation: If you don't turn off writeback caching on local controllers, it is very likely that data corruption will occur during failover because the controller does not understand clustering, and it will attempt to overwrite data on the disk. In addition, any transactions stored in the cache, but not yet written to disk, would be lost.

Version: 7.0, 2000

Date Added: 6-26-2001

*****

Tip: When installing SQL Server 7.0 or SQL Server 2000 clustering onto a Windows 2000 cluster, it is important that all non-essential services be stopped. If they are not, then the installation may abort.

Explanation: All non-essential services, whether they are built-in to Windows 2000, or are provided by third parties, must be stopped on all cluster nodes before installing SQL Server clustering. This even includes other instances of SQL Server, the Distributed Transaction Service, IIS, SNMP, and more. In fact, it is easier to list the services not to be stopped than it is to list the ones to stop.

Below is a list of the services on Windows 2000 Advanced Server and Windows 2000 Datacenter Server that should NOT be turned off. (Not all of these services will be found on all servers.) ALL other services should be turned off before installing SQL Server clustering.

Alerter
Cluster Service
Computer Browser
Distributed File System
Distributed Link Tracking Client
Distributed Link Tracking Server
DNS Client
Event Log
IPSEC Policy Agent
License Logging Service
Logical Disk Manager
Messenger
Net Logon
Plug and Play
Process Control
Remote Procedure Call (RPC) Locator
Remote Procedure Call (RPC) Service
Remote Registry Service
Removable Storage
Security Accounts Manager
Server
Spooler
TCP/IP NetBIOS Helper
Windows Management Instrumentation Driver Extensions
Windows NT LM Security Support Provider
Windows Time Service
Workstation
Note: The “Distributed Transaction Coordinator” should be turned off, but not by using the “Services” screen. Instead, it should be set to “offline” using the Cluster Administrator.

Version: 7.0, 2000

Date Added: 6-26-2001

*****

Tip: When installing software on a cluster that is to run SQL Server 2000 under Windows 2000, the order the software is installed in is very critical. If you don't follow the correct order, your clustering installation most likely will fail, or at the minimum, cause constant headaches.

Explanation: Below is the order you need to install SQL Server 2000 when using Windows 2000 Advanced Server or Windows 2000 Datacenter Server.

Install Windows 2000 on all nodes
Install Windows 2000 SP2 on all nodes
Install all of the relevant Microsoft Internet Explorer 5.5 Updates on all nodes
Create the necessary Cluster Service and SQL Server Service accounts, and add them to the local administrators group on all nodes
Disable NetBIOS on the internal private network
Install Windows 2000 Clustering on all nodes
Create the MS DTC Cluster Resource on all nodes
Stop all unnecessary services on all nodes
Install SQL Server onto all nodes (SQL Server, and SQL Server Clustering, are installed at the same time)
Install SQL Server 2000 SP1 onto all nodes
This is not meant to be a complete description of how to install clustering, only a reminder of the importance of order when installing and configuring SQL Server clustering.

Version: 2000

Date Added: 6-26-2001




Tip: If you build a four-node SQL Server cluster, ideally, three of the nodes should be running instances of SQL Server, and the remaining node should be used as the primary failover node.

Explanation: While it is possible to run SQL Server on every node of a four-node SQL Server cluster, this scenario presents a problem should one of the cluster nodes fail. If this happens, then one of the remaining nodes must take over for the failed node. Because this server is already under a load (SQL Server is running on it), it will now have to run two loads, which may be difficult, especially if both servers are already very busy (the failed server and the failover server).

To prevent performance problems should a node fail in a four-node SQL Server, the best overall solution is to only run SQL Server on three of the nodes, leaving the remaining node to be the primary failover node. It is also important that this failover node have very similar hardware as the other three servers in the node, otherwise there may be a significant performance problem when failover occurs.

Version: 2000

Date Added: 8-29-2001


*****

Tip: Backing up SQL Server data from a SQL Server cluster should be done with the utmost diligence. Remember, the shared disk array is not clustered, and is in fact a single point of failure.

Explanation: The point of SQL Server clustering is not to protect data, but to ensure that SQL Server hardware and software is running. Should a clustered node's hardware fail, or a SQL Server instance fail, then SQL Server will automatically failover to the backup node. But if your shared disk array should fail, SQL Server clustering will be of no value.

Even though your shared disk array most likely is redundant (supporting some version of RAID), it is still possible to lose your data. Because of this, making backups is very important.

The most efficient way to make backups from a clustered SQL Server is to perform disk backups, then to sweep the backups from disk to tape for offsite storage. You can either make the disk backups directly to the shared disk array, then move them to tape, or you can make the disk backups to a network share, then move them to tape.

However you decide to do this, consider creating a redundant backup job. For example, your backup job might consist of multiple redundant steps. A first step might be used to perform a disk backup of your databases to the local shared disk array, and a second step might be to move these backup files to another server. But what if the server you are copying the backup files is down, or has ran out of disk space? One option would be to create a third step that only runs if the second step fails, and is used to copy the backup files to another server. This way, should the file copy part of the job fail, then there is logic in the job to try another alternative. This is just a simple example, but I think you can see the potential of this idea.

Version: 7.0, 2000

Date Added: 8-29-2001

*****

Tip: If your shared disk array hardware supports it, consider making snapshot backups of your SQL Server databases stored on your cluster's disk array.

Explanation: SQL Server 2000 supports what are called snapshot backups. Essentially, a snapshot backup involves the mirroring of the disks that contain your SQL Server 2000 databases, and then breaking the mirror and using them as a backup.

This option allows you to make backups quickly and without affecting SQL Server's performance. Of course, your hardware must support this type of backup.

Version: 2000

Date Added: 9-5-2001

*****

Tip: To prevent the mssqlserver and the sqlserveragent services from being failed over because a less critical service on the server failed, configure the less critical services not to failover the cluster.

Explanation: For example, let's say you are running the Full-Text Search service on your clustered server. Should it fail, you most likely don't want your entire cluster (running mssqlserver and sqlserveragent) to failover because of it. Be default, this will happen unless you take explicit steps to prevent this. Of course, you may want the key SQL Server services to fail in this case, but many DBAs don't want this to happen.

Here's how to prevent this from happening. Start Cluster Administrator, and then right-click on the non-critical service that you want to prevent from forcing SQL Server to failover, and choose "Properties," then choose the "Advanced" tab.

On this tab, deselect the "Affect this group" check box, then click OK. You are done. Now, should this service stop, then it will not cause the critical SQL Server services to failover. You will want to do this for each non-critical service that affects SQL Server's services.

Version: 7.0, 2000:

Date Added: 9-5-2001

*****

Tip: Turn off any unnecessary operating system services on your clustered SQL Server that you will not be using. This will reduce the cluster's overhead and speed performance.

Explanation: Not all of the default services that are turned on when Windows NT Server or Windows 2000 is installed are necessary. The same goes true for SQL Server 7.0 and 2000. There are some SQL Server services, such as the Full-Text Search service, that aren't required for many servers, and shouldn't be turned on if they are not needed.

If your cluster servers don't need IIS or any of the other related web services, be sure they are also turned off. I have often found these services have been installed on servers that don't need them.

The more of these unnecessary services you can turn off, the less overhead your cluster will use, and the better the cluster's performance. Keep in mind that you need to turn these services off on all servers in the cluster using the Services utility. In most cases, you can get these non-essential services to Manual or Disabled, in order to turn them off. In other cases, you may be able to install them.

Here are some of the services you might want to consider turning off. This is not meant to be a comprehensive list, nor should you turn them off if you should really have a need for them. You will want to test the effects of turning off these services on a non-production server before trying it on a production server.

NetLogin
Browser
Server
Print Spooler
Task Scheduler
License Logging Service
SNMP
Messenger
Version: 7.0, 2000

Date Added: 9-5-2001




Tip: In the Windows 2000 Server Resource Kit is a tool called the Microsoft Cluster Tool. It can be used to help backup and restore a cluster configuration, along with helping to move resources from a non-clustered server to a clustered server configuration.

Explanation: The Cluster Tool includes three wizards: The Configuration Backup Wizard, the Configuration Restore Wizard, and the Resource Migration Wizard. The Configuration Backup Wizard is used to backup your current cluster configuration information, which includes group, resource, and resource type information. As you might imagine, the Configuration Restore Wizard allows you to restore cluster information, either as a whole, or individually, such as a single resource. The Migration Wizard is used to help move resources from stand-alone Windows 2000 resources to a Windows 2000 cluster.

Version: 2000

Date Added: 10-11-2001

*****

Tip: When creating SQL Server resource names (such as the name of your virtual SQL Server), only use letters or numbers in the name, not any special characters. If you do, the SQL Server cluster Setup Wizard most likely will fail.

Explanation: For whatever reason, cluster resource names that include special characters, may cause the Setup Wizard to fail. To prevent this, keep the following in mind: 1) only use letters or numbers in the resource name; 2) don't use spaces or periods in the name; 3) make sure that resource names don't match the names of resource group names (which can also cause the Setup Wizard to fail).

Version: 2000

Date Added: 10-17-2001

*****

Tip: Don't use virus checking software on your cluster. It not only makes your SQL Server cluster more unreliable, it hurts performance.

Explanation: Although, technically speaking, if your virus software claims to be cluster compatible, virus software can run on a cluster, although it is not recommended. Research has shown that high availability clusters are more highly available without virus software than they are with it. In addition, virus software chews up valuable CPU cycles and memory that can better be used by SQL Server for optimum performance.

One way to avoid viruses on a SQL Server is to ensure that any computer, client or otherwise that connects to the SQL Server cluster, has virus software on them. A second way is to ensure that your clustered SQL Server does not have any e-mail packages on it. A third way is not to browse the web from a clustered SQL Server. And a forth way is to not have any shared folders on the cluster.

Version: 7.0, 2000

Date Added: 12-11-2001

*****

Tip: Once SQL Server 2000 clustering in installed, use the SQL Server Setup program to make changes to the cluster's configuration.

Explanation: It is very important that you use the SQL Server Setup program to make many SQL Server clustering changes, otherwise you risk corrupting your installation. The following must be performed from the Setup program:

Change the virtual cluster name
Change the virtual cluster IP address
Adding or removing cluster nodes
Version: 2000

Date Added: 1-17-2002

*****

Tip: When any SQL Server administrative tools, such as Enterprise Manager, Query Analyzer, and Profiler, is performing some task, and a SQL Server failover occurs, the current process will stop and have to restarted once the failover is complete.

Explanation: SQL Server administrative tools, just like any software connected to a SQL Server cluster, will fail when failover occurs. This means that any task or process you were running (such as a SQL Server backup job) will fail, and will have to be restarted once the failover has completed. In some cases, you may want to close the software, then restart it, so that you are properly authenticated on the new server before you restart any tasks.

Version: 7.0, 2000

Date Added: 1-25-2002

*****

Tip: SQL Server clustering is not designed to prevent data failure. For optimum data protection, consider using log shipping to move data from your production cluster to another SQL Server, preferably off-site.

Explanation: While SQL Server clustering can provide fault tolerance for many aspects of a SQL Server, such as hardware, network, operating system, and application failure, it cannot protect against lost of data. This includes if the shared array fails, the connections to the shared array fails, if data is destroyed by users by accident, or if the entire cluster is destroyed.

While nightly backups are important, they still leave a lot of data unprotected. If your cluster must have very high availability, consider using log shipping (either built into SQL Server 2000, or a version that you perform yourself) to move data to another server, preferably to a server located offsite, and connected to your cluster via a WAN or high-speed LAN. This of course entails making the necessary preparations to move to the other server should the cluster fail.

Version: 7.0, 2000

Date Added: 2-5-2002

*****

Tip: Before you install Windows 2000 clustering, you must create a clustering service account that will be used by the Cluster Service. It is important that this account be setup correctly.

Explanation: One of the biggest reasons that people have problems installing Windows 2000 clustering and SQL Server 2000 clustering is because the Cluster Service account is not setup correctly. Be sure that you do the following when creating the Cluster Service account:

The Cluster Service account must be a domain account, but it does not have to be a member of the Domain Administers group.
When creating the Cluster Service account, ensure that the option, "User must change password at next logon" is not selected, that the option "Password never expires" is selected.
All of the nodes in a cluster must belong to the same domain, and belong to the same domain as the Cluster Service account.
The Cluster Service account must be a member of the local administrators group on all the servers in the cluster.
The Cluster Service account must have these server rights on each of the nodes in the cluster: "Logon as a service" and "Act as part of the operating system." These rights will automatically be given in most cases when you install the Clustering Service properly.
Version: 7.0, 2000

Date Added: 3-8-2002




Tip: Prefix the cluster nodes names, and the virtual cluster names, with appropriate prefixes so that you don't get the names confused.

Explanation: Sometimes, it is easy to confuse the names of the cluster nodes, and the virtual server names. For example, in a two-node cluster, there are two cluster node names and two virtual cluster names (one for the Windows 2000 cluster, and one for the SQL Server cluster).

One way to help prevent confusion of these names is to add an appropriate prefix to them. For example:

n1_: For node one of the cluster
n2_: For node two of the cluster
vc_: For the virtual name of the Windows 2000 cluster
vsql_For the virtual name of the SQL Server cluster
You can choose any prefix scheme that makes sense for you.

Version: 7.0, 2000

Date Added: 3-8-2002

*****

Tip: When you install SQL Server 2000 Clustering, the Full-Text Service is installed. If do not plan to use this service, you should configure it so that its failure will not cause SQL Server to failover.

Explanation: By default, when SQL Server 2000 clustering is installed, the Full-Text service is configured for failover along with all of the other SQL Server services. In addition, it is configured in such a way that if the Full-Text service fails for any reason, that it will automatically cause all of the SQL Server services to failover. If you are not using the Full-Text service, you don't want this to happen.

To prevent the failure of the Fail-Text service from failing over all of the other SQL Server services, you must follow these configuration steps:

Start Cluster Administrator.
Display all of the SQL Server services on the right side of the screen.
Right-click on the "SQL Server Fulltext" resource, and choose "Properties."
Select the "Advanced" tab from the SQL Server Fulltext Properties screen.
Deselect the checkbox next to "Affect the group."
Click "OK."
By taking this step, you have told the Cluster Service not to automatically failover all of the SQL Server resources should the Full-Text service fail for any reason. If it should fail, you will see a warning message in Cluster Administrator, but no other cluster resources will be affected.

Once you have done this, if you don't use the Full-Text Service, you can use the Cluster Administrator to take this resource off-line. But don't try to delete this resource, as it can cause your SQL Server cluster installation to become corrupt.

Version: 2000

Date Added: 3-8-2002

*****

Tip: If you want to find out the names of the nodes in your SQL Server cluster, you can with the fn_virtualservernodes() table function.

Explanation: To display all of the names of the nodes actively participating in your SQL Server cluster, run this statement:

select * from ::fn_virtualservernodes()

Version: 2000

Date Added: 3-8-2002

*****

Tip: You can affect how long it will take your SQL Server to failover from the primary node to the second node by changing the "recovery interval" SQL Server setting.

Explanation: When SQL Server fails from one node to another, it must go through the automatic recovery process when it restarts on the new node. This means that any transactions that were not written to the database when the failover occurred must be rolled forward or back as appropriate via the transaction log. This can take some time if there are a lot of transactions in the buffer that had yet to be written to the database when the failover occurred.

You can directly affect the recovery period by changing SQL Server's "recovery interval" setting. By default, this setting is set to "0", which means that it is set to automatic configuration. In effect, this means that the recovery period should not be longer than one minute.

But some DBAs change the "recovery interval" to be a specific number, such as five. When you assign a number to the "recovery interval" setting, what you are doing is telling SQL Server not to take more than this number of minutes to recover. For example, if five is used as the setting, this tells SQL Server that it can take up to five minutes to recover. This, in effect, reduces how often checkpoints are performed in SQL Server. A checkpoint is when dirty buffers in the SQL Server cache are written to the database.

As you can see, if you set the "recovery interval" to some number other than "0", then it will take longer for a failover to occur, as it takes longer for the automatic recovery process to run. In most cases, you should leave the "recovery interval" set to the default value of "0". This will help ensure that failover times are kept as short as possible.

Version: 7.0, 2000

Date Added: 3-8-2002

*****

Tip: When configuring the IP address and subnet mask of the private "heartbeat" network in a cluster, select a subnet other than the one you use for your public network, and ideally, select one that is not currently used by your company.

Explanation: The private "heartbeat" network is used by a cluster for internal communication between the nodes of the cluster. This network is isolated from all other networks in your company by the fact that it runs over its own network cable (often directly wired from network card to network card using a special cross-over cable).

To prevent any potential for IP address conflicts, should someone make a mistake and connect the internal network to the public network, the internal network should be on its own subnet. This way, should someone make a mistake and connect the internal network to the public network, then there won't be any potential for IP address conflicts and other strange problems arising.

While this is not a requirement, consider it a good preventative measure against future potential problems.

Version: 7.0, 2000

Date Added: 3-11-2002

*****

Tip: Don't locate the Windows 2000 paging (swap) file on a shared array. Place it on a local drive in each node in the cluster.

Explanation: Since the paging file is specific to a specific computer, not to the cluster, the paging file should be located on the nodes of the cluster, not the shared array. Not only will performance suffer if you do this, it could create cluster-related problems during failover.

Version: 7.0, 2000

Date Added: 5-23-2002




Tip: When administering a cluster using Cluster Administrator, don't run Cluster Administrator through Terminal Server, as what you see displayed on the screen from Cluster Administrator may not be correct.

Explanation: Due to an apparent bug, when you try to run Cluster Administrator from within a Terminal Services session, Cluster Administrator may not correctly reflect the status of the various cluster resources. As you might expect, this can lead to confusion and many potential problems.

To be safe, only access Cluster Administrator from the cluster nodes themselves, or from a locally installed copy of Cluster Administrator on your desktop.

Version: 2000. May be a problem in 7.0, but I have not been able to verify this.

Date Added: 6-10-2002

*****

Tip: When you try to use a strong password for the cluster service account, Cluster Service may not be able to recognize it, causing the installing of Windows 2000 clustering to fail.

Explanation: Due to an apparent bug, Cluster Service, when it is being installed, may not accept strong passwords for the cluster service account. A strong password is a password with letters, numbers, upper and lower case, and special characters. When the installation process runs, it may not be able to correctly interpret the password, and the installation process fails. This may be true even given the fact that Active Directory has no problem accepting the strong password.

For now, if you run into this problem, the only work-around is to use a less strong password for the cluster service account. Contributed by Paul Lynch.

Version: 2000

Date Added: 6-10-2002

*****

Tip: If you need to change the account or password of the accounts used for the SQL Server service accounts, it is best to change them through Enterprise Manager, not from the Services option available from the operating system.

Explanation: Changing the service account or account password from Enterprise Manager automatically changes the account information on all nodes of your cluster in a single step. If you change account information using the Services option, then you must do this for each SQL Service account affected, on each node of the cluster, which could introduce potential problems if you make any mistakes.

Version: 2000

Date Added: 7-1-2002

*****

Tip: The drive letter used for a shared array by SQL Server clustering must be the same drive letter on all nodes of the cluster

Explanation: When you configure your nodes for SQL Server clustering, one of your chores is to create a shared array for your instance of SQL Server, and then assign it a drive letter. It is very important that each node use the exact same drive letter for the shared array. If you don't, then SQL Server clustering will not be able to fail over from one node to another.

Version: 7.0, 2000

Date Added: 7-1-2002

*****

Tip: If you want to initiate a manual fail over of SQL Server from one SQL Server clustered node to another, you must do this through Cluster Administrator, not Enterprise Manager.

Explanation: SQL Server clustering is designed in such a way as to only allow the manual failover of one node to another by using the Cluster Administrator. If you try to do this through Enterprise Manager by stopping the SQL Server services, manual failover will not occur automatically. What will happen is that the cluster service will automatically restart the service on the same node. If you try stopping the SQL Server services more than four times in a row, SQL Server may failover, but it also may produce some unexpected results. So don't do it.

Version: 7.0, 2000

Date Added: 7-1-2002

*****

Tip: Do not configure a clustered file share on the same array used for SQL Server data.

Explanation: Configuring a clustered file share on an array used by SQL Server is possible, but it presents two problems of concern. First, in increases the amount of time for SQL Server to failover from one node to another. Second, if the file share needs to failover, but SQL Server does not need to failover, failover will occur anyway, possibly causing problems for your users when the failover occurs.

Version: 2000

Date Added: 7-1-2002

*****

Tip: If your cluster uses Extended Stored Procedures, the xp's must be installed on a shared array. If the xp's use COM objects, these must be installed and registered on all modes of the cluster.

Explanation: If xp's used in a clustered SQL Server are not located on the shared array, then should failover occur, the xp's will fail to run. If COM objects used by an xp are not installed on all nodes and properly registered, then should failover occur, they will not function.

Version: 2000

Date Added: 7-1-2002

*****

Tip: Under some circumstances, the private network connection of a cluster can switch from "Internal cluster communications only (private network)" to "All communications (mixed network)". This can cause the network connection for the cluster to fail, preventing users from accessing your cluster.

Explanation: Windows 2000 uses a technology called Media Sense, which is part of the Plug and Play features of the operating system. The default behavior of Media Sense is to destroy a network card's TCP/IP connection when the network connection fails (for whatever reason). When that happens, the Cluster Service receives a notification of the event and removes this network connection from the available connections. In most cases, this is not a problem.

Unfortunately, if network connectivity to both network adapters in a cluster node fail at the same time, the network role for the cluster can be changed from "Internal cluster communications only (private network)" to "All communications (mixed network)." When this happens, it is possible for the cluster resources to be bound to the network adapter designed for the private network, not the public network, which means users cannot access the cluster. In effect, the cluster has failed and no clients can connect to it.

In most cases, this problem can be fixed manually, by changing the private network connection role back to "Internal cluster communications only (private network)," and then rebooting all of the nodes of the cluster and bringing it back up properly. But this is a temporary fix. If this problem repeats itself, you have no choice but to turn of the Media Sense feature of the operating system. See this Microsoft Article on how to hack the registry to turn off Media Sense.

Version: 7.0, 2000 (Windows 2000 Advanced Server issue)

Date Added: 8-29-2002




Tip: When you install Windows 2000 clustering (before you install SQL Server clustering), it is important that you specify the proper network cards for the public and private networks. If you don't, and you later need to make a change, you will have uninstall SQL Server and Windows 2000 clustering, and then reinstall them correctly.

Explanation: When installing Windows 2000 clustering, you have to be very careful that you select the appropriate network card for the public and private connections for each node. It is easy to get confused. What may be surprising is that a wrong configuration can often work, making you believe that everything is OK.

Once you have discovered that you have made a mistake, don't be tempted to try and change the assignments using the Cluster Administrator, as it may appear to work, but it won't, causing further problems.

The only real solution is to remove the SQL Server Clustering and Windows 2000 Clustering, and then reinstall it using the correct network card selections.

Version: 7.0, 2000

Date Added: 10-18-2002

*****

Tip: To learn how to interpret the cluster log, see chapter 20, "Interpreting the Cluster Log" in the book Microsoft Windows 2000 Server Distributed Systems Guide from the Windows 2000 Resource Kit.

Explanation: One of the best ways to learn about Windows 2000 and SQL Server 2000 cluster behavior is to examine the cluster.log file, found in the Drive:\WINNT\cluster folder of your cluster nodes. This log provides detailed information on what is going on, and is good for troubleshooting.

Although the log has lots of detailed information, it is very hard to read and interpret. The only place I have found that even begins to explain the cluster.log is the chapter referenced above. Information on the Internet is very minimal, including Microsoft's website. If you really want to learn more about this log, your only choice is to purchase the Windows 2000 Resource Kit, and read this chapter.

When reading this log, keep in mind that the time stamps for each entry is based on Universal Time (Greenwich Mean Time), not local time. If you don't know this, reading this log will be very confusing.

Version: 2000

Date Added: 10-24-2002

*****

Tip: When configuring the private network in a Windows 2000 cluster, select the option "All Communications (Mixed Network)," not the option, "Internal cluster communications only (private network)."

Explanation: In most cases, the private "heartbeat" connection between the nodes of a cluster is a cross-over cable (generally between two nodes of a two-node cluster) or a small hub or switch (when more than two nodes exist in a cluster. This network is completely separate from the "public" network and is only used for cluster node communications.

When configuring the private network, if you choose the option "All Communications (Mixed Network)," this means that should the private network go down,that the communications between the nodes in the cluster will not fail, instead moving over the public network. This is an added layer of redundancy that has helped me out in more than one occasion.

If you choose the "Internal cluster communications only (private network)" option, and the private connection fails, your primary node will not be able to failover to a secondary node if it should fail.

Version: Windows 2000, SQL Server 7.0, SQL Server 2000

Date Added: 11-26-2002

*****

Tip: In addition to using Cluster Administrator to manage your Windows 2000 clustering, you can also use the Cluster Administrator Command Line option instead.

Explanation: Sometimes, it is more convenient to manage clustering using the Cluster Administrator Command Line tool instead of using the GUI-based Cluster Administrator. For example, let's say you want to schedule a task for the evening, when there are less users on the cluster. This is great, except you can't schedule a task using the GUI Cluster Administrator. But if you use the Command Line version, you can. You can create the command as part of a batch file, then schedule the batch file to run using the Windows 2000 scheduler, or other scheduling tool.

To run the Command Line version of the Cluster Administrator, you must go the the command prompt and type in:

cluster [options]

If you have never used the cluster.exe command (the Command Line version of Cluster Administrator), you must take some time to learn the various options needed to accomplish your goal. You can list the options by typing in the following at the command line:

cluster /?

This will display all of the available command line options. As you might imagine, some of these are a little obscure and you will need to look them up. But once you do, you may find using the Command Line version of the Cluster Administrator very handy.

Version: NT Server Enterprise 4.0, Windows 2000 Advanced Server, SQL Server 7.0, SQL Server 2000

No comments: