DFS for WebFarm Usage - Content Replication and Failover
 
Published: 17 Oct 2006
Abstract
This article examines Windows Distributed File System (DFS) in detail.
by Web Team at ORCS Web

Introduction

Windows Distributed File System (DFS) has been around for a long time and it has always had a lot to offer.  With the latest update in Windows Server 2003 R2, DFS has become quite an impressive product.

At ORCS Web, we have recently started to use DFS for some of our high availability offerings that use a central NAS (Network Attached Storage) content server.  We are using DFS for handling the content server, both for replication and for automatic failover to a backup server in the event of maintenance or a server failure.

There were a number of things that I learned while researching, testing and rolling out DFS for WebFarm content hosting that I will share here.  This is not a step-by-step walkthrough, but rather some pointers that you will hopefully find useful.

DFS has many uses, ranging from keeping content in sync between different physical sites to providing a single easy-to-remember path that can serve up content from a variety of folders across a local or wide area network (thus the "distributed" in DFS).

DFS in its simplest form is a way to have a single friendly UNC path on your network which can have folders distributed across multiple servers.  This friendly UNC path stays permanent while the real folders that it accesses behind the scenes can be almost anywhere.  Subfolders can point to completely different locations on disk or to different servers on your network.  This flexibility is great for our WebFarm situation and allows a primary and at least one backup server to handle the content with a clean failover solution in the event that the primary server fails.

Installation

The installation is straightforward once you understand the concepts.  Partial DFS functionality is already installed on Windows Server 2003.  The replication side of things needs to be installed separately.  As long as you have upgraded to Windows Server 2003 R2, you can install this from Add/Remove Programs under the Distributed File System category.  I recommend installing all three optional features, as the extra management tools are better for managing your redundant DFS system.  This needs to be installed on the servers hosting the namespaces and, if you will use replication, on the folder targets as well.

The extra replication features of R2 do require Active Directory changes.  If you have already upgraded your domain controllers to R2, then no additional action is required.  If you have not upgraded your domain controllers to R2, no worries; you are not required to do so, but you do need to extend the Active Directory schema, as sketched below.
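
For reference, here is a minimal sketch of the schema update, assuming it is run on the schema master domain controller with the second R2 installation CD mounted as drive D: (verify the exact path and procedure against Microsoft's documentation before running it):

   rem Extend the Active Directory schema for the R2 replication features.
   D:\CMPNENTS\R2\ADPREP\adprep.exe /forestprep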

Like anything of this nature, make sure to have a good disaster recovery plan in place and do this at a non-peak time.  However, the schema installation is straightforward and does not cause any interruption of service in Active Directory.

Once DFS is installed, three hotfixes from Microsoft should be applied.  One is required for the client failback feature to fail back to the primary content server when it is back online after a failure, another allows you to have multiple domain-based DFS namespaces on Windows Server 2003 Standard Edition if you desire, and the third supposedly fixes a potential RPC issue with replication (although I did not run into this issue).  KB article 898900 needs to be installed on all of the servers accessing DFS (the web nodes).  The other two need to be installed on the DFS content servers.

Configuration

You have two graphical tools to use at this point, both of which support most features.  My preference is the DFS Management tool, which is available after the Add/Remove Programs step above.  You will find it in Administrative Tools.

There are three terms/levels to take note of: Namespace, Folder and Folder Target.  This terminology changed with R2, so do not get confused with terms you used in the past.

Top Level - Namespace

A namespace is a container to hold the folder and replication settings.  The path to the namespace might be something like \\Domain\Webfarm.  You can have multiple namespaces per server.

Second Level - Folder

A folder is a virtual DFS folder that can have one or more folder targets.  The name of the folder is what is used in the UNC path.  For example, in \\Domain\Webfarm\Site1, Site1 is the folder.

Third Level - Folder Target

A folder target is the real location of the content.  This path is masked, though, and never appears in the DFS UNC path.

You can have multiple folder targets that point to different physical locations.  There are various options to determine which folder target is used, but in our case we want to always point to a primary content server and only fail over to the backup content server when the primary server is unavailable.

Active Directory comes into play with domain-based namespaces, but management is still done from DFS Management.

Redundancy

Here is where it gets fun.  To have everything fully redundant in the event that a server fails, every part of this needs to be mirrored.  I will discuss the various levels of redundancy here.

Namespace

The namespace server holds the metadata for the namespace.  Be sure that this does not depend on a single server.  The data stored here is often small unless you have hundreds or thousands of folders in the namespace, so a dedicated server is not necessarily required for this role as long as the namespace server can always respond quickly to any queries.  The namespace servers can be the same servers as your content if you want.

To create a mirrored copy of the namespace, in the DFS Management tool, right-click on the Namespace and click on "Add Namespace Computer."  Here you can point to an existing share on a different server or create a new share.
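
Once the second namespace server is added, you can confirm that both root targets are registered.  A sketch using dfsutil (covered in more detail later in this article); the /view output should list the namespace's root targets and folders:

   dfsutil /root:\\Domain\Webfarm /view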

Folder Target

DFS masks which server is used for the folder target.  To fully use DFS in this situation, you will need to point to multiple folder targets.  In my situation, I want to have one server always used as long as it is available.  I do not want to hit a random server because there could be data integrity issues.  DFS replication is good, but it does not handle data locking or data write-through.  This means that there could be a delay from when something is written on disk until it has replicated to all other servers.  For that reason, I only want to fail over when absolutely necessary.

To achieve this, a few things are necessary.

The failback hotfix mentioned above needs to be installed.

All webfarm nodes need to be running Windows Server 2003 SP1 or later.

The caching duration for the folders needs to be changed.  The default is 1800 seconds (30 minutes), which is too long for our situation.  A long duration means that fewer requests are made to the namespace server, but it also means that the failback could take up to 30 minutes after the primary server is back online.  You can update this by right-clicking on the folder in "DFS Management," going to Properties and then the Referrals tab.  Make sure to do this on each new folder.  You can also change the cache duration on the namespace, but the default there is already 300 seconds (5 minutes).

In the Referrals tab of the namespace properties, check the "Clients fail back to preferred targets" checkbox.

In the Referrals tab of the folder properties, check the "Clients fail back to preferred targets" checkbox.

On the properties of the primary folder target, in the Advanced tab, enable "Override referral ordering" and select "First among all targets."

On the properties of the backup folder targets, in the Advanced tab, enable "Override referral ordering" and select "Last among all targets."

Now you have a primary/backup server configuration that will always use the primary server as long as it is available.
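
To confirm from a webfarm node that the primary server is the target actually being used, you can inspect the client's referral cache with dfsutil (discussed more below); the currently active target is flagged in the output:

   dfsutil /PktInfo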

Active Directory

The Active Directory part of things is done automatically and apart from the steps mentioned already, does not need any extra configuration.  Just be sure to have redundant domain controllers in your Active Directory environment.

Links and Paths

There is a growing list of links and paths that can be used for testing purposes.  Let me summarize them here, assuming that the folder is called Site1 and the Folder Targets are also given the same name.

Using the DFS path directly: (DFS level)
   \\domain\webfarm\Site1

Accessing directly using the first namespace server: (namespace level)
   \\namespaceserver1\webfarm\Site1

Accessing directly using the second namespace server: (namespace level)
   \\namespaceserver2\webfarm\Site1

Accessing content directly on primary server without using DFS: (folder target level)
   \\contentserver1\Site1

Accessing content directly on second server without using DFS: (folder target level)
   \\contentserver2\Site1

Notice that it is the DFS path (\\domain\webfarm\Site1) which will be used on the web servers and for most purposes.  It will always be the same, regardless of whether the namespace or folder target configuration changes over time.  The other paths are for testing and troubleshooting and could change over time.
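
As a quick sanity check, all of these paths can be tested from a webfarm node with a simple batch loop.  A sketch using the example names above (yours will differ):

   @echo off
   rem Report whether each DFS, namespace and folder target path responds.
   for %%P in (\\domain\webfarm\Site1 \\namespaceserver1\webfarm\Site1 \\namespaceserver2\webfarm\Site1 \\contentserver1\Site1 \\contentserver2\Site1) do (
       dir %%P >nul 2>&1 && echo OK: %%P || echo FAILED: %%P
   )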

Content Replication

With R2, DFS replication uses what is called Remote Differential Compression (RDC), which sends only the changed portions of files across the wire rather than the entire file.  This is especially handy when replicating across a wide area network, but it is also good for this situation.

If you set up two or more folder targets using DFS Management, the wizard should have asked you if you want to set up replication, but if you did things in a different order, you can set it up manually after the fact.  This can be done using the DFS Management tool as well.

Changes to the servers are not immediate, so DFS does not work well for transactional-type data where both servers need to be 100% in sync within a couple of seconds of each other.  However, for a website-related situation that is mostly read-intensive, DFS works great.

You have a few options, but in our situation we will use the full mesh topology, which means that any server will replicate to any other server.  This means that in a failure situation, the content changes made on the backup server will push back to the primary server when it is online again.
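
If you want to see how far behind replication is at any given moment, the dfsrdiag tool installed with the R2 replication components can report the backlog between two members.  The replication group and folder names below are hypothetical, and you should confirm the switches with dfsrdiag /? on your build:

   dfsrdiag backlog /rgname:Webfarm /rfname:Site1 /sendingmember:contentserver1 /receivingmember:contentserver2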

How Good Is It?

DFS failovers are impressive.  If the primary content server becomes unavailable, DFS will fail over to the backup content server within a few seconds.  In this webfarm situation, almost every time the primary server fails, HTTP requests will simply retry for a few seconds until IIS is able to serve up a successful page.

This means that there is virtually zero downtime if the primary content server fails.  The only issue I ran into when testing is when a page using master pages or web controls was halfway through loading at the moment the primary server failed.  It could potentially process half of an ASP.NET page and fail processing the rest.  This is pretty rare, and I would say that the failover is as close to perfect as can be.

A failure of the namespace server is even smoother, resulting in no noticeable downtime or slowness.

File Change Notification in ASP.NET

There is one thing to keep in mind during a failover and failback situation.  ASP.NET and IIS use what is called File Change Notification (FCN) to let IIS know of any changes to the files.  For example, if you add a new .dll to your /bin folder, ASP.NET will recycle the AppDomain and reload and recompile some of the site.  During a failure, although the switchover is smooth, it does take a few seconds, which is abrupt enough for IIS and ASP.NET to reestablish the File Change Notification handle against the backup content server.

The issue comes with the failback.  The failback is so smooth that the File Change Notification is not updated back to the restored server.  This means that if you make any changes to ASP.NET files on the restored content server, the changes are not noticed by IIS and ASP.NET.  Even deleting the entire /bin folder will not be recognized by ASP.NET if the site was visited and cached while running on the backup server.  Static pages do not have this issue, but the caching in ASP.NET makes this a problem.  At the time of this writing, I am working with Microsoft Product Support Services (PSS) to try to find a good solution for this.  To resolve it, simply recycle the app pool of the site(s) and it will start to function normally again.  So, this is not necessarily a show-stopper, but it is something to keep in mind with the failover/failback.
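
On IIS 6.0, the recycle can be scripted so it is a one-line fix after a failback.  A sketch using the iisapp.vbs script that ships with Windows Server 2003; the application pool name is a placeholder for your own:

   rem Recycle the site's application pool so FCN is reestablished.
   cscript %SystemRoot%\system32\iisapp.vbs /a "Site1AppPool" /r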

Caching and DFS

DFS client computers (webfarm nodes in this case) cache the DFS information for the length of time that you specify.  This should not be too low or you will have too much traffic to the namespace server, but it should not be too high or changes to the namespace and failbacks to a restored server will take a long time to be noticed.  What you set this to depends on your environment, but in every situation it is important to know that there is some caching taking place.

Keep in mind that adding a new folder to your DFS namespace will not be noticed immediately.  You can force the DFS client cache to be cleared by running dfsutil /PktFlush from the client server.  dfsutil.exe is a tool available in the \Support\Tools folder of the Windows Server 2003 installation CD.  I simply copy that file to C:\Windows\System32 and I can run dfsutil from the command prompt.

When setting up new sites, make sure to wait until the new site has been recognized by all of the webfarm nodes or force a cache flush from all of the nodes before attempting to set up or update the site.
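
One way to flush all of the nodes at once is with a remote-execution tool such as Sysinternals PsExec.  A batch file sketch with hypothetical node names, assuming dfsutil is in the path on each node:

   rem Flush the DFS referral cache on every webfarm node.
   for %%N in (webnode1 webnode2 webnode3) do psexec \\%%N dfsutil /PktFlush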

Backups of the Namespace

Make regular backups of your namespace.  This can be done easily using DFSUtil.  Simply export to an .xml file on a regular basis and have your backup process back up that file.  An example of the syntax needed is:

dfsutil /root:\\OW\webfarm /export:c:\NameSpaceBackups\webfarmroot.xml
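
This is easy to automate as a nightly batch job.  A sketch that keeps date-stamped copies; note that the %date% substring offsets are locale-dependent, so adjust them for your regional settings:

   @echo off
   rem Export the namespace to a date-stamped XML file (assumes US date format).
   set STAMP=%date:~-4%-%date:~-10,2%-%date:~-7,2%
   dfsutil /root:\\OW\webfarm /export:c:\NameSpaceBackups\webfarmroot-%STAMP%.xml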

I did run into something when importing the namespace.  I received the following error:

   System error 1168 has occurred.
   Element not found.

After some research and stumbling through it, I found out that I was using the domain name "orcsweb.com" instead of the NetBIOS name "OW" in the UNC path, which the import did not like.  OW is what DFS uses in this case.  The export worked with either name, but the import only worked with \\OW\, which is what was in the exported XML file.
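
For reference, the import that worked looked like the following.  The /set switch (which replaces the namespace configuration with the contents of the XML file) is shown here as a sketch; check dfsutil /? to confirm the exact import switches on your build:

   dfsutil /root:\\OW\webfarm /import:c:\NameSpaceBackups\webfarmroot.xml /set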

Summary

There is a lot to consider with DFS and I have only scratched the surface, but I hope this has been a helpful look at a few common configuration settings required for running DFS on Windows Server 2003 R2 in a WebFarm situation.

Scott Forsyth is the Director of IT at ORCS Web, Inc. - a company that provides managed complex hosting for clients who develop and deploy their applications on Microsoft Windows platforms.


