70-223: Windows® 2000 Clustering Service
Before you start
This study guide provides you with information on
the many different aspects of the Windows 2000 Clustering service.
It serves as an introductory guideline on clustering.
Before you proceed with this subject, please read
through the related study material and make sure you are
100% comfortable with the Windows 2000 architecture.
It is not easy to get your hands on clustering
equipment. Please visit the Microsoft web site for the latest list of
equipment that supports Windows 2000 clustering, and see if you can
get a chance to work with it for a while.
Clustering is a broad and advanced topic. Do NOT
rely solely on these study notes for the exam. By all means read more
than one book on the subject and make sure you understand the
material well enough to be ready for the questions.
There is no quick way to succeed with this topic. Ideally you should
work through the material hands-on and gain experience before even
signing up for the exam.
The reasons we deploy clusters are high
availability, scalability, and manageability. While the group of
clustered nodes must be locally connected, administrators can
control the cluster remotely.
Terminology
-
Cluster service is the Windows 2000 name for the original
Microsoft Cluster Server (MSCS) in Windows NT Server 4.0,
Enterprise Edition.
-
Individual computers are referred to as nodes.
-
Cluster service is the collection of components on each node
that perform cluster-specific activity.
-
Resources are the hardware and software components within the
cluster. Resource DLLs define resource abstractions, communication
interfaces, and management operations. A resource is online when
it is available and providing its service. Cluster resources can
include physical hardware devices such as disk drives and network
cards, and logical items such as IP addresses and applications.
Each node has its own local resources, but the cluster also has
common resources, typically the common data storage array and the
private cluster network, which are accessible by each node in the
cluster.
-
Quorum resource is a physical disk in the common cluster disk
array that must be present for node operations to occur.
-
Resource group is a collection of resources managed as a
single, logical unit. When a service is performed on a resource
group, the operation affects all individual resources within the
group. A resource group can be owned by only one node at a time.
Also, individual resources within a group must exist on the node
that currently owns the group. Keep in mind that at any given
instant, different servers in the cluster cannot own different
resources in the same resource group. In case of failure, resource
groups can be failed over or moved as atomic units from the failed
node to another available node.
-
Cluster-wide policy - each resource group has an associated
cluster-wide policy that specifies which server the group prefers
to run on and which server the group should move to in case of a
failure.
-
Resource dependencies - each resource in a group may depend on
other resources in the cluster. These relationships indicate which
resources need to be started and available before another resource
can be started. Dependencies are identified using Cluster service
resource group properties and enable Cluster service to control the
order in which resources are brought online and offline. Note,
however, that the scope of any identified dependency is limited to
resources within the same resource group (the command-line sketch
after this terminology list shows a dependency being defined).
-
Node preference list is a resource group property used to
assign a resource group to a node. In clusters with more than two
nodes, the node preference list for each resource group can
specify a preferred server plus one or more prioritized
alternatives to enable cascading failover - a resource group may
survive multiple server failures, each time failing over to the
next server on its node preference list. As the cluster
administrator, you can set up a different node preference list for
each resource group.
-
Shared-nothing model refers to how servers in a cluster manage
and use local and common cluster devices and resources. Each
server owns and manages its local devices; devices common to the
cluster are selectively owned and managed by a single server at
any given time.
-
Virtual Servers hide the complexity of clustering operations.
To users and clients, connecting to an application running as a
clustered virtual server appears to be the same process as
connecting to a single server. Users will not know which node is
actually hosting the virtual server. Cluster service manages each
virtual server as a resource group containing two resources: an IP
address and a network name. To the client, it is simply a view of
individual network names and IP addresses (see the command-line
sketch after this terminology list).
-
Devices - to set up Cluster service, external storage
devices common to the cluster must be SCSI devices; standard
PCI-based SCSI connections are supported, as well as SCSI over
Fibre Channel and SCSI buses with multiple initiators. Windows 2000
Datacenter Server supports four-node clusters and requires device
connections using Fibre Channel. The main point is that the
connection between the nodes and the shared devices must be fast
and reliable.
-
Two Types of Clusters: Cluster service is intended to provide
failover support for applications. On the other hand, Network Load
Balancing service load balances incoming IP traffic across
clusters of up to 32 nodes to enhance both the availability and
scalability of Internet server-based programs. You can combine the
two. Typically this involves deploying Network Load Balancing
across a front-end Web server farm, and clustering back-end
line-of-business applications such as databases with Cluster
service.
-
NLB - Network Load Balancing lets system administrators build
clusters with up to 32 hosts among which it load-balances incoming
client requests. The setup is completely transparent, meaning that
clients are unable to distinguish the cluster from a single
server, and programs are not aware that they are running in a
cluster setup. Control can be defined on a port-by-port level, and
hosts can be added to or removed from a cluster without
interrupting services. In an NLB setup, host failures are detected
within five seconds, and recovery is accomplished within ten
seconds - the workload is automatically and transparently
redistributed among the cluster hosts.
-
Performance measurements have shown that Network Load
Balancing's efficient software implementation imposes very low
overhead on network traffic handling and delivers excellent
performance scaling limited only by subnet bandwidth. Network Load
Balancing has demonstrated more than 200 Mbps throughput in
realistic customer scenarios handling e-commerce loads of more
than 800 million requests per day.
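To make several of these terms concrete, here is a minimal
command-line sketch of defining a virtual server with the cluster.exe
tool. All names and addresses (MYCLUSTER, NODE1, NODE2, the
"WebGroup" group, the "Disk D:" resource, and the IP address) are
hypothetical, and the exact switches should be verified against
cluster /? on your own system before use:

cluster MYCLUSTER resource "Web IP" /create /group:"WebGroup" /type:"IP Address"
cluster MYCLUSTER resource "Web IP" /priv Address=192.168.1.50 SubnetMask=255.255.255.0 Network="Public"
cluster MYCLUSTER resource "Web Name" /create /group:"WebGroup" /type:"Network Name"
cluster MYCLUSTER resource "Web Name" /priv Name=WEBAPP1
cluster MYCLUSTER resource "Web Name" /adddep:"Web IP"
cluster MYCLUSTER group "WebGroup" /setowners:NODE1,NODE2
cluster MYCLUSTER group "WebGroup" /online

The /adddep line expresses a resource dependency (the network name
cannot come online until its IP address is online), and the
/setowners line defines the node preference list for the group;
bringing the group online then publishes WEBAPP1 as a virtual server
to clients.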
Components
-
Checkpoint Manager saves application registry keys in a cluster
directory stored on the quorum resource.
-
Communications Manager manages communications between cluster nodes.
-
Configuration Database Manager maintains cluster configuration
information.
-
Event Processor receives event messages from cluster resources and
requests from applications to enumerate cluster objects.
-
Event Log Manager replicates event log entries from one node to all
other nodes in the cluster.
-
Failover Manager performs resource management and initiates
appropriate actions.
-
Global Update Manager provides the global update service used by
cluster components.
-
Log Manager writes changes to the recovery logs stored on the quorum
resource.
-
Membership Manager manages cluster membership and monitors the
health of the other nodes in the cluster.
-
Node Manager assigns resource group ownership to nodes based on two
factors: the group preference lists and node availability.
-
Object Manager manages all Cluster service objects.
-
Resource Monitors monitor the health of each cluster resource using
callbacks to resource DLLs. Resource Monitors provide the
communication interface between resource DLLs and the Cluster
service. When the Cluster service needs to obtain data from a
resource, the Resource Monitor receives the request and forwards it
to the appropriate resource DLL, and vice versa. Keep in mind that a
Resource Monitor runs in a process separate from the Cluster service
to protect the Cluster service from resource failures.
The Node Manager runs on each node and maintains a
local list of the nodes that belong to the cluster. Periodically, it
sends heartbeat messages to its counterparts running on the other
nodes to detect node failures. If one node detects a communication
failure with another cluster node, it broadcasts a message to the
entire cluster, causing all members to verify their view of the
current cluster membership in a regroup event. No write operations
to any disk devices common to all nodes in the cluster are allowed
until the membership has stabilized. The node that is not responding
is removed from the cluster, and its active resource groups are
moved to another active node. To select the node to which a resource
group should be moved in a setup with more than two nodes, the Node
Manager identifies the node on which the resource group prefers to
run and the possible nodes that may own its individual resources.
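As an aside, the node and group states that the Node Manager and
Failover Manager maintain can be checked from the command line with
cluster.exe. A small sketch (cluster and node names are hypothetical;
verify the switches with cluster /?):

cluster MYCLUSTER node NODE2 /status
cluster MYCLUSTER group

The first line shows whether NODE2 is up, down, or paused; the second
lists every resource group and the node that currently owns it, which
is a quick way to see where groups landed after a regroup or failover.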
The Configuration Database Manager implements the
functions needed to maintain the cluster configuration database,
which holds information about all of the physical and logical
entities in a cluster. The Configuration Database Managers running
on each node cooperate to maintain consistent configuration
information across the cluster, using a one-phase commit method to
ensure the consistency of the copies of the configuration database
on all nodes. Keep in mind that cluster-aware applications use the
cluster configuration database to store recovery information. For
applications that are not cluster-aware, information is stored in
the local server registry. The Log Manager, together with the
Checkpoint Manager, ensures that the recovery log on the quorum
resource contains the most recent configuration data and change
checkpoints. This is done to ensure that the Cluster service can
recover from a resource failure.
Supported Services
Services supported by clustering are determined by
the availability of the corresponding resource DLLs. Resource DLLs
provided with Windows NT Server 4.0, Enterprise Edition enable
Cluster service to support file and print shares, generic services
or applications, physical disks, Microsoft Distributed Transaction
Coordinator, Internet Information Services, Message Queuing, and
network addressing and naming. With Windows 2000 Advanced Server and
Windows 2000 Datacenter Server, we have support for the following
additional services: Distributed File System, Dynamic Host
Configuration Protocol, Network News Transfer Protocol, Simple Mail
Transfer Protocol, and Windows Internet Name Service (WINS). In
addition, cluster-aware applications that provide their own resource
DLLs can enable customized advanced scalability and failover
functions.
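You can see which resource types (and therefore which resource DLLs)
are actually registered on a given cluster from the command line. A
hedged sketch, assuming the restype object supported by the Windows
2000 cluster.exe (confirm the object name and switches with
cluster /?):

cluster MYCLUSTER restype
cluster MYCLUSTER resource

The first line lists the registered resource types, and the second
lists the resources that have been created from them along with
their current states.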
Failover and Failback
Failover can occur automatically when a failure
occurs, or when you manually trigger it. Resources are gracefully
shut down for a manual failover, but are forcefully shut down in the
failure case. Automatic failover requires determining what groups
were running on the failed node and which nodes should take
ownership, meaning all nodes in the cluster need to negotiate among
themselves for ownership based on node capabilities, current load,
application feedback, or the node preference list. Cascading
failover assumes that every other server in the cluster has some
excess capacity to absorb a portion of any other failed server's
workload.
When a previously down node comes back online, the
Failover Manager can decide to move some resource groups back to the
recovered node via failback. For this to happen, the properties of a
resource group must define the recovered or restarted node as a
preferred owner. Resource groups for which the recovered or
restarted node is the preferred owner will be moved from the current
owner back to that node. To avoid causing additional problems,
Cluster service provides protection against failback of resource
groups at peak processing times, or to nodes that have not been
correctly recovered or restarted.
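As an illustration, both the node preference list and the failback
policy are ordinary group properties and can be set with cluster.exe.
The group name and the failback window below are hypothetical, and
the property names should be confirmed with
cluster group "WebGroup" /prop before use; AutoFailbackType=1 allows
failback, and the window properties restrict failback to the hours
between 22:00 and 06:00:

cluster MYCLUSTER group "WebGroup" /setowners:NODE1,NODE2
cluster MYCLUSTER group "WebGroup" /prop AutoFailbackType=1
cluster MYCLUSTER group "WebGroup" /prop FailbackWindowStart=22 FailbackWindowEnd=6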
For failure detection, you want to know the
difference between the two mechanisms involved: heartbeat messages
exchanged between nodes, which detect the failure of an entire node,
and the LooksAlive/IsAlive polling performed by the Resource
Monitors through the resource DLLs, which detects the failure of an
individual resource.
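The polling side of this is tunable per resource through the
LooksAlivePollInterval and IsAlivePollInterval common properties
(values in milliseconds). A sketch using the hypothetical resource
from the earlier example; check the property names on your own build
with the first command before changing anything:

cluster MYCLUSTER resource "Web Name" /prop
cluster MYCLUSTER resource "Web Name" /prop LooksAlivePollInterval=5000 IsAlivePollInterval=60000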
Cluster Server Installation and Operation
The software requirements for installing Cluster service
include
-
Microsoft Windows 2000 Advanced Server or Windows 2000
Datacenter Server
-
DNS, WINS, or HOSTS file naming methods. DNS is preferable.
-
Terminal Services is optional. It allows remote cluster
administration.
For the Hardware, the node must meet the hardware requirements
for Windows 2000 Advanced Server or Windows 2000 Datacenter Server.
Also, the cluster hardware must be on the Cluster Service Hardware
Compatibility List.
Each of the two HCL-approved computers must have a boot disk with
Windows 2000 Advanced Server or Windows 2000 Datacenter Server
installed. The boot disk cannot be on the shared storage bus, though.
Then we need a separate PCI storage host adapter using SCSI or Fibre
Channel for the shared disks. Regarding the shared disks, we need an
HCL-approved external disk storage unit that connects to all
computers. RAID is not a must, but it is recommended. All shared
disks must be configured as basic disks, and all partitions on the
disks must be formatted as NTFS.
It is highly recommended that the hardware be completely identical
on all nodes, which makes configuration much easier.
Network Requirements
The network requirements include
-
A unique NetBIOS cluster name.
-
5 unique, static IP addresses: 2 for the network adapters on the
private network, 2 for the network adapters on the public network,
and 1 for the cluster itself (a sample addressing plan follows this
list).
-
A domain user account for Cluster service. Keep in mind that
all nodes must be members of the same domain.
-
Each node should have two network adapters, so that one can be
used for the connection to the public network and the other for the
node-to-node private cluster network.
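As an example only (every address shown here is hypothetical), the
five static addresses for a two-node cluster might be planned as
follows:

NODE1 private adapter   10.10.10.1     255.0.0.0       (no gateway, DNS, or WINS)
NODE2 private adapter   10.10.10.2     255.0.0.0       (no gateway, DNS, or WINS)
NODE1 public adapter    192.168.1.11   255.255.255.0
NODE2 public adapter    192.168.1.12   255.255.255.0
Cluster IP address      192.168.1.10   255.255.255.0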
In order to configure the Cluster service on a Windows 2000-based
server, your account must have administrative permissions on each
node. Also, all nodes must be member servers, or all nodes must be
domain controllers within the same domain; a mix of domain
controllers and member servers in a cluster is NOT supported.
During installation of Cluster service on the first node, all
other nodes must be offline, and all shared storage devices should
be powered up. Initial cluster configuration information is supplied
using the Cluster Service Configuration Wizard. Cluster service
files are located in the \i386 directory of the Windows 2000
Advanced Server or Windows 2000 Datacenter Server CD-ROM. You may
install from the CD or over the network.
After setting up the first computer, add the common data storage
devices that will be available to all members of the cluster. This
establishes the new cluster with a single node. Then you run the
installation utility on each additional computer that will be a
member in the cluster. As each new node is added, it automatically
receives a copy of the existing cluster database from the original
member of the cluster.
During setup, the quorum resource plays the role of tiebreaker
when a cluster is formed, or when network connections between nodes
fail. The quorum resource on the common cluster device stores the
most current version of the configuration database in the form of
recovery logs that contain node-independent cluster configuration
and state data. During cluster operations, the Cluster service uses
the quorum recovery logs to guarantee that only one set of active,
communicating nodes is allowed to form a cluster, to enable a node
to form a cluster only if it can gain control of the quorum
resource, and to allow a node to join or remain in an existing
cluster only if it can communicate with the node that controls the
quorum resource.
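For reference, the current quorum resource can be displayed, and if
necessary moved to a different shared disk, with cluster.exe. The
disk name, path, and log size below are hypothetical and the switch
syntax is given from memory, so confirm it with cluster /? before
use:

cluster MYCLUSTER /quorum
cluster MYCLUSTER /quorumresource:"Disk Q:" /path:Q:\MSCS /maxlogsize:4096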
When a cluster is formed, each node may be in one of three distinct
states (offline, online, or paused), recorded by the Event Processor
and replicated by the Event Log Manager to the other nodes in the
cluster.
To join an existing cluster, a server must have the Cluster
service running and must successfully locate another node in the
cluster via a discovery process. After locating another cluster
node, the joining server must be authenticated and receive a
replicated copy of the cluster configuration database.
Note that Cluster service of Windows 2000 supports rolling
operating system upgrades from Windows NT Server 4.0 Enterprise
Edition clusters deployed with Service Pack 4 or higher. This
provides users with a totally transparent upgrade.
A node can leave a cluster when it shuts down, when the Cluster
service is stopped, or when it fails. In a planned shutdown, the
node sends a ClusterExit message to all other members in the
cluster. Because the remaining nodes receive the exit message, they
do not need to perform the regroup process. When a node is evicted
rather than shut down in a planned way, its status is changed to
evicted.
Cluster Administrator is a graphical administrator's tool that
enables performing maintenance, monitoring, and failover
administration. Additionally, Cluster service includes an automation
interface for creating custom scripting tools for administering
cluster resources, nodes, and the cluster itself.
Cluster service runs in the context of a Windows-based domain
security policy, meaning that if the Cluster service does not have
access to a domain controller, it cannot form a cluster. Domain
controllers are replicated externally to the cluster, so the Cluster
service must depend upon the network to reach the replicas for
authentication. This makes the network a potential source of
failures. To work around this, you can make every node its own
authentication authority for the domain. One way is to create a new
domain that encompasses just the cluster itself and exists only to
provide authentication and authorization for the Cluster service and
any other installed services - we call it a domainlet. This
domainlet is small, lightweight, and contains no user accounts and
no global catalog servers.
The domainlet contains the well-known policies and groups defined
for every domain, including Administrators, Domain Administrators,
and the service accounts required by the clusters it supports, and
nothing else. Since every cluster node holds a replica of the
domainlet, a cluster will never generate authentication traffic.
It is very important that you enable logon without a global
catalog by defining the registry key as follows on each domain
controller:
HKLM\SYSTEM\CurrentControlSet\Control\Lsa\IgnoreGCFailures
You should remove the global catalog, if present, from the domain
controllers in the domainlet.
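One hedged way to create that key is to import a small .reg file on
each domain controller in the domainlet (the file below simply
creates IgnoreGCFailures as an empty subkey, which is how the
published workaround describes it; double-check the relevant
Microsoft Knowledge Base article before applying this to production
domain controllers, and note that a restart is typically required):

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\IgnoreGCFailures]

Save the text above as ignoregc.reg and import it silently with
regedit /s ignoregc.reg.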
Maintenance
Most maintenance operations within a cluster may be performed
with one or more nodes online without taking the entire cluster
offline.
Service packs may normally be installed on one node at a time and
tested before you move resources to the node, so that if something
goes wrong during the update to one node, the other node is still
untouched and continuing to make resources available. But in any
case, to avoid potential issues or compatibility problems with other
applications, go ahead and check the Microsoft Knowledge Base for
articles that may apply before proceeding.
Adapter replacement may be performed after moving resources and
groups to the other node. Make sure the new adapter configuration
for TCP/IP exactly matches that of the old adapter. If you are
replacing a SCSI adapter and using Y cables with external
termination, you may disconnect the SCSI adapter without affecting
the remaining cluster node. For Shared Disk Subsystem Replacement,
unfortunately, you will most likely have to shut down the cluster.
Note that the cluster configuration is not stored on the emergency
repair disk. The service and driver information for the Cluster
service is stored in the system registry. The configuration for
cluster resources and groups is stored in the cluster registry hive.
Back up the registry to preserve these important settings. For
example, you may use the following command to back up the cluster
hive
regback filename machine cluster
You want to make sure that the following are NOT done to the
cluster
-
create software fault tolerant sets with shared disks as
members.
-
add resources to the cluster group.
-
change computer names of either node.
-
use WINS static entries for cluster nodes or cluster
addresses.
-
configure WINS or default gateway addresses for the private
interconnect.
-
configure cluster resources to use unsupported network
protocols or related network services. IP is the only supported
protocol in a Cluster.
Application Deployment
Cluster-Aware Applications are applications with the following
characteristics
-
uses TCP/IP as a network protocol.
-
maintains data in a configurable location.
-
supports transaction processing.
The two types of cluster-aware applications are applications that
are managed as highly available cluster resources by a custom
resource type, or applications that interact with the cluster but
are not cluster resources. Note that Cluster Administrator itself is
an example of such an application.
On the contrary, a cluster-unaware application has no knowledge of
the cluster and does not call the Cluster API, although it can still
be managed as a cluster resource (for example, through the Generic
Application or Generic Service resource types).
Note that a cluster-unaware application can be made cluster-aware
by creating resource types to manage the application, as a custom
resource type can provide the initialization, cleanup, and
management routines specific to the needs of the application. If
everything works fine, you are NOT required to make the application
cluster-aware.
Application Installation
Before proceeding with application installation, you should first
determine the application's resource dependency relationships
between resources in the same resource group. Note that all
resources that you associate with an application must be in the
same group as the application, meaning that if multiple applications
or instances share a resource, all of them must be in the same
group. If you want to run the same application on both servers, you
need to define its resources on both disks. If it will use any
Cluster Group resources, you will need to add its resources to the
Cluster Group.
To install a typical application (see the command-line sketch after
these steps):
1. Create a resource group for the application.
2. Bring the group online on one server.
3. Install the application on the first server, and configure
the application to use the cluster storage.
4. Define the application services as cluster resources.
5. Move the group to the other server, install the
application there and configure the application to use the
cluster storage as well.
6. Confirm that the application will fail over. You may
manually simulate a server shutdown or server failure to test
this.
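A minimal command-line sketch of the same flow, assuming a
hypothetical service-based application installed as the Windows
service MyAppSvc and a group "AppGroup" that already contains the
shared disk resource "Disk E:" and a network name resource "AppName"
(as elsewhere, verify the switches with cluster /? first):

cluster MYCLUSTER group "AppGroup" /online
cluster MYCLUSTER resource "MyApp Service" /create /group:"AppGroup" /type:"Generic Service"
cluster MYCLUSTER resource "MyApp Service" /priv ServiceName=MyAppSvc
cluster MYCLUSTER resource "MyApp Service" /adddep:"Disk E:"
cluster MYCLUSTER resource "MyApp Service" /adddep:"AppName"
cluster MYCLUSTER resource "MyApp Service" /online
cluster MYCLUSTER group "AppGroup" /moveto:NODE2

The final /moveto line corresponds to steps 5 and 6: it pushes the
whole group to the second node so you can confirm that the service
starts there as well.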
ADDITIONAL READING LISTS
Security for Sharing Resources in a Cluster
Basically, the security considerations for sharing resources in a
cluster setup are similar to those in a general environment. The
standing recommendation is that rights should not be granted to a
local group for a directory hosted on the shared drive. Also keep in
mind that the Cluster service account requires at least NTFS Read
permission on the directory to properly create the share.
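A hedged sketch of publishing a directory on the shared drive as a
clustered file share is shown below. The group, share, and path
names are hypothetical, the ShareName and Path private property
names should be confirmed with cluster resource "Users Share" /priv,
and the NTFS permissions on E:\Users must already grant the cluster
service account at least Read access:

cluster MYCLUSTER resource "Users Share" /create /group:"FileGroup" /type:"File Share"
cluster MYCLUSTER resource "Users Share" /priv ShareName=Users Path=E:\Users
cluster MYCLUSTER resource "Users Share" /adddep:"Disk E:"
cluster MYCLUSTER resource "Users Share" /online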
Also note that in an active/active cluster configuration, the two
nodes can each own shared disks independently of each other.
Reparse Points
The technologies that make use of reparse points include
Directory Junctions, Volume Mount Points (also known as mounted
drive), Removable Storage Service RSS and Remote Installation
Services RIS. What is reparse point? With this you can surpass the
26 drive letter limitation and graft a target folder onto another
NTFS folder, much like mounting a volume onto an NTFS junction
point.
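Volume mount points themselves are created with the built-in
mountvol utility. A quick sketch, assuming an empty NTFS folder
E:\Logs on a clustered disk; the volume GUID is a placeholder that
you would copy from the listing produced by the first command:

mountvol
mountvol E:\Logs \\?\Volume{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}\
mountvol E:\Logs /D

Running mountvol with no arguments lists the available volume names
and current mount points, the second line grafts the chosen volume
onto the folder, and /D removes the mount point again.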
Print Spooling
It used to be very troublesome when setting up NT4 cluster to
host the print spooler. W2K has improvement towards this task. You
can use Cluster Server to create and host print server function.
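A hedged sketch of the Windows 2000 approach, assuming a group
"PrintGroup" that already contains a physical disk resource
"Disk P:", an IP address, and a network name resource "PrintName"
(printers and ports are then added against the virtual server name;
verify the switches with cluster /?):

cluster MYCLUSTER resource "Spooler" /create /group:"PrintGroup" /type:"Print Spooler"
cluster MYCLUSTER resource "Spooler" /adddep:"PrintName"
cluster MYCLUSTER resource "Spooler" /adddep:"Disk P:"
cluster MYCLUSTER resource "Spooler" /online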
Disk Replacement for a Cluster
When you want to replace cluster disks, there are usually two
variations: replacing a shared disk that is not the quorum disk, and
replacing the quorum disk itself.
WINS and DHCP
You may want to cluster WINS and DHCP to guarantee their
availability. Basically, you can install the server in any of the
following ways:
Install Windows 2000 without initially installing the Cluster
service, WINS, or DHCP. Add WINS or DHCP and the Cluster service in
any order later.
OR
Install Windows 2000 with the Cluster service first, then add
DHCP or WINS at a later time.
OR
Install Windows 2000 with WINS/DHCP or both first, and then
install the Cluster service at a later time.
Note that you should install the WINS or DHCP service on each
node in the cluster. You will also need to configure a cluster
resource afterwards.
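For example, once the DHCP service has been installed on both nodes,
the DHCP cluster resource might be configured roughly as follows.
The group, disk, and path names are hypothetical, the DatabasePath
and BackupPath private property names should be confirmed on your
own build, and both paths must point at the shared disk:

cluster MYCLUSTER resource "Cluster DHCP" /create /group:"DHCPGroup" /type:"DHCP Service"
cluster MYCLUSTER resource "Cluster DHCP" /adddep:"Disk F:"
cluster MYCLUSTER resource "Cluster DHCP" /adddep:"DHCP IP Address"
cluster MYCLUSTER resource "Cluster DHCP" /priv DatabasePath=F:\dhcp\ BackupPath=F:\dhcp\backup\
cluster MYCLUSTER resource "Cluster DHCP" /online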