PRODUCT DESCRIPTION

H.A. Technical Solutions H.A. software is high availability software designed to manage and control the automatic takeover of applications running on servers networked in a client/server environment, on servers running special applications in the telecommunications market, or on dedicated boards in real-time or other operating environments. It achieves this by providing local and/or long-distance monitoring of the status of the servers or boards in order to notify key personnel of the status of the mission-critical application. It also allows managers to perform maintenance on mission-critical applications either onsite or remotely, and it will automatically fail over mission-critical applications to a hot-standby server or another running server to prevent downtime. The H.A. software is designed using the latest techniques in high availability technology to prevent false failovers and to provide the fastest detection and response time in the industry. The design of the H.A. software is modular, allowing it to be modified to fit the needs of customers with special requirements. The design also makes H.A. from H.A. Technical Solutions the easiest to install and maintain.

Key features of H.A. Technical Solutions H.A. software:
• Quick installation – 20 minutes. Does not require additional drivers. Does not modify the kernel. Maintains all features of the operating system and application software.
• Provides automatic notification of hardware or software problems.
• GUI management tools.
• Software structure conducive to customization.
• Fast failure detection: less than 2 seconds.
• Fast failover and restore to operation: 4-120 seconds, including detection time.
• Client machines believe they are logged into the same server they were using at the time of failure.
• Supports all database applications, Internet, Intranet, any stateless application, and telecommunication applications; can be customized to handle custom OEM applications.
Supported Operating Systems:
• Sun OS 4.1.4 or Solaris 1.x
• Sun Solaris 2.x
• Sun Solaris for Intel
• SGI IRIX
• HP-UX 9.x and 10.x
• Wind River VxWorks
• Lynx
Operating System Support Under Development:
Available on all models that run supported operating systems.
INSTALLATION

Q. Is there a simple way to test an H.A. system?
A. Yes, H.A. Technical Solutions has provided a simple setup drawing and instructions in the manual
which make it easy to configure and test the software. By performing this test, one can become familiar with
the operation of the H.A. software without having to configure the application. Once one has learned the software
and become comfortable that it is working correctly, it can be configured for the application.
Q. What is necessary to create a highly available application service?
A. To create a truly comprehensive high availability environment (which covers system, data, and application service availability) the following components are recommended:
1) High availability storage options.
2) Multi-computer configurations or boards that run a supported operating system.
3) Customizable on-line takeover management software.
4) Logical volume manager with high availability features (optional).
Q. Which of these availability-enhancing products listed above can be obtained from H.A. Technical Solutions, LLC?
A. H.A. Technical Solutions, LLC can provide:
1) Customizable on-line takeover management software.
2) Robust, file-system-based data services with journalizing: H.A.T.S. recommends purchase of File System from Programmed Logic.
3) Logical volume manager with high availability features: H.A.T.S. recommends purchase of Volume Manager from Programmed Logic.
High availability hardware options such as disk arrays with redundant power/cooling, uninterruptible power
supplies, dual-ported disk subsystems, etc. would need to be purchased from other vendors of robust data
services which support fault tolerance (the ability to maintain the integrity of mission-critical data across application failures). There are over 160 vendors of array equipment to choose from.
Q. How does HATS H.A. work? How do the nodes work together to ensure application uptime?
A. The HATS H.A. application, running on a cluster node, maintains a comprehensive view of application service
status. Application-specific information is collected by software-based agents that communicate directly with five
H.A. daemons running on that node. These daemons communicate with other cluster nodes, exchanging status
information. If a failure in a monitored node occurs, HATS H.A. takes the necessary action to recover any
affected application service. An application can either be automatically restarted on a node or failed over to
another node, which impersonates the network address of the failed node. The HATS H.A. Product Overview
has an excellent description of the operational mechanisms which HATS H.A. employs for monitoring and taking action to perform takeovers.
Q. How do I use HATS H.A. to make any application service highly available?
A. To make any application highly available, you need to be able to do three (3) things to it: start it, gracefully stop
it, and test it for its status. HATS H.A. includes the HATS H.A. Agent Template. Using this template users can
build an agent that can automatically start, stop, and test any application service. This agent communicates the
status of the monitored application service to the five daemons. If the application service fails, HATS
H.A. can either attempt to restart it on the same cluster node (a programmed number of times) or migrate it to another cluster node.
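As a rough illustration of this start/stop/test contract, an agent might be structured like the sketch below. The real HATS H.A. Agent Template API is not documented here, so every class, method, and parameter name in this sketch is an assumption.

```python
# Hypothetical sketch of an application agent built from a start/stop/test
# template. All names here are illustrative assumptions, not the actual
# HATS H.A. Agent Template interface.

class ApplicationAgent:
    """Minimal start/stop/test wrapper around one application service."""

    def __init__(self, name, max_restarts=3):
        self.name = name
        self.max_restarts = max_restarts  # "programmed number" of restart attempts
        self.running = False
        self.restarts = 0

    def start(self):
        # A real agent would exec the application's startup script here.
        self.running = True

    def stop(self):
        # A real agent would shut the application down gracefully here.
        self.running = False

    def test(self):
        # A real agent would probe the service, e.g. open a connection.
        return self.running

    def recover(self):
        """Restart locally up to max_restarts times, then request migration."""
        while self.restarts < self.max_restarts:
            self.restarts += 1
            self.start()
            if self.test():
                return "restarted"
        return "migrate"  # hand the service over to another cluster node

agent = ApplicationAgent("dbms", max_restarts=3)
agent.running = False        # simulate an application failure
print(agent.recover())       # the service restarts locally on the first attempt
```

The same `recover` logic covers both documented behaviors: a bounded local restart and, when all attempts fail, migration to another node.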
Q. Are any special patches or drivers required for HATS H.A.?
A. Due to its non-intrusive design, HATS H.A. works with standard OS configurations without any
modifications. Other high availability products run into problems executing on a wide variety of hardware, as there
may be platform-specific requirements or changes. HATS H.A. has no such limitations. HATS H.A. also runs on all models of a supplier's platform, which competitors' products do not.
Q. How much storage space does HATS H.A. require?
A. The entire HATS H.A. product consumes 5.9MB of disk space. The HATS H.A. Error Log file is started
over every day and grows only as large as is required to report the problems that have occurred each day. These
logs can be deleted at the discretion of the customer. The storage space required by our competitors' high availability solutions is greater, both for the software and for the error logs.
Q. What is the processing overhead associated with HATS H.A.?
A. CPU utilization by the HATS H.A. daemons is almost undetectable by CPU resource monitoring tools. A
test configuration, run at the H.A. Technical Solutions, LLC lab, using two active servers consumed less than
2% of the system's resources. HATS H.A. needs the least amount of resources when compared to other high availability software.
Q. What conditions can cause a takeover to occur?
A.
• Loss of heartbeat
• Processing halt or hang
• Re-boot
• Monitored application failure
• Any user-defined condition or event that would cause the application to halt
• Power supply or PDU failure
• Console keyboard-entered reset
• Manually induced takeover/surrender
• Any hardware failure causing applications to halt
• Any network failure that prevents transmission of data or heartbeat
Q. How does the backup node know that a primary application service has failed?
A. Cluster nodes communicate with each other through a private, redundant interconnect across which heartbeats
are exchanged. When no heartbeat is detectable from a node for a pre-defined number of heartbeat intervals,
and verification through the public network also fails, a hardware failure is declared. An
application-specific agent detects an application software failure when it is unable to obtain service from the application after a pre-defined number of retries.
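The two-stage hardware check described above can be sketched as follows. The threshold constant and function names are illustrative assumptions, not HATS H.A. internals.

```python
# Illustrative sketch only: a failure is declared when heartbeats have been
# missed for the pre-defined threshold AND a verification probe over the
# public network also fails. All names and values are assumptions.

MISSED_BEATS_THRESHOLD = 3  # "pre-defined number of heartbeats"

def node_has_failed(missed_beats, public_network_check):
    """Declare a hardware failure only after both checks fail."""
    if missed_beats < MISSED_BEATS_THRESHOLD:
        return False                   # transient loss: keep waiting
    return not public_network_check()  # verify before declaring failure

# A node still reachable on the public network is never declared failed:
print(node_has_failed(5, lambda: True))   # False
# Unreachable on both paths once the threshold is met -> failure declared:
print(node_has_failed(3, lambda: False))  # True
```

Requiring both conditions is what prevents a broken private interconnect alone from triggering a spurious takeover.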
Q. How long does HATS H.A. require to do a takeover?
A. Takeover time depends on a number of variables, some of which HATS H.A. cannot control. Failure detection
time is configurable at installation, but is typically about two seconds. Disk ownership and network address
migration take on the order of a few more seconds. Application crash recovery is typically what takes the most
time and can be little affected by HATS H.A. parameters. It is not unreasonable to expect recovery times of four
seconds to two minutes for many applications, but use of journalizing, either in a file system or DBMS, can
shorten the recovery time. Our customers tell us that our failover and recovery times are faster than those of other high
availability products offered; other high availability software has shown response times of up to two hours.
Q. What is the impact of a failover on network clients?
A. This depends on how the application architecture has been implemented. If the back-end application is using a
stateless connection (e.g. UDP-based applications such as NFS), the clients will see only a slight application service
delay and will not lose any work in process as a result of the failover. If the client application is using a stateful
connection (e.g. telnet, ftp, DBMS), the clients will see a service disruption and will have to log back into the
recovery node once the takeover is complete. Since failover requires crash recovery of the backend application, any work in progress at the time of failure must be re-submitted by the client.
Q. What percentage of uptime does HATS H.A. provide?
A. It is not unreasonable to achieve 99.9994% uptime with a properly configured highly available application service using HATS H.A. and a disk array.
Q. Will HATS guarantee up time?
A. H.A. Technical Solutions, LLC does not provide a standard guarantee in this area as there are too many determinants of overall application service availability over which HATS has no control.
Q. What determines downtime?
A. 1) Time to sense a failure. 2) Time to initiate the failover. 3) Time to assure the integrity of the shared disk file system and/or DBMS data.
4) Time to initialize and start up all user applications.
Q. How long does it take to sense a failure?
A. The time to sense a failure is directly dependent upon the sensitivity of the HATS H.A. agents. In most cases this will be under two seconds for critical processes.
Q. Are there any delays when a failover is requested from the main menu?
A. Since H.A. does not need to sense a failure, there is no time loss involved when the command is issued from the main menu.
Q. How long does it take to failover the application to the alternate server?
Q. Which items affect the time to do data integrity verification?
A. The shared disk file system consistency check time is a critical factor in the total failover time. This time is
dependent upon the size, type and condition of the file system that has to be checked. Again, a file system with a
journal (e.g. File System from Programmed Logic or NTFS) provides dramatic improvements in this area.
Q. What is the estimated time to run a data consistency check?
A. As an example, a 1GB disk with no journalizing file system has a check time of 20 seconds while a 60GB
RAID disk system with a journalizing file system also has a check time of only 20 seconds.
Q. In a system with a database, does the transaction monitor in the database affect or add time to the failover?
A. The transaction log that is played back at restart in order to ensure the database's integrity usually takes no more than a few seconds on database servers.
Q. How long to restart the application?
A. The time to restart is the time to reload the application from the shared disk, which is usually a few seconds. There
are some applications that could take much longer. To solve this, HATS H.A. can have the application preloaded on the alternate server, which reduces restart time back to a few seconds.
Q. Can you provide a typical example of the total failover time for a system?
A. The following is a typical system configuration and the times involved in a failover: a Sun SPARCcenter 2000 with a 36GB shared disk system with file system journalizing and an Informix database.
Time to sense a failover condition is 2 seconds; time to initiate the failover is 2 seconds. Time to assure shared
disk file system integrity is 10 seconds and time to initialize/start-up all user applications is 4 seconds. TOTAL = 18 SECONDS.
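The component times in this example sum as follows (a simple arithmetic check):

```python
# Component times from the Sun SPARCcenter 2000 example above, in seconds.
failover_times = {
    "sense failure condition": 2,
    "initiate failover":       2,
    "verify disk integrity":  10,
    "restart applications":    4,
}
total = sum(failover_times.values())
print(total)  # 18 seconds
```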
Q. How many failovers can be expected in a year?
A. From our own experience, and the experience reported by major database suppliers, one can expect to
experience 43 hours of downtime a year. A typical downtime averages 4 hours, which means you can expect about 10 downtimes a year. See the attached article from Datamation
magazine with its report from Oracle on downtime.
Q. How much downtime will H.A. from HATS save me?
A. If there are 10 downtimes * 18 seconds = 180 seconds of downtime, divided by 60 seconds per minute = 3 minutes of total downtime a year instead of 43 hours. This is 99.9994% uptime.
NOTE: These numbers are based on tests we have run - your number may differ based on differences in your system. Please calculate your exact failover times based on actual conditions in your environment.
HATS H.A. DOES NOT GUARANTEE SYSTEM UPTIME.
Q. What does this relate to in terms of money?
A. The article from Oracle reports the cost of downtime at $1,400 per minute. At 3 minutes a year, that is $4,200 lost
due to downtime. At 43 hours down, it is $3,612,000 lost due to downtime. This was based on a survey of 500 companies that run 24/7 operations.
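The downtime, uptime, and cost figures from the answers above can be reproduced with the following arithmetic sketch:

```python
# Reproducing the downtime, uptime, and cost arithmetic from the FAQ.
downtimes_per_year = 10
failover_seconds = 18
cost_per_minute = 1400  # figure reported in the Oracle survey

downtime_minutes = downtimes_per_year * failover_seconds / 60
print(downtime_minutes)              # 3.0 minutes of downtime per year

minutes_per_year = 365 * 24 * 60
uptime_pct = 100 * (1 - downtime_minutes / minutes_per_year)
print(round(uptime_pct, 4))          # 99.9994 percent uptime

print(downtime_minutes * cost_per_minute)  # 4200.0 dollars lost per year
print(43 * 60 * cost_per_minute)           # 3612000 dollars lost at 43 hours down
```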
Q. Can I failover between non-identical machines as long as they are running the same operating system?
A. Yes, however, this can create administrative and performance issues. If the systems are not identical, device
names may differ between the two systems which may cause administrative mistakes. Also, differing system
architectures do not always allow for transparent NFS file server failover. See the NFS file server section for
more information about how this can be addressed. HATS H.A. offers the most flexibility of any high availability software solution.
Q. Is HATS H.A. failover bi-directional? Can either system take over the other system's applications and network services while continuing its own network services?
A. Yes. In HATS H.A. this is called a symmetrical configuration. Each system will assume the other's services if it
fails. The takeover machine will mount the file system and impersonate the failed server's network services and
TCP/IP address. High availability configurations from other vendors cannot always offer this capability.
Q. Are two heartbeat links recommended?
A. H.A. Technical Solutions software does not require two heartbeat links because it reroutes the heartbeat
through the public network if the private network fails; but HATS recommends having two heartbeat links. Other solutions require having two links.
Q. What types of heartbeat links are supported?
A. HATS H.A. supports TCP/IP over Ethernet links and a serial connection if only two servers are connected.
HATS H.A. also supports any other type of network protocol. To ensure that the H.A. system is reliable and not
subject to spurious failovers caused by TCP/IP problems, a broken cable,
or a failed network interface card, HATS H.A. will reroute the heartbeat packets through the public network.
This assures that a minor failure will not cause a failover. This method has proved to be more reliable than trying to route through the disk array as some other high availability solutions do.
Q. What does the heartbeat consist of?
A. The HATS H.A. heartbeat packets consist of the following information:
1) The name of the server sending the packet, 2) The status of the server sending the packet, and 3) The status of all jobs running on the server sending the packet.
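The answer specifies the packet's three fields but not its wire format; as an illustration only, assuming a JSON encoding (not the actual HATS H.A. format), a heartbeat might be built like this:

```python
# Hypothetical encoding of the three documented heartbeat fields.
# The JSON layout is an assumption; HATS H.A.'s wire format is not shown.
import json

def make_heartbeat(server_name, server_status, job_statuses):
    """Build a heartbeat packet carrying the three documented fields."""
    return json.dumps({
        "server": server_name,    # 1) name of the sending server
        "status": server_status,  # 2) status of the sending server
        "jobs": job_statuses,     # 3) status of all jobs on that server
    })

pkt = make_heartbeat("server-a", "ONLINE_PRIMARY", {"dbms": "up", "nfs": "up"})
print(json.loads(pkt)["server"])  # server-a
```

Because each packet carries job status as well as server status, a missed heartbeat signals a node problem while a received heartbeat can still report an individual application failure.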
Q. Are the intervals for HATS H.A. heartbeat and agent tests configurable?
Q. Can I use caching for performance in HATS H.A. configurations?
A. If data is cached to a volatile space which is unavailable to the backup node in the event of a primary node
failure (like UNIX buffer caches), then the answer is no. Data in the cache often reflects writes that the primary
application thinks are committed, but that in fact have not yet been flushed to disk. If the recovery node cannot access
this information, it cannot determine the most up-to-date state of the data and will resume processing after
recovery with corrupt data. If data is cached to a disk, and this disk is accessible from the backup node after the
recovery is complete, then this does not cause a potential data corruption problem and is safe to use in HATS H.A. configurations.
Non-volatile disk array caches can be used as long as they are available to the backup node during the takeover
process. Use of Sun's SPARC storage array write caches is not recommended, but disk array based caching on
other storage subsystems can be used. Be sure that these caches are non-volatile (battery backed) and available
from the recovery node. HATS H.A. lets you control how often you write to the disk storage, thus lowering
the chance that there is information in the cache that could be lost if the system goes down. No other high availability solution offers this capability.
Q. How difficult is it to install and configure HATS H.A.?
A. It is very easy with the HATS H.A. easy-to-use documentation. HATS H.A. comes with standard installation
routines that make installation straightforward. Type 'install HA' and the process is automatic from that point
on. One file needs to be completed on one server; the information can then be copied to each server thereafter
(the config.file needs to be identical on each server in the H.A. system). Any editor can be used to fill out the
config.file. Other vendors may require you to have them install the high availability software, and charge you for their time.
Q. How much control does the system administrator have over the events of the failover?
A. The system administrator controls all takeover conditions. HATS H.A. can be configured to failover
automatically, manually, or not at all. To failover manually, the system administrator can change the "states" of the
HATS H.A. servers, and can lock the servers into specific states. All failover conditions may be controlled and
defined by the HATS H.A. user and are not restricted to preset behaviors of the HATS H.A. product.
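This administrator control can be sketched as follows, using state names that appear elsewhere in this document (ONLINE_PRIMARY, IDLE, TAKEOVER); the locking and transition rules in the sketch are illustrative assumptions, not the HATS H.A. implementation.

```python
# Sketch of administrator-controlled server states and state locking.
# State names are taken from this document; the behavior is an assumption.

class HAServer:
    def __init__(self, state="ONLINE_PRIMARY"):
        self.state = state
        self.locked = False

    def lock(self):
        """Administrator locks the server into its current state."""
        self.locked = True

    def set_state(self, new_state):
        """Manual state change; refused while the server is locked."""
        if self.locked:
            return False
        self.state = new_state
        return True

primary = HAServer("ONLINE_PRIMARY")
primary.set_state("IDLE")          # manual failover: the primary steps down
print(primary.state)               # IDLE

standby = HAServer("TAKEOVER")
standby.lock()                     # locked into the failover state
print(standby.set_state("IDLE"))   # False: manual unlock would be required
```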
Q. Does HATS H.A. support shared concurrent access to disk among nodes in a single cluster?
A. No. Shared concurrent access requires a cluster-aware volume manager and application as well as some sort
of distributed lock management capability. With HATS H.A., disks are physically connected to multiple cluster
nodes, but logically defined as being owned by only one node at any one time. A change of disk ownership requires
a HATS H.A. state change, like that induced by a takeover or surrender; only if the solution is set up as a hot standby can this be done.
Q. In a symmetrical failover environment, System 1 is running applications A, B & C and System 2 is running applications D & E. How does HATS H.A. start and stop service?
A. When System 1 goes down, System 2 starts A, B & C. When System 1 comes back up, System 2 stops A,
B & C and System 1 starts them. When System 2 goes down, System 1 starts applications D & E; when
System 2 comes back up, System 1 stops D & E and System 2 starts them. At the time of installation, you can choose for this to occur automatically or under manual control.
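The start/stop bookkeeping in this example can be sketched as follows; the placement logic is an illustration of the described behavior, not the HATS H.A. implementation.

```python
# Sketch of the symmetric example above: which node runs which applications
# as nodes fail and return. Pure bookkeeping, no real HA machinery.

home = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2}  # application -> home node

def running_on(up_nodes):
    """Each application runs on its home node if up, else on a survivor."""
    placement = {}
    for app, node in home.items():
        if node in up_nodes:
            placement[app] = node
        elif up_nodes:
            placement[app] = next(iter(up_nodes))  # failover target
    return placement

print(running_on({1, 2}))  # normal operation: A, B, C on 1 and D, E on 2
print(running_on({2}))     # System 1 down: System 2 runs everything
```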
Q. When the failed cluster node is repaired and brought back on-line, will its former applications services, now running on the backup node, automatically fail back?
A. By default, HATS H.A. locks the takeover server into the failover state (DUAL-SERVICES or TAKEOVER,
depending upon whether you are running a symmetric or asymmetric configuration). When the failed node is
brought back on-line, it requires manual intervention to fail the backup's services back to the primary. The system
administrator may choose to configure HATS H.A. for automatic failback once the primary comes back on-line but it is not recommended. The HATS H.A. documentation describes how to do this.
Q. What is an N-into-1 configuration and how does it work?
A. An N-into-1 configuration involves more than two machines: two or more systems provide services (in our example, server-a and server-b)
and an additional one acts as the failover (hot-standby) server. If one or both production servers fail, the failover server is capable of impersonating the failed system(s). To set up a
2-into-1 configuration, HATS H.A. software is installed once on server-a and server-b, and twice on the standby server. One copy of HATS H.A. on the hot standby communicates with server-a
; the other copy communicates with server-b. HATS H.A. also allows 3-into-1 and 4-into-1 configurations through a similar mechanism.
N-into-1 configurations are currently supported for all systems supported by HATS H.A.. N-into-1 configurations are not available through all vendors.
Q. How many interfaces are required for the private heartbeat links in a 2-into-1 configuration?
A. In all HATS H.A. N-into-1 configurations, each server will use the same single private network interface for the heartbeat link, with backup through the public network.
Q. How does failover occur into a 2-into-1 configuration?
A. As defined in the earlier example, when System A fails, the interface A' on the standby server will impersonate
interface A. Likewise, interface B' will impersonate B if System B fails. The standby system can impersonate both server-a and server-b
simultaneously if they both happen to fail at the same time. The 2-into-1 configuration is not widely supported by all vendors of high availability solutions.
Q. Can the hot-standby system provide other services, or does it have to strictly be an idle server waiting for a failover condition?
A. A hot-standby system does not have to be idle; it can provide services that may be discontinued when a failover occurs. This is an asymmetrical
configuration. The standby server may also provide a full set of applications, which could be failed over to the "primary" server; this is a symmetrical configuration. Having the
capability to run applications on the hot-standby is not standard in most high availability software.
Q. Is it possible to configure 2-into-1 symmetrical failover where C is the failover server for both A and B, but C itself has to be highly available, so A and B are used to back up C?
A. This is possible, although it increases the complexity of system administration. To maintain the ease-of-use
qualities of HATS H.A., we recommend using only asymmetrical configurations in N-into-1 environments. There
is also a potential problem: if C fails, A and B will back it up. If A then fails, no server will be there to take over for it.
Q. On a failover, does the hostname get configured or de-configured along with the IP address and MAC address?
A. Most of our customers choose not to failover the hostname of the machine, since a majority of the applications
are not dependent on the machine's hostname. Under most circumstances a machine uses the same name for its hostname and its primary interface name. For example, our primary server HATS H.A.-a
has an IP address of 10.3.198.201. When you ping HATS H.A.-a, you are effectively pinging the IP address 10.3.198.201. On our second server, the system hostname might be HATS H.A.-b
. If HATS H.A.-b is a standby server (as it would be in an asymmetric configuration), it does not need to be connected to the public network at startup. When HATS H.A.-a fails,
HATS H.A.-b will go into takeover mode. It will impersonate the IP address of HATS H.A.-a, 10.3.198.201. When takeover completes you will have a system with hostname HATS H.A.-b
, but its interface name will correspond to HATS H.A.-a.
Q. Is there any advantage to swapping the IP and MAC addresses between the primary and secondary workstation on failover?
A. Yes. Using the example above, by impersonating IP and MAC addresses in a failover, users can connect to 10.3.198.201 even though the HATS H.A.-a
server is down. This is especially important if HATS H.A.-a is providing NFS services, since you don't need to un-mount NFS file systems on the client machines and re-mount
them every time a failover occurs.
Q. Is it possible to use the standby systems for normal-computing tasks unrelated to the failover configuration?
A. Yes, the secondary server does not necessarily have to be idle while it is waiting for the primary to fail, it can be doing something useful.
Q. What happens if a failover is initiated from the primary to the secondary and the secondary is unreachable?
A. A failover can be issued from the primary server by the system administrator, using H.A. mainmenu
or the graphical interface to change the server state from ONLINE_PRIMARY to IDLE. But since the secondary is unreachable (i.e. no communication from the secondary via the heartbeats), the primary
will not be able to transfer services to the secondary.
Q. Is there a limit to the number of services that can be monitored?
A. Yes. Up to 16 jobs can run simultaneously, whereas some other high availability software only allows one application.
Q. How do I determine the software limitations and configuration requirements for HATS H.A.?
A. All software limitations and configuration requirements are listed in the manual, including the recommendations for dual heartbeat links, multiple
interfaces, etc. HATS H.A. has fewer limitations than any other high availability vendor's product.
Q. Is there a difference between the primary and secondary heartbeat?
A. Both heartbeats are crucial in your H.A. configuration since information is exchanged across both heartbeats.
Dual heartbeats give us redundancy and high availability since we still can have communication between servers
even if one heartbeat is down. It also means that a single NIC failure in the heartbeat link will not result in a cluster failure, although the event is logged.
Q. Can I use my regular LAN Ethernet connection (the public network) for the secondary heartbeat?
A. Yes, as described previously. Not all high availability solutions allow this method – the alternatives can cause slower response time and are sometimes less reliable.
NETWORKS

Q. During failover, how are the IP and hardware Ethernet addresses of the primary system assumed by the secondary system?
A. During failover, the secondary machine will execute a HATS H.A. transition program, the TAKEOVER
program, to impersonate the IP and MAC (Ethernet) addresses of the primary machine. Standby interfaces remain idle otherwise.
Q. How many network interfaces do I need to support a HATS H.A. configuration?
A. At a minimum, an asymmetric configuration will require one network interface; however, this is a minimal configuration. Two network interfaces are preferred.
Q. Can static routing be used?
A. You can use static routing by adding route entries into the route table manually, or you can set a default route
to a gateway machine which contains routing information to other hosts or networks.
Q. How are ARP tables on client nodes affected by the takeover of the failed server's IP and MAC addresses?
A. The client node's ARP table should remain the same, since the takeover server will assume the IP and Ethernet
addresses during takeover; the takeover interface's address values will be consistent with ARP tables on all clients.
Q. Are MAC layer addresses required to be unique worldwide?
A. No, MAC layer addresses are only required to be unique in their Ethernet segment.
Q. What happens if two cards with the same MAC address appear on a network?
A. The effect of this occurrence on an Ethernet is unpredictable. Packets may be routed to the wrong host and be
lost. On FDDI and Token Ring networks this is more serious since it will lock up the entire ring, and may require that all nodes on the ring be rebooted.
Q. Does HATS H.A. support failovers on FDDI and Token Ring networks?
A. HATS H.A. supports the FDDI and Token Ring networks the same as it supports standard Ethernet networks.
Q. How can MAC addresses be duplicated on a ring, if one of the servers is not in service?
A. When a node is halted, there is still power coming into the Token Ring interface. In this state, the interface card
retains its MAC address and still responds to all network requests for that MAC address. It does not, however,
pass on any information to the driver. When this node is halted and is part of an H.A. setup, the standby node will
detect that the primary node is down, due to loss of heartbeat, and issue a failover. Part of the normal failover
procedure is to impersonate the IP and MAC address of the primary, therefore both the primary and standby
servers will be responding to all network activity for their common MAC address. Ethernet handles this situation in a graceful manner while the Token Ring networks simply shut down.
Q. How does H.A. Technical Solutions, LLC recommend resolving potential Token Ring network issues to enable failover?
A. A possible solution for resolving this issue is to assign each interface card a unique MAC address. The failover script
should be modified to impersonate only the IP address, not the MAC address. Depending upon the
implementation, the ARP table on all nodes on the ring may have to be updated. If this is required, this solution may only be feasible for small networks.
Another possible solution for the Token Ring issue is to install an intelligent Token Ring hub. When a failover is
imminent, the standby server connects to the hub command interface and disables the port connected to the
primary server. This allows the standby server to impersonate the MAC address in an environment that prevents
duplicate MAC addresses from appearing on a Token Ring network. This is the preferable configuration.
To summarize, HATS H.A. can be made to support failover on Token Ring networks provided that the
configuration addresses the protocol's sensitivity to duplicate MAC addresses on the ring and other hub-specific
issues. In all cases where Token Ring failover is required, the configuration is a consulting special.
Q. Is there a requirement to use the on-board Sun le0 Ethernet interface as the primary heartbeat link?
A. No, but this is shown because it is more cost-effective to use le0 instead of buying a separate controller, and
because it saves precious network connection slots. As long as the two heartbeat links run into network
interfaces that fail independently (i.e. a single board failure will not bring them both down) that meets the
requirement. Please note, however, that in asymmetric configurations the recovery node (standby) must use le0
for the heartbeat network since le0 must always be functional in a Sun server. This is not the case with other platforms.
Q. Is network failover supported on ATM networks?
A. Not as a part of the standard HATS H.A. product, but LAN emulation across ATM networks can be delivered as a consulting special.
DISK ISSUES

Q. How is the location and ownership of the devices and file systems arranged between two systems in a normal and in a failover mode?
A. The devices and file systems should be identical on both systems for the same job (i.e., the file system mount point, device name and major and minor numbers should be the same).
Q. Does HATS H.A. require any additional software or device drivers to work with third party disk drives?
A. HATS H.A. uses standard operating system utilities to configure devices, resulting in no requirements for
additional software, drivers or kernel modifications. As long as the devices follow standard OS device naming
conventions and are transparent to file system utilities (such as mount), they are usable by HATS H.A. Normally,
disks shared between the HATS H.A. servers and used with HATS H.A. must be connected directly to both
HATS H.A. servers. To work correctly in such a dual-connected mode, the system's SCSI drivers and hardware
have to support "multi-initiator" capabilities. This feature allows multiple systems to simultaneously issue
instructions to a common SCSI or other peripheral bus. For SCSI-based disk subsystems, users also need to
change the SCSI initiator ID on one of the servers to a value other than 7, the default. Most modern Unix disk drivers support both multi-initiator and programmable SCSI initiator ID capabilities.
Q. Are CLARiiON disk arrays supported in HATS H.A. configurations?
A. The CLARiiON disk subsystems have a storage processor architecture that does not require you to connect
all disks directly to both servers. In CLARiiON-based clusters, both systems are not actually on the same SCSI
bus. The systems access the SCSI disks through a storage processor in the CLARiiON box. The CLARiiON
storage processors transfer disk control to the backup system via the CLARiiON "trespass" facility, which is
called from the HATS H.A. scripts when using HATS H.A. with CLARiiON disk arrays. Dual-bus configurations (one cluster node attached to each storage processor) are supported,
but split-bus configurations (both nodes attached to both storage processors simultaneously) are not. CLARiiON also provides an agent
which monitors the status of the CLARiiON and makes this information available to HATS H.A.
Q. Why do I need to change the SCSI initiator ID on one of the servers?
A. SCSI initiator IDs are like network addresses; each device needs one that is unique. SCSI cards normally
have a default SCSI initiator-ID value of 7. If both servers have a SCSI initiator-ID of 7 and are connected to
the same (shared) devices, the SCSI bus will become confused. Therefore, to set up shared disks on a single
SCSI chain, you need to change the SCSI initiator-ID on one of the systems. On Sun systems you can change this default by using the eeprom
command, and you only need to change it on one controller card. On other systems, all shared SCSI controllers may have to be changed independently.
Q. How do I know what SCSI initiator-ID value to use?
A. The SCSI initiator-ID value has to be unique on the SCSI bus. Usually it is set to 5 for the second system on
regular SCSI interfaces because many systems don't have a device set to target 5. The following is a sample SCSI device map:
Target 0 – Disk 1
Target 1 – Disk 2
Target 2 – Disk 3
Target 3 – Disk 4
Target 4 – Tape 1
Target 5 – Second controller
Target 6 – CD-ROM
Target 7 – First controller
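Picking a safe initiator ID for the second host amounts to finding a target number not already claimed on the shared bus. A minimal sketch (the helper name and the sample target list are our own, based on the device map above, before the second controller is added):

```python
# Sketch: find free SCSI target IDs for a second host's initiator, given
# the targets already in use on an 8-target bus.
def free_initiator_ids(used_targets, bus_width=8):
    """Return SCSI target IDs not claimed by any device or controller."""
    return sorted(set(range(bus_width)) - set(used_targets), reverse=True)

# Sample bus from the map above, before the second controller is attached:
# disks on 0-3, tape on 4, CD-ROM on 6, first controller on 7.
used = [0, 1, 2, 3, 4, 6, 7]
print(free_initiator_ids(used))  # -> [5]
```

On Sun hardware the chosen value would then be set with the eeprom command, as described in the previous answer.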
Q. How does the secondary system access the file systems owned by the primary system after a failover?
A. The file systems typically reside on the shared disks on a SCSI bus that is connected to both systems. By
configuring the HATS H.A. transition scripts, you can define the file systems to be mounted by the failover
system. When the primary system fails, the secondary system will invoke the transition script to make the same file
systems available. If you use a logical volume manager (such as the SPARCstorage Array Volume Manager or
the Volume Manager from Programmed Logic), which is highly recommended for better overall availability, you will need to import the disk groups before mounting and NFS-sharing the file systems.
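The takeover sequence for one disk group can be sketched as the ordered list of steps a transition script would run. The disk group, device, and mount point names below are hypothetical, and the exact commands (`vxdg`, `fsck`, `mount`, `share`) depend on your volume manager and operating system:

```python
# Sketch of the takeover steps a transition script performs for one disk
# group. All names are illustrative; commands vary by volume manager and OS.
def takeover_commands(disk_group, device, mount_point):
    """Return the ordered commands to bring one disk group online."""
    return [
        f"vxdg import {disk_group}",      # 1. import the disk group
        f"fsck -y {device}",              # 2. check the file system
        f"mount {device} {mount_point}",  # 3. mount it locally
        f"share -F nfs {mount_point}",    # 4. re-export it over NFS
    ]

for cmd in takeover_commands("datadg", "/dev/vx/dsk/datadg/vol01", "/export/data"):
    print(cmd)
```

The ordering matters: the disk group must be imported before the file system can be checked, and the check must complete before the mount and NFS share.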
Q. Can Sun's Solstice DiskSuite (SDS) product be used on HATS H.A.-enabled Sun servers?
A. Yes, but the use of Solstice DiskSuite will impose limitations in performance, number of supported disk
groups, and ability to use a high performance journalizing file system. Use of a DiskSuite release prior to 4.0 limits
you to asymmetric configurations only, as it supports only one disk group per physical host. Use of DiskSuite is a
supported configuration, but use of the Volume Manager from Programmed Logic is recommended since it suffers from none of these limitations.
Q. What kind of RAID array problems will cause failover to occur?
A. Most RAID array problems are solved internally to the RAID subsystem, and are not noticed or monitored by
any outside systems. HATS H.A. runs as an application and is independent of the kernel and all drivers.
Therefore, as it is shipped, it does not detect kernel-specific or driver-specific failures (like a RAID disk driver
problem). However, most RAID vendors provide utilities for checking the status of the array and detect array
failures. The user can write a simple HATS H.A. agent to monitor the status of their RAID array. In this HATS
H.A. agent, the user would define the conditions for RAID failure. Note that not every condition necessarily
results in a failover. The user defines the exact failure conditions and remedies. Many users have their
agents inform them of critical conditions and then manually failover the system as required. Some RAID vendors, like CLARiiON, provide their own HATS H.A. agents to monitor their storage array status.
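An agent of the kind described can be very small: poll the vendor's status utility and map each condition to a user-defined action. The status strings and policy below are hypothetical placeholders for whatever your RAID vendor's utility reports:

```python
# Sketch of the decision logic inside a user-written RAID monitoring agent.
# The status values and the policy mapping them to actions are illustrative;
# a real agent would obtain the status from the vendor's utility.
def evaluate(status):
    """Map a reported array status to an agent action (user-defined policy)."""
    if status == "failed":
        return "failover"   # array unusable: request a takeover
    if status in ("degraded", "rebuilding"):
        return "notify"     # still serving data: alert the operator instead
    return "ok"             # no action needed

print(evaluate("failed"))     # -> failover
print(evaluate("degraded"))   # -> notify
print(evaluate("optimal"))    # -> ok
```

This reflects the point above: not every failure condition triggers a failover; degraded-but-serving states are often better handled by notification and manual intervention.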
Q. In normal operation, can both the primary and secondary systems read and write to the RAID array?
A. Yes, but not simultaneously to the same disk or disk group. You can partition your storage into two or more
groups of disks; the primary system can access one disk group while the secondary owns the other. Shared
concurrent access to a disk group is not supported – a takeover must occur to move disk group ownership from one node to another.
Q. In failover, what happens on the secondary system to allow it to use the RAID array?
A. This depends on the RAID vendor. If the RAID is multi-initiated, the secondary system can simply mount the
devices in the RAID array. If it is dual-ported or has a split bus, you may need to run utility programs provided by the vendor to migrate the devices from one host's view to the other.
If you use the SPARCstorage Array Volume Manager, the Volume Manager from Programmed Logic, or Solstice DiskSuite, you will need to import/reserve the disk groups/disk sets before mounting and NFS-sharing the volumes.
NETWORK FILE SYSTEM (NFS) SERVER ISSUES
Q. Can I failover between non-identical NFS servers?
A. Yes. However, the NFS file handle presented for each file must be the same on each server to prevent "stale
NFS file handle" errors. This is handled automatically on HP-UX and AIX servers. However, in Sun SPARC
system environments, the file handle includes the major and minor device numbers of the file system's underlying
device on the server. Different Sun architecture families (sun4c, sun4e, etc.) use different major numbers for the
same devices. Add-on product drivers may differ as well. The SPARCstorage Array Volume Manager or the
Volume Manager from Programmed Logic presents a simplified method for making major and minor numbers consistent on both
systems; this could otherwise only be achieved with manual effort and scrupulously identical system configurations.
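On Sun servers you can verify the consistency requirement by comparing the major and minor numbers of the device underlying each exported file system on both machines; a sketch (the helper name is our own):

```python
import os

# Sketch: report the major/minor numbers of the device underlying a path,
# so the values can be compared across the two servers. On Sun systems the
# pairs must match for NFS file handles to stay identical after failover.
def device_numbers(path):
    """Return the (major, minor) device numbers for the given path."""
    st = os.stat(path)
    return os.major(st.st_dev), os.minor(st.st_dev)

print(device_numbers("/"))  # e.g. (8, 1); the values are system-specific
```

Running this against each export point on both servers and comparing the output is a quick pre-failover sanity check.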
Q. What is the impact of a takeover on an NFS client?
A. The NFS clients will see an "NFS server x down" message when the primary server is down and the
secondary server is in transition to failover. Once failover has completed, the clients will get an "NFS server x
OK" message. Solaris clients re-connect quickly to the recovery node, but HP-UX and AIX clients may take
longer to perform the reconnect. In all cases, because NFS uses stateless connections between the clients and the server, clients will see only a slight service delay rather than a disruption.
Q. In normal operation can both systems actively provide NFS services using the devices in the shared disks?
A. In a symmetric configuration, provided that Node A and Node B are accessing separately owned disk groups,
both nodes can share access to the shared disks. Shared concurrent access by both nodes to a single disk group
is not supported. In asymmetric configurations normally only one server is accessing the shared disks at a time.
Load balancing across cluster nodes is not supported in either symmetric or asymmetric configurations.
Q. How is the primary system's NFS service established on the secondary system during failover?
A. The secondary system starts up the NFS daemons if necessary, then mounts and NFS shares the shared disks
which have been identified by the user in the TAKEOVER.SCRIPT. These disks are then available to the NFS clients.
Q. Does HATS H.A. work well with NFS for PC and Macintosh NFS clients?
A. Yes, HATS H.A. enables transparent NFS failover for all NFS clients.
Q. How safe are client NFS packets during failover? Would data be lost if it was received and acknowledged
by the server but the system crashed before it was cleanly written to disk? If the data is received but the system crashes even before acknowledgement, would the data be re-transmitted and re-written?
A. HATS H.A. has no way of detecting these two conditions. The first condition could happen if the system had
caching (including Prestoserve) enabled, or was using asynchronous NFS. These should not be used in H.A. configurations.
If these conditions are serious issues for you, you may want to investigate the purchase of a journalizing file
system for your NFS servers, which could reduce – but not eliminate – the likelihood of these failures.
Q. What happens to packets that are sent to the primary system during failover before the secondary system
has assumed the primary's NFS workload? Will they time-out and be resent by the client NFS? When failover is complete, will the secondary system respond to the packets?
A. In most cases the primary will go down before it can send an acknowledgement to the client, so the client will
continue to resend the packet as the secondary machine takes over. Once the takeover is complete, the packet will arrive at the secondary machine and will be serviced and acknowledged.
Q. How long does NFS failover take?
A. The length of time for NFS failover depends on the number of disks and the amount of data and
services that you have. HATS H.A. has to check the consistency of all shared file systems and shut down services
before failing over. On our test systems, with 400MB of data and no journalizing file system, it takes
approximately 4 seconds to failover. The longest process in the failover is the check of the disk systems. If using
a journalizing file system, the check can take as little as a couple of seconds. When we measured failover using a
600GB NPI RAID subsystem with 18 disks in the array and file system journalizing, the entire failover took only 20 seconds.
Q. What happens during normal operation, failover operation, and after failover operation in an NFS configuration?
A. Assume a configuration with two servers and a shared disk array; server-a and server-b are in a symmetrical
configuration. The array contains drives with the device identifiers c1t0d0s0 through c1t3d0s0 and c2t0d0s0 through c2t3d0s0.
During normal operation:
• server-a mounts c1t0d0s0-c1t3d0s0 as /export/{c1t0,c1t1,c1t2,c1t3}
• server-b mounts c2t0d0s0-c2t3d0s0 as /export/{c2t0,c2t1,c2t2,c2t3}
During failover of server-a:
• if manually initiated, server-a disables NFS sharing and unmounts /export/{c1t0,c1t1,c1t2,c1t3}
• if automatically initiated by server-a failure or reboot, it will not be possible to disable sharing and unmount the disks.
After failover:
• server-b mounts and NFS-shares c1t0d0s0-c1t3d0s0 as /export/{c1t0,c1t1,c1t2,c1t3}
• server-b will then control two sets of disks – c1 and c2.
Q. Do all applications failover as transparently as NFS?
A. All applications which use stateless connections between client and server, and can gracefully retry or recover
in the event of a back end service failure should experience only slight delays rather than service disruption as a
result of a takeover. If applications use a stateful connection or will not automatically attempt to re-connect to a
service that appears to be unavailable, then clients will experience a service disruption and typically will need to
log back into the back end service. HATS H.A. does have a special version available that can failover software applications that are specifically registered to a single host ID.
Q. Do I have to utilize the NFS failover capabilities of HATS H.A.?
A. No. Most capabilities of HATS H.A. are user-configurable, including transparent NFS failovers and Ethernet
impersonation. All features are controlled by the user through the config.ha file in HATS H.A.
DATABASE ISSUES
Q. How do I failover an Oracle, Sybase, SQL Server or Informix database?
A. When a server fails or the agent requests a failover, the database will fail over to the takeover server. To
support DBMS failover, servers are generally configured so that shared disks contain the database data and
software. These disks should have the same mount points (if used for file systems) and device names on both
machines. When a failover occurs, the service is restarted on the secondary machine, using the database data and
executables on the shared disk. H.A. Technical Solutions, LLC can provide agents for monitoring each of these databases on UNIX servers.
Q. How can we configure HATS H.A. to reduce data loss for database data in memory upon CPU failure?
A. HATS H.A. does not monitor data in memory. You should configure the database to leave as little
unwritten data as possible in memory buffers. For instance, Sybase uses a daemon that monitors
the pages in memory and periodically writes them to disk. You can set this interval to a small value so that writing to the disk is done frequently.
Q. Which releases of common databases have been used with HATS H.A.?
A. HATS H.A. customers have successfully failed over Oracle 7.x.x and 8.x.x, Sybase 4.9, 10 and 11, and
Informix OnLine 7 DBMS servers in the past. Please contact your reseller for more information and reference sites.
Q. What does a database client see when a failover occurs?
A. This is dependent on whether the client/server database environment is implemented as stateful or stateless.
Database systems like Oracle, Sybase and Informix tend to be stateful, since client programs must connect and
authenticate to the server. If a database application were implemented as stateless, then when failover occurred it
would appear to the end user much like the failover of NFS to another server (i.e., there would be a pause of a
few seconds to a few minutes until the new server completely failed the services over, then processing would
continue without user intervention). However, database applications are usually written as one long-lived session
to a server. This session-orientation introduces 'statefulness' into database application usage. Therefore when a
normal database server fails, the end user application dies. From a user's point of view, this forces a re-launching
of the application. Although this helps databases perform better, it is not an optimum design for high availability.
Database applications can be written to appear stateless to the user. Each autonomous set of requests to the SQL
server should be constructed as a single transaction, and should be checked for successful completion. If the
request does not complete successfully due to server failure, the application would try to re-connect to the server.
Once the re-connection is complete, the application would re-transmit the SQL requests. From a user's
perspective, the transaction would experience slow response, but would successfully complete. There is additional
session management and checking overhead for the programmer in this approach, but it is more user-friendly in
the event of a failure. Unfortunately, many database applications written to date do not implement this strategy,
and modifying an existing application to use this strategy is not normally an easy chore. Database APIs currently
have no hooks to make this a simple process, although some APIs are now providing automatic reconnection hooks.
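The reconnect-and-retry strategy described above can be sketched generically; `connect` and `run_transaction` below are placeholders for whatever your database API provides, and the simulated one-failure connection stands in for a server mid-failover:

```python
# Sketch of the reconnect-and-retry pattern that makes a database client
# appear stateless. connect/run_transaction stand in for a real DB API.
def submit_with_retry(connect, run_transaction, attempts=3):
    """Run one autonomous transaction, reconnecting if the server fails."""
    last_error = None
    for _ in range(attempts):
        try:
            conn = connect()              # (re)connect to whichever node is live
            return run_transaction(conn)  # re-transmit the SQL requests
        except ConnectionError as err:
            last_error = err              # server failed over: try again
    raise last_error

# Simulated server that fails once (mid-failover), then recovers:
calls = {"n": 0}
def connect():
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("server in failover")
    return "connection"

print(submit_with_retry(connect, lambda conn: "committed"))  # -> committed
```

From the user's perspective the transaction is merely slow; the application absorbs the broken connection instead of dying.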
Q. What happens if the server crashes when a request is being processed?
A. All modern, industrial-strength databases use journals or logs to record all database activity. The transaction
log is written prior to the related database table and index updates, and marks an activity as complete once the
entire set of database table and index records has been written. When a database server process crashes and then gets
re-started (either on the original node or the recovery node), the database server "rolls forward" transactions in
the log starting from the last known consistent state, to ensure the completion of transactions. Entries for
incomplete transactions are not applied to the database. This process leaves the database in a consistent state.
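The roll-forward behavior can be illustrated with a toy transaction log: on restart, only transactions whose commit marker made it into the log are re-applied, and entries for incomplete transactions are discarded. The record format below is our own simplification:

```python
# Toy illustration of log-based recovery: on restart, redo only the
# transactions whose commit record reached the log before the crash.
def roll_forward(log):
    """Rebuild database state from a list of (op, txid, ...) log records."""
    db, pending, committed = {}, {}, set()
    for rec in log:
        if rec[0] == "begin":
            pending[rec[1]] = []
        elif rec[0] == "write":
            pending[rec[1]].append((rec[2], rec[3]))
        elif rec[0] == "commit":
            committed.add(rec[1])
    for txid in committed:            # apply committed transactions only
        for key, value in pending[txid]:
            db[key] = value
    return db                         # incomplete transactions are dropped

log = [("begin", 1), ("write", 1, "a", 10), ("commit", 1),
       ("begin", 2), ("write", 2, "b", 20)]   # tx 2 never committed
print(roll_forward(log))  # -> {'a': 10}
```

Real DBMS logs are far more elaborate (undo records, checkpoints, page-level redo), but the consistency guarantee is the same: the crash leaves no half-applied transaction visible.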
Q. Are all related database operations saved in the database transaction log?
A. Not necessarily. The programmer must define transactions properly, including all related database operations, for proper post-crash consistency. For example:
User sends "Sold a baseball card" transaction:
1) Send 'transaction start'.
2) Delete: YEAR=1960 CARD=Mays, Willie APPRAISED_VALUE=$150 from BaseballCardInventory
3) Add: YEAR=1960 CARD=Mays, Willie SELLING_PRICE=$170 to BaseballCardRevenue
4) Send 'transaction complete'.
If the database crashes at any time before the completion of step 4, the takeover server will "roll back" (undo) all
of the steps. Depending on the database server and application program, the client could receive a message
saying that the transaction was not completed and direct the client to re-submit this request.
It is prudent for a high availability implementer to ask database programmers the following questions (database programmers sometimes overlook crash and recovery issues):
• Are all database transactions properly constructed for correctness across a recovery? (for example, a "send
transaction complete" message appearing after step 2 above would be inappropriate.)
• Do client applications handle broken connections? They do not have to reconnect, but they must report to the user that there was a broken connection and that their transaction did not complete.
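The baseball-card example can be demonstrated with any transactional database. A sketch using Python's built-in sqlite3 module (table names follow the example above; the raised exception simulates a crash before the 'transaction complete' step):

```python
import sqlite3

# Demonstrate transaction atomicity for the baseball-card example: a crash
# before commit rolls back the delete and the insert together.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BaseballCardInventory (year, card, appraised_value)")
conn.execute("CREATE TABLE BaseballCardRevenue (year, card, selling_price)")
conn.execute("INSERT INTO BaseballCardInventory VALUES (1960, 'Mays, Willie', 150)")
conn.commit()

try:
    with conn:  # one transaction: both operations commit or roll back together
        conn.execute("DELETE FROM BaseballCardInventory WHERE card = 'Mays, Willie'")
        conn.execute("INSERT INTO BaseballCardRevenue VALUES (1960, 'Mays, Willie', 170)")
        raise RuntimeError("simulated crash before 'transaction complete'")
except RuntimeError:
    pass  # sqlite3 rolls the open transaction back on the exception

# The inventory row survives and no revenue row was recorded: the partial
# work of steps 2 and 3 was undone, exactly as a roll back after recovery.
print(conn.execute("SELECT COUNT(*) FROM BaseballCardInventory").fetchone()[0])  # -> 1
print(conn.execute("SELECT COUNT(*) FROM BaseballCardRevenue").fetchone()[0])    # -> 0
```

Had the "crash" occurred after the `with` block completed, both tables would reflect the sale, satisfying the correctness question in the first bullet above.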
H.A. Technical Solutions, LLC can provide agents for monitoring each of these databases on UNIX servers, and has developed an Oracle and Sybase agent.
DATA INTEGRITY ISSUES
Q. What happens if power to the disk is turned off while a write is in progress?
A. For a database, this would be handled as part of the transaction-log-based recovery. For a file system, the
check program will run on power-up and reboot. If the sector being written contains structural (metadata)
information, the check program will try to repair it. If the sector cannot be repaired, the file or directory will be
removed (the likelihood of this can often be reduced by use of a journalizing file system). If the interrupted write was a
data block, the check program will probably not be able to detect the incomplete write, and data corruption may
result. File systems are at a disadvantage compared to databases with regard to data integrity, since databases generally log all data transactions.
Q. What should I do if my disk crashes?
A. If you have a disk configuration with no redundancy (mirroring or parity), you must replace the drive and
restore the data from backup and incremental logs. The system or applications affected will be down until the
disk is fully restored. Data loss is possible if backups are not up-to-date or have errors. If you have a RAID subsystem (implemented with RAID controller hardware or with software that simulates
RAID) with redundant storage and hot-swap and/or hot spare capability, you can merely swap out the bad disk
and, if necessary, cause the data to be rebuilt onto the replacement drive. No loss of data or access should result.