Oracle® Clusterware Administration and Deployment Guide 11g Release 2 (11.2) Part Number E10717-04 |
When an application, process, or server fails in a cluster, you want the disruption to be as short as possible, if not completely unknown to users. For example, when an application fails on a server, that application can be restarted on another server in the cluster, minimizing or negating any disruption in the use of that application. Similarly, if a server in a cluster fails, then all of the applications and processes running on that server must be able to fail over to another server to continue providing service to the users. Using customizable action scripts and application agent programs, as well as resource attributes that you assign to applications and processes, Oracle Clusterware can manage all these entities to ensure high availability.
This chapter explains how to use Oracle Clusterware to start, stop, monitor, restart, and relocate applications. Oracle Clusterware is the underlying cluster solution for Oracle Real Application Clusters (Oracle RAC). The same functionality and principles you use to manage Oracle RAC databases are applied to the management of applications.
This section discusses the framework that Oracle Clusterware uses to monitor and manage resources, in order to ensure high application availability.
Oracle Clusterware manages applications and processes as resources that you register with Oracle Clusterware. The number of resources you register with Oracle Clusterware to manage an application depends on the application. Applications that consist of only one process are usually represented by only one resource. More complex applications, built on multiple processes or components, may require multiple resources.
When you register an application as a resource in Oracle Clusterware, you define how Oracle Clusterware manages the application using resource attributes you ascribe to the resource. The frequency with which the resource is checked and the number of attempts to restart a resource on the same server after a failure before attempting to start it on another server (failover) are examples of resource attributes. The registration information also includes a path to an action script or application-specific action program that Oracle Clusterware calls to start, stop, check, and clean up the application.
An action script is a shell script (a batch script in Windows) that a generic script agent provided by Oracle Clusterware calls. An application-specific agent is usually a C or C++ program that calls Oracle Clusterware-provided APIs directly.
See Also:
Appendix B, "Oracle Clusterware Resource Reference" for an example of an action script
Oracle Clusterware 11g release 2 (11.2) uses agent programs (agents) to manage resources and includes the following built-in agents so that you can use scripts to protect an application:
scriptagent: Use this agent (scriptagent.exe in Windows) to use shell or batch scripts to protect an application. Both the cluster_resource and local_resource resource types are configured to use this agent, and any resources of these types automatically take advantage of this agent.
appagent: This agent (appagent.exe in Windows) automatically protects any resources of the application resource type used in previous versions of Oracle Clusterware. You are not required to configure anything to take advantage of the agent. It invokes action scripts in the manner done with previous versions of Oracle Clusterware and should only be used for the application resource type.
Note:
Oracle recommends that you use scriptagent for all resources of types other than application. Oracle provides the application agent for backward compatibility with the deprecated application type.
By default, all resources not of the application resource type use the script agent, unless you override this behavior by creating a new resource type and specifying a different agent as part of the resource type specification (using the AGENT_FILENAME attribute).
See Also:
"Resource Type" for more information about resource types
Additionally, you can create your own agents to manage your resources in any manner you want.
See Also:
"Building an Agent" for more information about building custom agents
Generally, all resources are unique but some resources may have common attributes. Oracle Clusterware uses resource types to organize these similar resources. Benefits that resource types provide are:
Manage only necessary resource attributes
Manage all resources based on the resource type
Every resource that you register in Oracle Clusterware must have a certain resource type. In addition to the resource types included in Oracle Clusterware, you can define custom resource types using the Oracle Clusterware Control (CRSCTL) utility. The included resource types are:
Local resource: Instances of local resources (type name local_resource) run on each server of the cluster. When a server joins the cluster, Oracle Clusterware automatically extends local resources to have instances tied to the new server. When a server leaves the cluster, Oracle Clusterware automatically sheds the instances of local resources that ran on the departing server. Instances of local resources are pinned to their servers; they do not fail over from one server to another.
Cluster resource: Cluster-aware resource types (type name cluster_resource) are aware of the cluster environment and are subject to cardinality and cross-server switchover and failover.
See Also:
Appendix E, "CRSCTL Utility Reference" for more information about using CRSCTL to add resource types
"Resource State" for more information about resource state
Note:
Previous versions of Oracle Clusterware only supported the application resource type. This resource type still exists, but only for backward compatibility. Oracle recommends that you register application-type resources as cluster resource-type resources in Oracle Clusterware 11g release 2 (11.2). If you decide not to register your application-type resources as cluster resource-type resources, consult the documentation that corresponds to those applications for administration information.
Oracle Clusterware manages applications when they are registered as resources with Oracle Clusterware, and Oracle Clusterware has access to application-specific primitives that can start, stop, and monitor a specific resource. Oracle Clusterware runs all resource-specific commands through an entity called an agent.
An agent is a process that contains the agent framework and user code to manage resources. The agent framework is a library that enables you to plug in your application-specific code to manage customized applications. You program all of the actual application management functions, such as starting, stopping and checking the health of an application, into the agent. These functions are referred to as entry points.
The agent framework is responsible for invoking these entry point functions on behalf of Oracle Clusterware. Agent developers can use these entry points to plug in the required functionality for a specific resource, with respect to how to start, stop, and monitor a resource. Agents are capable of managing multiple resources.
Agent developers can set the following entry points as callbacks to their code:
START: The START entry point acts to bring a resource online. The agent framework calls this entry point whenever it receives the start command from Oracle Clusterware.
STOP: The STOP entry point acts to gracefully bring down a resource. The agent framework calls this entry point whenever it receives the stop command from Oracle Clusterware.
CHECK: The CHECK (monitor) entry point acts to monitor the health of a resource. The agent framework periodically calls this entry point. If it notices any state change during this action, then the agent framework notifies Oracle Clusterware about the change in the state of the specific resource.
CLEAN: The CLEAN entry point acts whenever there is a need to clean up a resource. It is a non-graceful operation that is invoked when users must forcefully terminate a resource. This command cleans up the resource-specific environment so that the resource can be restarted.
ABORT: If any of the other entry points hang, the agent framework calls the ABORT entry point to abort the ongoing action. If the agent developer does not supply an abort function, then the agent framework exits the agent program.
START, STOP, CHECK, and CLEAN are mandatory entry points and the agent developer must provide these entry points when building an agent. Agent developers have several options to implement these entry points, including using C, C++, or scripts. It is also possible to develop agents that use both C or C++ and script-type entry points. When initializing the agent framework, if any of the mandatory entry points are not provided, then the agent framework invokes a script pointed to by the ACTION_SCRIPT resource attribute.
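The script-type entry points described above can be sketched as a single action script. This is a minimal sketch, not Oracle's sample script: the application command (APP_CMD), the PID file location (PIDFILE), and the function names are hypothetical. It assumes the convention used by script-based agents in which the entry point name (start, stop, check, or clean) arrives as the first argument and an exit status of 0 means success (or, for check, that the resource is running).

```shell
#!/bin/sh
# Hypothetical application and PID file; override through the environment.
PIDFILE="${PIDFILE:-/tmp/myapp.pid}"
APP_CMD="${APP_CMD:-sleep 300}"

# CHECK: succeed only if the recorded process is still alive.
app_check() {
  [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

# START: bring the application online unless it is already running.
app_start() {
  if app_check; then return 0; fi
  # APP_CMD is intentionally unquoted so that its arguments are split.
  $APP_CMD >/dev/null 2>&1 &
  echo $! > "$PIDFILE"
}

# STOP: graceful shutdown with the default TERM signal.
app_stop() {
  if app_check; then kill "$(cat "$PIDFILE")"; fi
  rm -f "$PIDFILE"
}

# CLEAN: forceful, non-graceful cleanup so the resource can be restarted.
app_clean() {
  if [ -f "$PIDFILE" ]; then
    kill -9 "$(cat "$PIDFILE")" 2>/dev/null || true
  fi
  rm -f "$PIDFILE"
}

# Map the entry point name to the matching function.
dispatch() {
  case "$1" in
    start) app_start ;;
    stop)  app_stop ;;
    check) app_check ;;
    clean) app_clean ;;
    *) echo "usage: $0 {start|stop|check|clean}" >&2; return 2 ;;
  esac
}

# Run an entry point only when one was passed on the command line.
if [ $# -gt 0 ]; then
  dispatch "$1"
  exit $?
fi
```

Keeping each entry point in its own function makes it easy to exercise start, check, stop, and clean by hand before registering the script through the ACTION_SCRIPT attribute.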
See Also:
"ACTION_SCRIPT" for information about this resource attribute
At any given time, the agent framework invokes only one entry point per application. If that entry point hangs, then the agent framework calls the ABORT entry point to abort the current operation. The agent framework periodically invokes the CHECK entry point to determine the state of the resource. This entry point must return one of the following states as the resource state:
CLSAGFW_ONLINE: The CHECK entry point returns ONLINE if the resource was brought up successfully and is currently in a functioning state. The agent framework continues to monitor the resource when it is in this state.
CLSAGFW_UNPLANNED_OFFLINE/CLSAGFW_PLANNED_OFFLINE: The OFFLINE state indicates that the resource is not currently running. Two distinct categories exist to describe a resource's offline state: planned and unplanned.
When the state of the resource transitions to OFFLINE through Oracle Clusterware, then it is assumed that the intent for this resource is to be offline (TARGET=OFFLINE), regardless of which value is returned from the CHECK entry point. However, when an agent detects that the state of a resource has changed independent of Oracle Clusterware (such as somebody stopping the resource through a non-Oracle interface), then the intent must be carried over from the agent to the Cluster Ready Services daemon (crsd). The intent then becomes the determining factor for the following:
Whether to keep or to change the value of the resource's TARGET resource attribute. PLANNED_OFFLINE indicates that the TARGET resource attribute must be changed to OFFLINE only if the resource was running before. If the resource was not running (STATE=OFFLINE, TARGET=OFFLINE) and a request comes in to start it, then the value of the TARGET resource attribute changes to ONLINE. The start request then goes to the agent and the agent reports back to Oracle Clusterware a PLANNED_OFFLINE resource state, and the value of the TARGET resource attribute remains ONLINE. UNPLANNED_OFFLINE does not change the TARGET attribute.
Whether to leave the resource's state as UNPLANNED_OFFLINE or attempt to recover the resource by restarting it locally or failing it over to another server in the cluster. The PLANNED_OFFLINE state makes crsd leave the resource as is, whereas the UNPLANNED_OFFLINE state prompts resource recovery.
CLSAGFW_UNKNOWN: The CHECK entry point returns UNKNOWN if the current state of the resource cannot be determined. In response to this state, Oracle Clusterware does not attempt to fail over or to restart the resource. The agent framework continues to monitor the resource if the previous state of the resource was either ONLINE or PARTIAL.
CLSAGFW_PARTIAL: The CHECK entry point returns PARTIAL when it knows that a resource is partially ONLINE and some of its services are available. Oracle Clusterware considers this state as partially ONLINE and does not attempt to fail over or to restart the resource. The agent framework continues to monitor the resource in this state.
CLSAGFW_FAILED: The CHECK entry point returns FAILED whenever it detects that a resource is not in a functioning state, some of its components have failed, and some cleanup is required to restart the resource. In response to this state, Oracle Clusterware calls the CLEAN action to clean up the resource. After the CLEAN action finishes, the state of the resource is expected to be OFFLINE. Next, depending on the policy of the resource, Oracle Clusterware may attempt to fail over or restart the resource. Under no circumstances does the agent framework monitor failed resources.
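The reactions to each CHECK return value listed above can be condensed into a small lookup. This is only a reading aid under the descriptions in this section, not code that Oracle Clusterware runs; the function name reaction_for_state and the wording of each reaction are hypothetical.

```shell
# Hypothetical helper: print the reaction this section describes
# for a given CHECK entry point result.
reaction_for_state() {
  case "$1" in
    CLSAGFW_ONLINE)            echo "no recovery; keep monitoring" ;;
    CLSAGFW_PLANNED_OFFLINE)   echo "no recovery; leave resource as is" ;;
    CLSAGFW_UNPLANNED_OFFLINE) echo "recover; restart locally or fail over" ;;
    CLSAGFW_UNKNOWN)           echo "no recovery; no restart or failover" ;;
    CLSAGFW_PARTIAL)           echo "no recovery; keep monitoring" ;;
    CLSAGFW_FAILED)            echo "clean; then restart or fail over per policy" ;;
    *) echo "unrecognized state: $1" >&2; return 1 ;;
  esac
}
```

For example, reaction_for_state CLSAGFW_FAILED prints the clean-then-recover path, which is the only case in which the CLEAN action precedes any restart attempt.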
The agent framework implicitly monitors resources in the states listed in Table 5-1 at regular intervals, as specified by the CHECK_INTERVAL or OFFLINE_CHECK_INTERVAL resource attributes.
See Also:
"CHECK_INTERVAL" and "OFFLINE_CHECK_INTERVAL" for more information about these resource attributes
Table 5-1 Agent Framework Monitoring Characteristics
State | Condition | Frequency |
---|---|---|
ONLINE | Always | |
PARTIAL | Always | |
OFFLINE | Only if the value of the | |
UNKNOWN | Only monitored if the resource was previously being monitored as a result of any one of the previously mentioned conditions. | Whatever the value of either the |
Whenever an agent starts, the state of all the resources it monitors is set to UNKNOWN. After receiving an initial probe request from Oracle Clusterware, the agent framework executes the CHECK entry point for all of the resources to determine their current states.
Once the CHECK action successfully completes for a resource, the state of the resource transitions to one of the previously mentioned states. The agent framework then starts resources based on commands issued from Oracle Clusterware. After the completion of every action, the agent framework invokes the CHECK action to determine the current resource state. If the resource is in one of the monitored states listed in Table 5-1, then the agent framework periodically executes the CHECK entry point to check for changes in resource state.
By default, the agent framework does not monitor resources that are offline. However, if the value of the OFFLINE_CHECK_INTERVAL attribute is greater than 0, then the agent framework monitors offline resources.
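The monitoring rules described above can be expressed as a small predicate. This is a sketch of the stated conditions only, not of the agent framework's implementation; the function name is_monitored and its three positional arguments (state, OFFLINE_CHECK_INTERVAL value, and whether the resource was previously monitored) are hypothetical.

```shell
# is_monitored STATE OFFLINE_CHECK_INTERVAL PREVIOUSLY_MONITORED(yes|no)
# Returns 0 if the agent framework would periodically run CHECK.
is_monitored() {
  state=$1
  offline_interval=${2:-0}
  previously=${3:-no}
  case "$state" in
    ONLINE|PARTIAL) return 0 ;;                      # always monitored
    OFFLINE) [ "$offline_interval" -gt 0 ] ;;        # only with a positive interval
    UNKNOWN) [ "$previously" = "yes" ] ;;            # only if monitored before
    *) return 1 ;;                                   # FAILED is never monitored
  esac
}
```

Calling is_monitored OFFLINE 60 no succeeds, whereas is_monitored OFFLINE 0 no fails, which mirrors the default of not monitoring offline resources.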
Building an agent for a specific application involves the following steps:
Implement the agent framework entry points either in scripts, C, or C++.
Build the agent executable (for C and C++ agents).
Collect all the parameters needed by the entry points and define a new resource type. Set the AGENT_FILENAME attribute to the absolute path of the newly built executable.
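The last step above, defining a resource type that points at a custom agent, might look like the following wrapper around crsctl add type. This is a sketch under assumptions: the type name and agent path passed in are placeholders, the function name add_agent_type is hypothetical, and the CRSCTL variable exists only so that the command can be previewed (for example with CRSCTL=echo) on a machine without Oracle Clusterware.

```shell
# Path to the crsctl binary; set CRSCTL=echo to preview the command.
CRSCTL="${CRSCTL:-crsctl}"

# add_agent_type TYPE_NAME AGENT_PATH
# Defines a resource type based on cluster_resource whose resources
# are managed by the agent executable at AGENT_PATH.
add_agent_type() {
  type_name=$1
  agent_path=$2
  "$CRSCTL" add type "$type_name" -basetype cluster_resource \
    -attr "ATTRIBUTE=AGENT_FILENAME,TYPE=string,DEFAULT_VALUE=$agent_path"
}
```

A call such as add_agent_type myapp.type /opt/myapp/bin/myagent (both values hypothetical) would then let resources of type myapp.type be managed by that agent instead of the script agent.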
See Also:
Example B-3 for an example of an action script for an agent
Register resources in Oracle Clusterware 11g release 2 (11.2) using the crsctl add resource command.
Note:
The CRS_REGISTER and CRS_PROFILE commands are still available in the Oracle Clusterware home but are deprecated for this release.
To register an application as a resource:
$ crsctl add resource resource_name -type resource_type [-file file_path] | [-attr "attribute_name='attribute_value', attribute_name='attribute_value', ..."]
Choose a name for the resource based on the application for which it is being created. For example, if you create a resource for an Apache Web server, then you might name the resource myApache.
The name of the resource type follows the -type option. You can specify resource attributes in either a text file following the -file option or in a comma-delimited list of resource attribute-value pairs enclosed in double quotation marks ("") following the -attr option. You must enclose space or comma-delimited attribute values and values enclosed in parentheses in single quotation marks ('').
Following is an example of an attribute file:
PLACEMENT=favored
HOSTING_MEMBERS=node1 node2 node3
RESTART_ATTEMPTS@CARDINALITYID(1)=0
RESTART_ATTEMPTS@CARDINALITYID(2)=0
FAILURE_THRESHOLD@CARDINALITYID(1)=2
FAILURE_THRESHOLD@CARDINALITYID(2)=4
FAILURE_INTERVAL@CARDINALITYID(1)=300
FAILURE_INTERVAL@CARDINALITYID(2)=500
CHECK_INTERVAL=2
CARDINALITY=2
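The two ways of supplying attributes can be compared side by side. The following is a sketch, not a prescribed procedure: the function names, the attribute file path, and the action script path are hypothetical, and the attribute values are illustrative. The CRSCTL variable is an assumption added so that the commands can be previewed with CRSCTL=echo.

```shell
# Path to the crsctl binary; set CRSCTL=echo to preview the commands.
CRSCTL="${CRSCTL:-crsctl}"

# register_from_file RESOURCE_NAME ATTRIBUTE_FILE
# Reads attributes from a text file such as the one shown above.
register_from_file() {
  "$CRSCTL" add resource "$1" -type cluster_resource -file "$2"
}

# register_inline RESOURCE_NAME ACTION_SCRIPT_PATH
# Passes attributes inline; the whole list is in double quotation marks
# and each value (including the space-delimited one) is in single
# quotation marks.
register_inline() {
  "$CRSCTL" add resource "$1" -type cluster_resource \
    -attr "ACTION_SCRIPT='$2',PLACEMENT='favored',HOSTING_MEMBERS='node1 node2 node3',CHECK_INTERVAL='30'"
}
```

The file form is usually easier to review and keep under version control, while the inline form is convenient for resources with only a few attributes.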
See Also:
"Adding User-defined Resources" for examples of using the crsctl add resource command
"crsctl add resource" for more information about using the crsctl add resource command
Oracle Clusterware manages resources based on how you configure them to increase their availability. You can configure your resources so that Oracle Clusterware:
Starts resources during cluster or server start
Restarts resources when failures occur
Relocates resources to other servers, if the servers are available
To manage your applications with Oracle Clusterware:
Create an action script or use an existing agent.
Register your applications as resources with Oracle Clusterware.
If a single application requires that you register multiple resources, you may be required to define relevant dependencies between the resources.
Assign the appropriate privileges to the resource.
Start or stop your resources.
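The last two steps above (assigning privileges, then starting or stopping the resource) can be sketched as a thin wrapper around CRSCTL. The wrapper name manage_resource and the CRSCTL variable are hypothetical conveniences, and the use of setperm with the -o option to change the resource owner is an assumption to confirm against the CRSCTL reference for your release.

```shell
# Path to the crsctl binary; set CRSCTL=echo to preview the commands.
CRSCTL="${CRSCTL:-crsctl}"

# manage_resource ACTION RESOURCE_NAME [USER]
manage_resource() {
  action=$1
  name=$2
  case "$action" in
    start)  "$CRSCTL" start resource "$name" ;;
    stop)   "$CRSCTL" stop resource "$name" ;;
    status) "$CRSCTL" status resource "$name" ;;
    # Assumed form: change the owner of the resource to USER.
    owner)  "$CRSCTL" setperm resource "$name" -o "$3" ;;
    *) echo "usage: manage_resource {start|stop|status|owner} name [user]" >&2
       return 2 ;;
  esac
}
```

A typical sequence would be manage_resource owner myApache root, then manage_resource start myApache, followed by manage_resource status myApache to confirm the state.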
When a resource fails, Oracle Clusterware attempts to restart the resource based on attribute values that you provide when you register an application or process as a resource. If a server in a cluster fails, then you can configure your resources so that processes that were assigned to run on the failed server restart on another server. Based on various resource attributes, Oracle Clusterware supports a variety of configurable scenarios.
When you register a resource in Oracle Clusterware, the relevant information about the application and the resource is registered in the Oracle Cluster Registry (OCR). This information includes:
Path to the action script or application-specific agent: The absolute path to the script or application-specific agent that defines the start, stop, and check actions that Oracle Clusterware performs on the application. An additional action is included in Oracle Clusterware 11g release 2 (11.2): clean.
See Also:
"Agents" for more information about these actions
Privileges: Oracle Clusterware has the necessary privileges to control all of the components of your application for high availability operations, including the right to start processes that are owned by other user identities. Oracle Clusterware must run as a privileged user to control applications with the correct start and stop processes.
Resource Dependencies: Relationships among resources that imply an operational ordering or affect the placement of resources on servers in the cluster. For example, Oracle Clusterware can only start a resource that has a hard start dependency on another resource if the other resource is running. Oracle Clusterware prevents stopping a resource if other resources that depend on it are running. However, you can force a resource to stop using the crsctl stop resource -f command, which first stops all resources that depend on the resource being stopped.
Resource attributes define how Oracle Clusterware manages resources of a specific resource type. Each resource type has a unique set of attributes. Some resource attributes are specified when you register resources, while others are internally managed by Oracle Clusterware.
See Also:
Appendix B, "Oracle Clusterware Resource Reference" for complete details of resource attributes
Every resource in a cluster is in a particular state at any time. Certain actions or events can cause that state to change.
Table 5-2 lists and describes the possible resource states.
Table 5-2 Possible Resource States
State | Description |
---|---|
ONLINE | The resource is running. |
OFFLINE | The resource is not running. |
UNKNOWN | An attempt to stop the resource has failed. Oracle Clusterware does not actively monitor resources that are in this state. You must perform an application-specific action to ensure that the resource is offline, such as stop a process, and then run the |
INTERMEDIATE | A resource can be in the INTERMEDIATE state because of one of two events: Oracle Clusterware actively monitors resources that are in the INTERMEDIATE state and, typically, you are not required to intervene. If the resource is in the INTERMEDIATE state due to the preceding reason 1, then as soon as the state of the resource is established, Oracle Clusterware transitions the resource out of the INTERMEDIATE state. If the resource is in the INTERMEDIATE state due to the preceding reason 2, then it stays in this state if it remains partially online. For example, the home server of the VIP must rejoin the cluster so the VIP can switch over to it. A database administrator must issue a command to open the database instance. In either case, however, Oracle Clusterware transitions the resource out of the INTERMEDIATE state automatically as soon as it is appropriate. Use the |
You can configure resources to be dependent on other resources, so that the dependent resources can only start or stop when certain conditions of the resources on which they depend are met. For example, when Oracle Clusterware attempts to start a resource, it is necessary for any resources on which the initial resource depends to be running and in the same location. If Oracle Clusterware cannot bring the resources online, then the initial (dependent) resource cannot be brought online, either. If Oracle Clusterware stops a resource or a resource fails, then any dependent resource is also stopped.
Some resources require more time to start than others. Some resources must start whenever a server starts, while other resources require a manual start action. These and many other examples of resource-specific behavior imply that each resource must be described in terms of how it is expected to behave and how it relates to other resources (resource dependencies).
Previous versions of Oracle Clusterware included only two dependency specifications: the REQUIRED_RESOURCES resource attribute and the OPTIONAL_RESOURCES resource attribute. The REQUIRED_RESOURCES resource attribute applied to both start and stop resource dependencies.
Note:
The REQUIRED_RESOURCES and OPTIONAL_RESOURCES resource attributes are still available only for resources of application type. Their use to define resource dependencies in Oracle Clusterware 11g release 2 (11.2) is deprecated.
In Oracle Clusterware 11g release 2 (11.2), however, resource dependencies are separated into start and stop categories. This separation improves and expands the start and stop dependencies between resources and resource types.
Oracle Clusterware considers start dependencies contained in the profile of a resource when the start effort evaluation for that resource begins. You specify start dependencies for resources using the START_DEPENDENCIES resource attribute. You can use modifiers on each dependency to further configure the dependency.
See Also:
"START_DEPENDENCIES" for more information about the resource attribute, modifiers, and usage
This section includes descriptions of the following START dependencies:
hard
Define a hard start dependency for a resource if another resource must be running before the dependent resource can start. For example, if resource A has a hard start dependency on resource B, then resource B must be running before resource A can start. By default, resources A and B must be located on the same server (co-located).
Note:
Oracle recommends that resources with hard start dependencies also have pullup start dependencies.
You can configure the hard start dependency with the following constraints:
START_DEPENDENCIES=hard(global:resourceB)
Use the global modifier to specify that resources need not be co-located. For example, if resource A has a hard(global:resourceB) start dependency on resource B, then, if resource B is running on any node in the cluster, resource A can start.
START_DEPENDENCIES=hard(intermediate:resourceB)
Use the intermediate modifier to specify that the dependent resource can start if a resource on which it depends is in either the ONLINE or INTERMEDIATE state.
START_DEPENDENCIES=hard(type:resourceB.type)
Use the type modifier to specify whether the hard start dependency acts on a particular resource or a resource type. For example, if you specify that resource A has a hard start dependency on the resourceB.type type, then if any resource of the resourceB.type type is running, resource A can start.
START_DEPENDENCIES=hard(resourceB, intermediate:resourceC, intermediate:global:type:resourceC.type)
You can combine modifiers and specify multiple resources in the START_DEPENDENCIES resource attribute.
Note:
Separate modifier clauses with commas. The type modifier clause must always be the last modifier clause in the list and the type modifier must always directly precede the type.
weak
If resource A has a weak start dependency on resource B, then an attempt to start resource A attempts to start resource B, if resource B is not running. The result of the attempt to start resource B is, however, of no consequence to the result of starting resource A. By default, resources A and B must be co-located.
You can configure the weak start dependency with the following constraints:
START_DEPENDENCIES=weak(global:resourceB)
Use the global modifier to specify that resources need not be co-located. For example, if resource A has a weak(global:resourceB) start dependency on resource B, then, if resource B is running on any node in the cluster, resource A can start.
START_DEPENDENCIES=weak(concurrent:resourceB)
Use the concurrent modifier to specify that resource A and resource B can start concurrently, instead of waiting for resource B to start first.
START_DEPENDENCIES=weak(type:resourceB.type)
Use the type modifier to specify that the dependency acts on a resource of a particular resource type, such as resourceB.type.
attraction
If resource A has an attraction dependency on resource B, then Oracle Clusterware prefers to place resource A on servers hosting resource B. Dependent resources, such as resource A in this case, are more likely to run on servers on which resources to which they have attraction dependencies are running. Oracle Clusterware places dependent resources on servers with resources to which they are attracted.
You can configure the attraction start dependency with the following constraints:
START_DEPENDENCIES=attraction(intermediate:resourceB)
Use the intermediate modifier to specify whether the resource is attracted to resources that are in the INTERMEDIATE state.
START_DEPENDENCIES=attraction(type:resourceB.type)
Use the type modifier to specify whether the dependency acts on a particular resource type. The dependent resource is attracted to the server hosting the greatest number of resources of a particular type.
Note:
Previous versions of Oracle Clusterware used the now deprecated OPTIONAL_RESOURCES attribute to express attraction dependency.
pullup
Use the pullup start dependency if resource A must automatically start whenever resource B starts. This dependency only affects resource A if it is not running. As is the case for other dependencies, pullup may cause the dependent resource to start on any server. Use the pullup dependency whenever there is a hard stop dependency, so that if resource A depends on resource B and resource B fails and then recovers, then resource A is restarted.
Note:
Oracle recommends that resources with hard start dependencies also have pullup start dependencies.
You can configure the pullup start dependency with the following constraints:
START_DEPENDENCIES=pullup(intermediate:resourceB)
Use the intermediate modifier to specify whether resource B can be either in the ONLINE or INTERMEDIATE state to start resource A.
If resource A has a pullup dependency on multiple resources, then resource A starts only when all resources upon which it depends start.
START_DEPENDENCIES=pullup:always(resourceB)
Use the always modifier to specify whether Oracle Clusterware starts resource A despite the value of the TARGET attribute of the resource on which resource A depends, whether the value of the TARGET attribute is ONLINE or OFFLINE. By default, pullup only starts resources if the value of the TARGET attribute of the resource on which they depend is ONLINE.
START_DEPENDENCIES=pullup(type:resourceB.type)
Use the type modifier to specify that the dependency acts on a particular resource type.
dispersion
If you specify the dispersion start dependency for a resource, then Oracle Clusterware starts this resource on a server that has the fewest number of resources to which this resource has dispersion. Resources with dispersion may still end up running on the same server if there are not enough servers to disperse them to.
You can configure the dispersion start dependency with the following modifiers:
START_DEPENDENCIES=dispersion(intermediate:resourceB)
Use the intermediate modifier to specify that Oracle Clusterware disperses resource A whether resource B is either in the ONLINE or INTERMEDIATE state.
START_DEPENDENCIES=dispersion:active(resourceB)
Typically, dispersion is only applied when starting resources. If at the time of starting, resources that disperse each other start on the same server (because there are not enough servers at the time the resources start), then Oracle Clusterware leaves the resources alone once they are running, even when more servers join the cluster. However, Oracle Clusterware reapplies dispersion on resources later when new servers join the cluster, if you specify the active modifier.
Oracle Clusterware considers stop dependencies between resources whenever a resource is stopped (the resource state changes from ONLINE to any other state).
hard
If resource A has a hard stop dependency on resource B, then resource A must be stopped when B stops running. The two resources may attempt to start or relocate to another server, depending upon how they are configured. Oracle recommends that resources with hard stop dependencies also have hard start dependencies.
You can configure the hard stop dependency with the following modifiers:
STOP_DEPENDENCIES=hard(intermediate:resourceB)
Use the intermediate modifier to specify whether resource B must either be in the ONLINE or INTERMEDIATE state for resource A to stay online.
STOP_DEPENDENCIES=hard(global:resourceB)
Use the global modifier to specify whether resource A requires that resource B be present on the same server or on any server in the cluster to remain online. If this constraint is not specified, then resources A and B must be running on the same server. Oracle Clusterware stops resource A when that condition is no longer met.
STOP_DEPENDENCIES=hard(shutdown:resourceB)
Use the shutdown modifier to stop the resource only when you shut down the Oracle Clusterware stack using either the crsctl stop crs or crsctl stop cluster commands.
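The recommendations in this section (pair a hard start dependency with pullup, and pair a hard stop dependency with a hard start dependency) can be combined in a single registration. The following is a sketch under assumptions: the helper names, the resource names, and the action script path are hypothetical, and the CRSCTL variable is added only so that the command can be previewed with CRSCTL=echo. The dependency values are wrapped in single quotation marks because they contain spaces and parentheses.

```shell
# Path to the crsctl binary; set CRSCTL=echo to preview the command.
CRSCTL="${CRSCTL:-crsctl}"

# dependency_attrs DEPENDED_ON_RESOURCE
# Builds hard + pullup start dependencies and a hard stop dependency.
dependency_attrs() {
  dep=$1
  printf "START_DEPENDENCIES='hard(%s) pullup(%s)',STOP_DEPENDENCIES='hard(%s)'" \
    "$dep" "$dep" "$dep"
}

# add_dependent_resource NAME ACTION_SCRIPT_PATH DEPENDED_ON_RESOURCE
add_dependent_resource() {
  "$CRSCTL" add resource "$1" -type cluster_resource \
    -attr "ACTION_SCRIPT='$2',$(dependency_attrs "$3")"
}
```

For example, add_dependent_resource myApache /opt/cluster/scripts/myapache.scr appsvip would register a resource that starts only after appsvip is running, restarts when appsvip recovers, and stops when appsvip stops.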
See Also:
"STOP_DEPENDENCIES" for more information about modifiers
Effect of Resource Dependencies on Resource State Recovery
When a resource goes from a running to a non-running state, while the intent to have it running remains unchanged, this transition is called a resource failure. At this point, Oracle Clusterware applies a resource state recovery procedure that may try to restart the resource locally, relocate it to another server, or just stop the dependent resources, depending on the high availability policy for resources and the state of entities at the time.
When two or more resources depend on each other, a failure of one of them may end up causing the other to fail, as well. In most cases, it is difficult to control or even predict the order in which these failures are detected. For example, even if resource A depends on resource B, Oracle Clusterware may detect the failure of resource B after the failure of resource A.
This lack of failure order predictability may lead to Oracle Clusterware attempting to restart dependent resources in parallel, which, ultimately, leads to the failure to restart some resources, because the resources upon which they depend are being restarted out of order.
In this case, Oracle Clusterware reattempts to restart the dependent resources locally if either or both the hard stop and pullup dependencies are used. For example, if resource A has either a hard stop dependency or pullup dependency, or both, on resource B, and resource A fails because resource B failed, then Oracle Clusterware may end up trying to restart both resources at the same time. If the attempt to restart resource A fails, then as soon as resource B successfully restarts, Oracle Clusterware reattempts to restart resource A.
As part of the start effort evaluation, the first decision that Oracle Clusterware must make is where to start (or place) the resource. Making such a decision is easy when the caller specifies the target server by name. If a target server is not specified, however, then Oracle Clusterware attempts to locate the best possible server for placement given the resource's configuration and the current state of the cluster.
Oracle Clusterware considers a resource's placement policy first and filters out servers that do not fit with that policy. Oracle Clusterware sorts the remaining servers in a particular order depending on the value of the PLACEMENT resource attribute of the resource.
See Also:
"Application Placement Policies" for more information about the PLACEMENT resource attribute
The result of this consideration is a maximum of two lists of candidate servers on which Oracle Clusterware can start the resource. One list contains preferred servers and the other contains possible servers. The list of preferred servers will be empty if the value of the PLACEMENT resource attribute for the resource is set to balanced or restricted. The placement policy of the resource implies on which server the resource wants to run. Oracle Clusterware considers preferred servers over possible servers, if there are servers in the preferred list.
Oracle Clusterware then considers the resource's dependencies, if any exist, to determine where to place the resource. The attraction and dispersion start dependencies affect the resource placement decision, as do some of the dependency modifiers. Oracle Clusterware applies these placement hints to further order the servers in the two previously mentioned lists. Note that Oracle Clusterware processes each list of servers independently, so that the effect of the resource's placement policy is not confused by that of dependencies.
Finally, Oracle Clusterware chooses the first server from the list of preferred servers, if any servers are listed. If there are no servers on the list of preferred servers, then Oracle Clusterware chooses the first server from the list of possible servers, if any servers are listed. When no servers exist in either list, Oracle Clusterware generates a resource placement error.
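The final selection step described above can be sketched as a small shell function. This is only an illustration of the decision order, not Oracle Clusterware code: the function name choose_server and the convention of passing each list as a space-separated string are assumptions made for the example, and both lists are assumed to be already filtered and ordered.

```shell
# choose_server PREFERRED POSSIBLE   (hypothetical helper, not an Oracle tool)
# Each argument is a space-separated, already-ordered list of server names.
choose_server() {
    preferred=$1
    possible=$2
    for list in "$preferred" "$possible"; do
        # Split the list into words; an empty list yields zero words.
        set -- $list
        if [ "$#" -gt 0 ]; then
            # The first server of the first non-empty list wins.
            echo "$1"
            return 0
        fi
    done
    # Both lists are empty: mirror the resource placement error.
    echo "resource placement error: no candidate servers" >&2
    return 1
}
```

For example, choose_server "" "rac2 rac3" prints rac2, because the preferred list is empty and rac2 is first in the possible list.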
Note:
Neither the placement policies nor the dependencies of the resources related to the resource Oracle Clusterware is attempting to start affect the placement decision.
This section presents examples of the procedures for registering an application as a resource in Oracle Clusterware. The procedures show you how to add an Apache Web server as a resource to Oracle Clusterware.
Assume for the examples in this section that the Oracle Clusterware administrator has full administrative privileges over Oracle Clusterware and the user or group that owns the application that Oracle Clusterware is going to manage. Once the registration process is complete, Oracle Clusterware can start any application on behalf of any operating system user.
Oracle Clusterware distinguishes between an owner of a registered resource and a user. The owner of a resource is the operating system user under which the agent runs. The ACL resource attribute of the resource defines permissions for the users and the owner. Only root can modify any resource.
See Also:
"Role-separated Management"
Notes:
Oracle Clusterware commands prefixed with crs_ are deprecated with this release. CRSCTL commands replace those commands. See Appendix E, "CRSCTL Utility Reference" for a list of CRSCTL commands and their corresponding crs_ commands.
Do not use CRSCTL commands on any resources that have names prefixed with ora (because these are Oracle resources), unless Oracle Support directs you to do so.
You can, however, create resources that depend on Oracle resources. When creating resources, do not use an ora prefix in the resource name. This prefix is reserved for Oracle use only.
To configure Oracle resources, use the server control utility, SRVCTL, which provides you with all configurable options.
This section includes the following topics:
If clients of an application access the application through a network, then you must register a virtual internet protocol address (VIP) on which the application depends. An application VIP is a cluster resource that Oracle Clusterware manages (Oracle Clusterware provides a standard VIP agent for application VIPs). You should base any new application VIPs on this VIP type to ensure that your system experiences consistent behavior among all of the VIPs that you deploy in your cluster.
While you can add a VIP in the same way that you add any other resource that Oracle Clusterware manages, Oracle recommends using the script Grid_home/bin/appvipcfg to create or delete an application VIP.
The usage of this script is as follows:
appvipcfg create -network=network_number -ip=ip_address -vipname=vip_name -user=user_name [-group=group_name]
appvipcfg delete -vipname=vip_name
Where network_number is the number of the network, ip_address is the IP address, vip_name is the name of the VIP, and user_name is the name of the user.
For example, as root, run the following command:
# Grid_home/bin/appvipcfg create -network=1 -ip=148.87.58.196 -vipname=appsVIP -user=root
The script only requires a network number (default is 1), the IP address, and a name for the VIP resource, as well as the user that owns the application VIP resource. A VIP resource is typically owned by root because VIP-related operations require root privileges.
To delete an application VIP, use the same script with the delete option. This option accepts the VIP name as a parameter.
After you have created the application VIP using this configuration script, you can view the VIP profile using the following command:
Grid_home/bin/crsctl status res VIPname -p
Verify and, if required, modify the following parameters using the Grid_home/bin/crsctl modify res command.
See Also:
Appendix B, "Oracle Clusterware Resource Reference" for detailed information about using CRSCTL commands
The appvipcfg script assumes that the default ora.vip network resource (ora.net1.network) is used. It also assumes that a default app.appvip.type is used for those purposes.
To create a type for the new application VIP, run the following command as root or the Oracle Clusterware installation owner:
# crsctl add type app.appvip.type -basetype ora.cluster_vip_net1.type
Once you create an application VIP type, you can add the actual VIP resource to the cluster. To register a VIP as a resource with Oracle Clusterware, run the following command as root or the Oracle Clusterware installation owner:
# crsctl add resource appsvip -type app.appvip.type -attr "RESTART_ATTEMPTS=2, START_TIMEOUT=100,STOP_TIMEOUT=100,CHECK_INTERVAL=10,USR_ORA_VIP=192.168.10.10, START_DEPENDENCIES='hard(ora.net1.network) pullup(ora.net1.network)', STOP_DEPENDENCIES='hard(ora.net1.network)'"
See Also:
Appendix B, "Oracle Clusterware Resource Reference" for detailed information about using CRSCTL commands
In the preceding example, the VIP is defined as follows:
RESTART_ATTEMPTS=2: Oracle Clusterware tries to restart the resource twice before failing it over to another server.
START_TIMEOUT=100: CRSD waits 100 seconds for the VIP to start before CRSD stops and reports an error.
STOP_TIMEOUT=100: CRSD waits 100 seconds for the VIP to stop.
CHECK_INTERVAL=10: Oracle Clusterware checks the resource every 10 seconds to determine its status.
USR_ORA_VIP=192.168.10.10: The IP address that the resource uses. You can use an IP address that resolves through DNS.
START_DEPENDENCIES='hard(ora.net1.network) pullup(ora.net1.network)': The resource has hard and pull-up START dependencies on the network resource type.
STOP_DEPENDENCIES='hard(ora.net1.network)': The resource has a hard STOP dependency on the network resource type.
Note:
Because the resource is based on the ora.cluster_vip.type resource type, no VIP-specific user script is necessary. Oracle Clusterware uses the same agent that ora.cluster_vip.type uses.
On Linux and UNIX operating systems, an application VIP must run as root. Unless you added the VIP resource as root, you must ensure that the VIP can run as root. To ensure that the VIP can run as root:
Log in as root and run the following command:
# crsctl setperm resource appsVIP -o root
Run the following command to give the Oracle Database installation owner permission to start the VIP:
# crsctl setperm resource appsVIP -u user:oracle:r-x
As the Oracle Database installation owner, start the VIP resource:
$ crsctl start resource appsVIP
Adding an Application VIP with Oracle Enterprise Manager
To add an application VIP with Oracle Enterprise Manager:
Log into Oracle Enterprise Manager Database Control.
Click the Cluster tab.
Click Administration.
Click Manage Resources.
Enter a cluster administrator user name and password to display the Manage Resources page.
Click Add Application VIP.
Enter a name for the VIP in the Name field.
Enter a network number in the Network Number field.
Enter an IP address for the VIP in the Internet Protocol Address field.
Enter root in the Primary User field. Oracle Enterprise Manager defaults to whatever user name you are logged in as.
Select Start the resource after creation if you want the VIP to start immediately.
Click Continue to display the Confirmation: Add VIP Resource page.
Enter root and the root password as the cluster credentials.
Click Continue to create the application VIP.
You can add resources to Oracle Clusterware at any time. However, if you add a resource that is dependent on another resource, then you must first add the resource upon which it is dependent.
In the example in this section, assume that an action script, myApache.scr, resides in the /opt/cluster/scripts directory on each node to facilitate adding the resource to the cluster. You must decide whether to use administrator or policy management for the application.
Use administrator management for smaller, two-node configurations, where your cluster configuration is not likely to change. Use policy management for more dynamic configurations when your cluster consists of more than two nodes. For example, if a resource only runs on node 1 and node 2 because only those nodes have the necessary files, then administrator management is probably more appropriate.
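To make the role of the action script more concrete, the following is a minimal sketch of what a script such as myApache.scr might look like. It is not the script shipped or mandated by Oracle. The apachectl path, the PID file location, and the handle_action function name are all assumptions for illustration, and the sketch assumes the common convention that the script receives the action name (start, stop, check, or clean) as its first argument and exits with 0 for success and 1 for failure.

```shell
#!/bin/sh
# Hypothetical skeleton of an Apache action script (illustration only).
# Assumed, not taken from the guide: the apachectl path and the PID file path.
APACHECTL=${APACHECTL:-/usr/sbin/apachectl}
PIDFILE=${PIDFILE:-/var/run/httpd.pid}

# Run one entry point; return 0 on success and 1 on any failure.
handle_action() {
    case "$1" in
        start)
            "$APACHECTL" -k start ;;
        stop|clean)
            # clean is the cleanup path; this sketch simply reuses a stop.
            "$APACHECTL" -k stop ;;
        check)
            # Report success only if the PID file names a live process.
            [ -r "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null ;;
        *)
            echo "usage: $0 {start|stop|check|clean}" >&2
            return 1 ;;
    esac
    # Normalize any non-zero status from the action to 1.
    [ "$?" -eq 0 ] || return 1
}

# When invoked with an action argument, run it and exit with its status.
if [ "$#" -gt 0 ]; then
    handle_action "$1"
    exit $?
fi
```

Keeping each action in a single function makes it straightforward to exercise the script by hand, for example with sh myApache.scr check, before registering it with Oracle Clusterware.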
Note:
Oracle recommends that you use shared storage, such as Oracle Automatic Storage Management Cluster File System, to store action scripts to decrease script maintenance.
Oracle Clusterware supports the deployment of applications in access-controlled server pools made up of anonymous servers and strictly based on the desired pool size. Cluster administrator-defined policies can and must be used in this case to govern the server assignment with desired sizes and levels of importance. Alternatively, a strict or preferred server assignment can be used, in which resources run on specifically named servers. This represents the pre-existing model available in earlier releases of Oracle Clusterware.
Conceptually, a cluster hosting applications developed and deployed in both of the schemes can be viewed as two logically separated collections of servers. One collection is used for server pools, enabling role separation and server capacity control. The other collection assumes a fixed assignment based on named servers in the cluster.
A built-in server pool named "Generic" always owns the servers used by applications of the latter scheme. The Generic server pool is a logical division and can be used to separate the two parts of the cluster using different management schemes.
For third-party developers to use the model to deploy applications, server pools must be used. To take advantage of the pre-existing application development and deployment model based on named servers, sub-pools of Generic (server pools that have Generic as their parent pool, defined by the server pool attribute PARENT_POOLS) must be used. By creating sub-pools that use Generic as their parent and enumerating servers by name in the sub-pool definitions, applications ensure that named servers are in Generic and are used exclusively for applications using the named servers model.
To manage an application using either deployment scheme, you must create a server pool before adding the resource to the cluster. In the following example, it is assumed that a server pool has been created to host an application. This server pool is not a sub-pool of Generic, but instead it is used to host the application in a top-level server pool.
This section includes the following topics:
To add the Apache Web server to a top-level server pool as a resource using the policy-based deployment scheme, run the following command as the user that is supposed to run the Apache Server. For an Apache Server this is typically the root user:
$ crsctl add resource myApache -type cluster_resource -attr "ACTION_SCRIPT=/opt/cluster/scripts/myapache.scr, PLACEMENT=restricted, SERVER_POOLS=server_pool_list,CHECK_INTERVAL=30,RESTART_ATTEMPTS=2, START_DEPENDENCIES=hard(appsvip),STOP_DEPENDENCIES=hard(appsvip)"
In the preceding example, myApache is the name of the resource added to the cluster.
Note:
A resource name cannot begin with a period or with the character string "ora."
Notice that attribute values are enclosed in single quotation marks (' '). Configure the resource as follows:
The resource is a cluster_resource type.
ACTION_SCRIPT=/opt/cluster/scripts/myapache.scr: The path to the required action script.
PLACEMENT=restricted
SERVER_POOLS=sp1 sp3: This resource can only run in the server pools specified in this space-separated list.
CHECK_INTERVAL=30: Oracle Clusterware checks this resource every 30 seconds to determine its status.
RESTART_ATTEMPTS=2: Oracle Clusterware attempts to restart this resource twice before failing it over to another node.
START_DEPENDENCIES=hard(appsvip): This resource has a hard START dependency on the appsvip resource. The appsvip resource must be online in order for myApache to start.
STOP_DEPENDENCIES=hard(appsvip): This resource has a hard STOP dependency on the appsvip resource. The myApache resource stops if the appsvip resource goes offline.
To add the Apache Web server as a resource that uses a named server deployment, it is assumed that the resource is added to a server pool that is by definition a sub-pool of the Generic server pool. Server pools that represent sub-pools of Generic are created using the crsctl add serverpool command. These server pools define the Generic server pool as their parent in the server pool attribute PARENT_POOLS. In addition, they include a list of server names in the SERVER_NAMES parameter to specify the servers that should be assigned to the respective pool. For example:
$ crsctl add serverpool myApache_sp -attr "PARENT_POOLS=Generic, SERVER_NAMES=stado36 stado37"
Once this sub-pool has been created, you can add the resource, as in the previous example:
$ crsctl add resource myApache -type cluster_resource -attr "ACTION_SCRIPT=/opt/cluster/scripts/myapache.scr, PLACEMENT='restricted', SERVER_POOLS=myApache_sp, CHECK_INTERVAL='30', RESTART_ATTEMPTS='2', START_DEPENDENCIES='hard(appsvip)', STOP_DEPENDENCIES='hard(appsvip)'"
Note:
A resource name cannot begin with a period or with the character string "ora."
In addition, note that the server pools listed in the SERVER_POOLS resource parameter must be sub-pools under Generic. These sub-pools are then typically assigned to run on certain, named servers.
To add resources to Oracle Clusterware using Oracle Enterprise Manager:
Log into Oracle Enterprise Manager Database Control.
Click the Cluster tab.
Click Administration.
Click Add Resource.
Enter a cluster administrator user name and password to display the Add Resource page.
Enter a name for the resource in the Name field.
Note:
A resource name cannot begin with a period or with the character string "ora."
Choose either cluster_resource or local_resource from the Resource Type drop down.
Optionally, enter a description of the resource in the Description field.
Select Start the resource after creation if you want the resource to start immediately.
The optional parameters in the Placement section define where in a cluster Oracle Clusterware places the resource.
See Also:
"Application Placement Policies" for more information about placement
The attributes in this section correspond to the attributes described in Appendix B, "Oracle Clusterware Resource Reference".
In the Action Program section, choose from the Action Program drop down whether Oracle Clusterware calls an action script, an agent file, or both to manage the resource.
You must also specify a path to the script, file, or both, depending on what you select from the drop down.
If you choose Action Script, then you can click Create New Action Script to use the Oracle Enterprise Manager action script template to create an action script for your resource, if you have not yet done so.
To further configure the resource, click Attributes. On this page, you can configure start, stop, and status attributes, and offline monitoring and any attributes that you define.
Click Advanced Settings to enable more detailed resource attribute configurations.
Click Dependencies to configure start and stop dependencies between resources.
See Also:
"Resource Dependencies" for more information about dependencies
Click Submit when you finish configuring the resource.
Oracle Clusterware manages resources based on the permissions of the user who added the resource. The user who first added the resource owns the resource and the resource runs as the resource owner. Certain resources must be managed as root. If a user other than root adds a resource that must be run as root, then the permissions must be changed as root so that root manages the resource, as follows:
# crsctl setperm resource resource_name -o root
As the user who installed Oracle Clusterware, enable the Oracle Database installation owner (oracle, in the following example) to run the script:
$ crsctl setperm resource resource_name -u user:oracle:r-x
Start the resource:
$ crsctl start resource resource_name
A resource can be started on any server, subject to the placement policies, the resource start dependencies, and the availability of the action script on that server.
The PLACEMENT resource attribute determines how Oracle Clusterware selects a server on which to start a resource and where to relocate the resource after a server failure. The HOSTING_MEMBERS and SERVER_POOLS attributes determine eligible servers to host a resource and the PLACEMENT attribute further refines the placement of resources.
See Also:
Appendix B, "Oracle Clusterware Resource Reference" for more information about the HOSTING_MEMBERS and SERVER_POOLS resource attributes
The value of the PLACEMENT resource attribute determines how Oracle Clusterware places resources when they are added to the cluster or when a server fails. Together with either the HOSTING_MEMBERS or SERVER_POOLS attributes, you can configure how Oracle Clusterware places the resources in a cluster. When the value of the PLACEMENT attribute is:
balanced: Oracle Clusterware uses any online server for placement. Less loaded servers are preferred to servers with greater loads. To measure how loaded a server is, Oracle Clusterware uses the LOAD resource attribute of the resources that are in an ONLINE state on the server. Oracle Clusterware uses the sum total of the LOAD values to measure the current server load.
favored: If values are assigned to either the SERVER_POOLS or HOSTING_MEMBERS resource attribute, then Oracle Clusterware considers servers belonging to the member list in either attribute first. If no servers are available, then Oracle Clusterware places the resource on any other available server. If there are values for both the SERVER_POOLS and HOSTING_MEMBERS attributes, then SERVER_POOLS indicates preference and HOSTING_MEMBERS restricts the choices to the servers within that preference.
restricted: Oracle Clusterware only considers servers that belong to server pools listed in the SERVER_POOLS resource attribute or servers listed in the HOSTING_MEMBERS resource attribute for resource placement. Only one of these resource attributes can have a value, otherwise it results in an error.
See Also:
"SERVER_POOLS" for more information
To unregister a resource, use the crsctl delete resource command. You cannot unregister an application or resource that is ONLINE or required by another resource, unless you use the -force option. The following example unregisters the Apache Web server application:
$ crsctl delete resource myApache
Run the crsctl delete resource command as a clean-up step when a resource is no longer managed by Oracle Clusterware. Oracle recommends that you unregister any unnecessary resources.
This section includes the following topics:
Each application that you manage with Oracle Clusterware is stored as a resource in OCR. Use the crsctl add resource command to register applications in OCR. For example, enter the following command to register the Apache Web server application from the previous example:
$ crsctl add resource myApache -type cluster_resource -attr "ACTION_SCRIPT=/opt/cluster/scripts/myapache.scr, PLACEMENT=restricted, SERVER_POOLS=server_pool_list,CHECK_INTERVAL=30,RESTART_ATTEMPTS=2, START_DEPENDENCIES=hard(appsvip),STOP_DEPENDENCIES=hard(appsvip)"
If you modify a resource, then update OCR by running the crsctl modify resource command.
To start an application resource that is registered with Oracle Clusterware, use the crsctl start resource command. For example:
$ crsctl start resource myApache
See Also:
Appendix E, "CRSCTL Utility Reference" for usage information and examples of CRSCTL command output
The command waits to receive a notification of success or failure from the action program each time the action program is called. Oracle Clusterware can start application resources if they have stopped due to exceeding their failure threshold values. You must register a resource using crsctl add resource before you can start it.
Start and stop resources with the crsctl start resource and crsctl stop resource commands. Manual starts or stops outside of Oracle Clusterware can invalidate the resource status. In addition, Oracle Clusterware may attempt to restart a resource on which you perform a manual stop operation.
Running the crsctl start resource command on a resource sets the resource TARGET value to ONLINE. Oracle Clusterware attempts to change the state to match the TARGET by running the action program with the start action.
If a cluster server fails while you are starting a resource on that server, then check the state of the resource on the cluster by using the crsctl status resource command.
Use the crsctl relocate resource command to relocate applications and application resources. For example, to relocate the Apache Web server application to a server named rac2, run the following command:
# crsctl relocate resource myApache -n rac2
Each time that the action program is called, the crsctl relocate resource command waits for the duration specified by the value of the SCRIPT_TIMEOUT resource attribute to receive notification of success or failure from the action program. A relocation attempt fails if:
The application has required resources that run on the initial server
Applications that require the specified resource run on the initial server
To relocate an application and its required resources, use the -f option with the crsctl relocate resource command. Oracle Clusterware relocates or starts all resources that are required by the application regardless of their state.
Stop application resources with the crsctl stop resource command. The command sets the resource TARGET value to OFFLINE. Because Oracle Clusterware always attempts to match the state of a resource to its target, the Oracle Clusterware subsystem stops the application. The following example stops the Apache Web server:
# crsctl stop resource myApache
You cannot stop a resource if another resource has a hard stop dependency on it, unless you use the force (-f) option. If you use the crsctl stop resource resource_name -f command on a resource upon which other resources depend, and if those resources are running, then Oracle Clusterware stops the resource and all of the running resources that depend on it.
To display status information about applications and resources that are on cluster servers, use the crsctl status resource command. The following example displays the status information for the Apache Web server application:
# crsctl status resource myApache
NAME=myApache
TYPE=cluster_resource
TARGET=ONLINE
STATE=ONLINE on server010
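When this status needs to be consumed by a script, the NAME, TYPE, TARGET, and STATE fields can be extracted with standard text tools. The following sketch assumes that the output arrives as one KEY=value pair per line, as in the example above. The helper names resource_field and is_online are hypothetical and are not part of CRSCTL.

```shell
# resource_field KEY   (hypothetical helper)
# Reads KEY=value lines on stdin and prints the value for the given key.
resource_field() {
    sed -n "s/^$1=//p"
}

# is_online   (hypothetical helper)
# Succeeds when the STATE line read from stdin begins with ONLINE.
is_online() {
    case "$(resource_field STATE)" in
        ONLINE*) return 0 ;;
        *)       return 1 ;;
    esac
}
```

A typical use would be to pipe the command output into a helper, for example crsctl status resource myApache | is_online && echo "myApache is online".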
Enter the following command to view information about all applications and resources in tabular format:
# crsctl status resource
Append a resource name to the preceding command to determine:
How many times the resource has been restarted
How many times the resource has failed within the failure interval
The maximum number of times that a resource can restart or fail
The target state of the resource and the normal status information
Use the -f option with the crsctl status resource resource_name command to view full information of a specific resource.
See Also:
Appendix E, "CRSCTL Utility Reference" for detailed information about CRSCTL commands
You can prevent Oracle Clusterware from automatically restarting a resource by setting several resource attributes. You can also control how Oracle Clusterware manages the restart counters for your resources. In addition, you can customize the timeout values for the start, stop, and check actions that Oracle Clusterware performs on resources.
This section includes the following topics:
When a server restarts, Oracle Clusterware attempts to start the resources that run on the server as soon as the server starts. Resource startup might fail, however, if system components on which a resource depends, such as a volume manager or a file system, are not running. This is especially true if Oracle Clusterware does not manage the system components on which a resource depends. To manage automatic restarts, use the AUTO_START resource attribute to specify whether Oracle Clusterware should automatically start a resource when a server restarts.
Note:
Regardless of the value of the AUTO_START resource attribute for a resource, the resource can start if another resource has a hard or weak start dependency on it or if the resource has a pullup start dependency on another resource.
See Also:
"Start Dependencies" for more information
Appendix B, "Oracle Clusterware Resource Reference" for more information about the AUTO_START resource attribute
When a resource fails, Oracle Clusterware attempts to restart the resource the number of times specified in the RESTART_ATTEMPTS resource attribute, regardless of how often the resource fails. The crsd process maintains an internal counter to track how often Oracle Clusterware restarts a resource. The number of times Oracle Clusterware has attempted to restart a resource is reflected in the RESTART_COUNT resource attribute. Oracle Clusterware can automatically manage the restart attempts counter based on the stability of a resource. The UPTIME_THRESHOLD resource attribute determines the time period that a resource must remain online, after which the RESTART_COUNT attribute gets reset to 0. In addition, the RESTART_COUNT resource attribute gets reset to 0 if the resource is relocated or restarted by the user, or the resource fails over to another server.
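The interplay of RESTART_ATTEMPTS, RESTART_COUNT, and UPTIME_THRESHOLD can be modeled with a small sketch. This is a simplified reading of the rules described above, not the crsd implementation. The function name next_action is hypothetical, and both the uptime and the threshold are assumed to be plain numbers of seconds.

```shell
# next_action RESTART_COUNT RESTART_ATTEMPTS UPTIME UPTIME_THRESHOLD
# (hypothetical model) Prints "<action> <new RESTART_COUNT>" for a failure.
next_action() {
    count=$1
    attempts=$2
    uptime=$3
    threshold=$4
    # A resource that stayed online for the threshold period is treated
    # as stable, so its restart counter starts again from 0.
    if [ "$uptime" -ge "$threshold" ]; then
        count=0
    fi
    if [ "$count" -lt "$attempts" ]; then
        # Local restart attempts remain: restart and count the attempt.
        echo "restart $((count + 1))"
    else
        # Attempts are exhausted: fail over, which resets the counter.
        echo "failover 0"
    fi
}
```

For example, next_action 2 2 5 60 prints "failover 0", because both restart attempts have been used and the resource failed again before reaching the threshold, while next_action 2 2 120 60 prints "restart 1", because the resource was online long enough for the counter to reset.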