ocf_suse_SAPHanaController(7) OCF resource agents ocf_suse_SAPHanaController(7)

NAME

SAPHanaController - Manages takeover between two SAP HANA databases with system replication (scale-up).

SYNOPSIS

SAPHanaController [start | stop | status | monitor | promote | demote | meta-data | validate-all | methods | usage ]

DESCRIPTION

SAPHanaController is a resource agent (RA) for SAP HANA databases. It manages takeover for a SAP HANA database with system replication in an OCF promotable clone configuration. This manual page explains SAP HANA scale-up scenarios. For scale-out, see SAPHanaController-scale-out(7).

System replication replicates the database data from one computer to another in order to compensate for database failures. In this mode of operation, internal SAP HANA high-availability (HA) mechanisms and the resource agent must work together. The SAPHanaController RA performs the actual check of the SAP HANA database instances and is configured as a promotable clone resource. Managing the two SAP HANA instances means that the resource agent controls the start/stop of the instances. In addition, the resource agent is able to monitor the SAP HANA databases on landscape host configuration level. For this monitoring, the resource agent relies on interfaces provided by SAP.

A third task of the resource agent is to check the synchronisation status of the two SAP HANA databases. If the synchronisation status is not "SOK", the cluster avoids a takeover to the secondary site if the primary fails. This improves data consistency.

The resource agent uses the following five interfaces provided by SAP:

1. sapcontrol/sapstartsrv
The interface sapcontrol/sapstartsrv is used to start/stop a HANA database instance/system.

2. landscapeHostConfiguration
The interface is used to monitor an entire HANA system. The python script is named landscapeHostConfiguration.py. It prints detailed output about HANA system status and node roles. For the monitor, only the overall status is relevant. This overall status is reported by the return code of the script: 0: Internal Fatal, 1: ERROR, 2: WARNING, 3: INFO, 4: OK. The SAPHanaController resource agent interprets return code 0 as FATAL, 1 as NOT-RUNNING (or ERROR) and return codes 2, 3 and 4 as RUNNING.

3. hdbnsutil
The interface hdbnsutil is used to check the "topology" of the system replication as well as the current configuration (primary/secondary) of a SAP HANA database instance. A second task of the interface is to run a system replication takeover (sr_takeover) or to register a former primary to a newer one (sr_register).

4. systemReplicationStatus / hdbsql
SAP HANA 1.0 SPS 9 and later provide a python script "systemReplicationStatus.py" for checking the system replication. SAPHanaSR-angi uses this script instead of hdbsql. Therefore hdbsql is no longer used to manage recent versions of SAP HANA with SAPHanaSR-angi.

5. saphostctrl
The interface saphostctrl uses the function ListInstances to figure out the virtual host name of the SAP HANA instance. This is the hostname used during the HANA installation.
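
The return code handling described for landscapeHostConfiguration.py (interface 2 above) can be sketched as a small shell function. This is only an illustration, not the resource agent's actual code: the function name lhc_state is hypothetical, and the UNKNOWN branch for return codes outside 0 to 4 is an assumption, as this manual page does not say how such codes are treated.

```shell
#!/bin/sh
# Hypothetical helper (not part of the resource agent): maps a
# landscapeHostConfiguration.py return code to the state named above.
lhc_state() {
    case "$1" in
        0)     echo "FATAL" ;;        # Internal Fatal
        1)     echo "NOT-RUNNING" ;;  # ERROR
        2|3|4) echo "RUNNING" ;;      # WARNING, INFO, OK
        *)     echo "UNKNOWN" ;;      # assumption: not covered by this page
    esac
}
```

For example, "lhc_state 2" prints RUNNING, because a WARNING status still counts as a running system.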

To make configuring the cluster as simple as possible, the additional SAPHanaTopology resource agent runs on all nodes of a SAPHanaSR cluster and gathers information about the statuses and configurations of SAP HANA system replications. The SAPHanaTopology RA is designed as a normal (stateless) clone.

Please see also the REQUIREMENTS section below and manual page SAPHanaSR-angi-scenarios(7).

SUPPORTED PARAMETERS

This resource agent supports the following parameters:

SID

SAP System Identifier. Has to be the same on both instances. Example "SID=SLE".

InstanceNumber

Instance number of the SAP HANA database. Has to be the same on both instances. For system replication, InstanceNumber+1 is blocked as well. Example "InstanceNumber=00".

DIR_EXECUTABLE

The fully qualified path where sapstartsrv and sapcontrol can be found. Specify this parameter if you have changed the SAP kernel directory location after the default SAP installation.
Optional, well known directories will be searched by default.

DIR_PROFILE

The fully qualified path where the SAP START profile can be found. Specify this parameter if you have changed the SAP profile directory location after the default SAP installation.
Optional, well known directories will be searched by default.

HANA_CALL_TIMEOUT

Defines how long a call to HANA to receive information may take, for example a call to landscapeHostConfiguration.py. Some specific calls to HANA have their own timeout values. For example, the sr_takeover command does not time out (inf). If the timeout is reached, the return code will be 124. If you increase the timeouts for HANA calls, you should also adjust the operation timeouts of your Linux cluster resources.
Optional. Default value: 60.
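
The return code 124 mentioned above is the same value the coreutils timeout(1) command reports when it has to terminate a command. The following sketch only demonstrates that convention. It assumes GNU coreutils timeout is installed, uses "sleep 3" as a stand-in for a slow HANA call, and the wrapper name call_with_timeout is hypothetical. How the resource agent wraps its calls internally is not described in this manual page.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command with a time limit in seconds.
# Assumes the coreutils "timeout" command is available.
call_with_timeout() {
    limit="$1"
    shift
    timeout "$limit" "$@"
}

# "sleep 3" stands in for a HANA call that takes too long.
rc=0
call_with_timeout 1 sleep 3 || rc=$?
echo "rc=$rc"
```

A return code of 124 tells the caller that the time limit was hit, as opposed to the called command itself failing.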

INSTANCE_PROFILE

The name of the SAP HANA instance profile. Specify this parameter, if you have changed the name of the SAP HANA instance profile after the default SAP installation. Normally you do not need to set this parameter.
Optional, well known directories will be searched by default.

PREFER_SITE_TAKEOVER

Defines whether the RA should prefer to take over to the secondary database instead of restarting the primary locally. Example: "PREFER_SITE_TAKEOVER=true".
Optional. Default value: false.

DUPLICATE_PRIMARY_TIMEOUT

Time difference needed between two last primary time stamps (LPTs) in case a dual-primary situation occurs. If the difference between both nodes' last primary time stamps is less than DUPLICATE_PRIMARY_TIMEOUT, then the cluster holds one or both instances in a "WAITING" status. This is to give an admin the chance to react to a takeover. Note: How the cluster proceeds after the DUPLICATE_PRIMARY_TIMEOUT has passed depends on the parameter AUTOMATED_REGISTER. See also the examples section below.
Optional. Default value: 7200.

AUTOMATED_REGISTER

Defines whether a former primary instance should be registered automatically by the resource agent during cluster/resource start, if the DUPLICATE_PRIMARY_TIMEOUT has expired. Example: "AUTOMATED_REGISTER=true".
Default value: false.

SAPHanaFilter

Outdated parameter. Please do not use it any longer. This resource agent parameter has been replaced by the cluster property 'hana_${sid}_glob_filter'.

SUPPORTED PROPERTIES

hana_${sid}_glob_filter

Global cluster property hana_${sid}_glob_filter. This property should only be set if requested by support engineers. The default is sufficient for normal operation.

SUPPORTED ACTIONS

This resource agent supports the following actions (operations):

start

Starts the HANA instance or brings the "clone instance" to a WAITING status. Suggested minimum timeout: 3600.

stop

Stops the HANA instance. The correct timeout value depends on factors like database size. If HANA database internal timeouts have been tuned for fast shutdown, the RA timeout might be reduced. Suggested minimum timeout: 600.

promote

Either runs a takeover for a secondary or does nothing for a primary. Suggested minimum timeout: 320.

demote

Does almost nothing, it just marks the instance as demoted. Suggested minimum timeout: 320.

status

Reports whether the HANA instance is running. Suggested minimum timeout: 60.

monitor (promoted role)

Reports whether the HANA instance seems to be working in replication primary mode. It also checks the system replication status. Suggested minimum timeout: 700. Suggested interval: 60.

monitor (demoted role)

Reports whether the HANA instance seems to be working in replication secondary mode. It also checks the system replication status. The demoted role's monitor interval has to be different from the promoted role's interval. Suggested minimum timeout: 700. Suggested interval: 61.

validate-all

Reports whether the parameters are valid. Suggested minimum timeout: 5.

meta-data

Retrieves resource agent metadata (internal use only). Suggested minimum timeout: 5.

methods

Suggested minimum timeout: 5.

RETURN CODES

The return codes are defined by the OCF cluster framework. Please refer to the OCF definition on the website mentioned below.
In addition, log entries are written, which can be scanned by using a pattern like "SAPHanaCon.*RA.*rc=[1-7,9]" for errors. Regular operations might be found with "SAPHanaCon.*RA.*rc=0".

EXAMPLES

* Below is an example configuration for a SAPHanaController multi-state resource in a performance-optimized scenario.

In addition, a SAPHanaTopology clone resource is needed to make this work.

primitive rsc_SAPHanaCon_SLE_HDB00 ocf:suse:SAPHanaController \
op start interval="0" timeout="3600" \
op stop interval="0" timeout="3600" \
op promote interval="0" timeout="900" \
op demote interval="0" timeout="320" \
op monitor interval="60" role="Promoted" timeout="700" \
op monitor interval="61" role="Started" timeout="700" \
params SID="SLE" InstanceNumber="00" PREFER_SITE_TAKEOVER="true" \
DUPLICATE_PRIMARY_TIMEOUT="7200" AUTOMATED_REGISTER="false"

clone mst_SAPHanaCon_SLE_HDB00 rsc_SAPHanaCon_SLE_HDB00 \
meta clone-max="2" clone-node-max="1" interleave="true" promotable="true"

* Below is an example configuration for the two SAPHanaController resources in a cost-optimized scenario.

The first SAP HANA resource is a multi-state pair of production HANAs with system replication (e.g. PRD), managed by the SAPHanaController RA. The second SAP HANA is a single test HANA (e.g. TST) running together with the productive HANA secondary on the same node. This second, single HANA is managed as a primitive resource by the SAPInstance RA. Of course, a SAPHanaTopology clone resource is needed to make this work. It is also necessary to prepare a HANA HA/DR hook script for adjusting the secondary HANA's memory in case of sr_takeover. See manual page susCostOpt.py(7) and URLs below. Finally, the SAPHanaController primary gets a priority to allow priority fencing. See manual page SAPHanaSR_basic_cluster(7).

primitive rsc_SAPHanaCon_PRD_HDB10 ocf:suse:SAPHanaController \
op start interval="0" timeout="3600" \
op stop interval="0" timeout="3600" \
op promote interval="0" timeout="900" \
op demote interval="0" timeout="320" \
op monitor interval="60" role="Promoted" timeout="700" \
op monitor interval="61" role="Started" timeout="700" \
params SID="PRD" InstanceNumber="10" PREFER_SITE_TAKEOVER="false" \
DUPLICATE_PRIMARY_TIMEOUT="7200" AUTOMATED_REGISTER="false" \
meta priority=100

clone mst_SAPHanaCon_PRD_HDB10 rsc_SAPHanaCon_PRD_HDB10 \
meta clone-max="2" clone-node-max="1" interleave="true" promotable="true"

primitive rsc_SAPInstance_TST_HDB20_node02 ocf:heartbeat:SAPInstance \
params InstanceName="TST_HDB20_node02" \
MONITOR_SERVICES="hdbindexserver|hdbnameserver" \
START_PROFILE="/usr/sap/TST/SYS/profile/TST_HDB20_node02" \
op start interval="0" timeout="600" \
op monitor interval="120" timeout="700" \
op stop interval="0" timeout="300"

location loc_TST_never_on_node01 rsc_SAPInstance_TST_HDB20_node02 -inf: node01

colocation col_TST_never_with_PRD-ip -inf: rsc_SAPInstance_TST_HDB20_node02:Started \
rsc_ip_PRD_HDB10

order ord_TST_stop_before_PRD-promote inf: rsc_SAPInstance_TST_HDB20_node02:stop \
mst_SAPHanaCon_PRD_HDB10:promote

* Initiate an administrative takeover of the HANA primary from one node to the other one.

If the cluster should also register the former primary as secondary, AUTOMATED_REGISTER="true" is needed. Before the takeover is initiated, the status of the Linux cluster and the HANA system replication must be checked. The takeover should only be initiated as a forced migration. After the takeover has finished, the migration rule must be deleted.
Note: Older versions of the Linux cluster used the commands 'migrate' and 'unmigrate' instead of 'move' and 'clear'.

# cs_clusterstate
# SAPHanaSR-showAttr
# crm configure show | grep cli
# crm resource move mst_SAPHanaCon_SLE_HDB10 force
# cs_clusterstate -i
# SAPHanaSR-showAttr
# crm resource clear mst_SAPHanaCon_SLE_HDB10

* Manually start the HANA primary if only one node is available.

This might be necessary in case the cluster could not detect the status of both nodes.

1. Start the cluster.
2. Wait and check that the cluster is running and in status idle.
3. Become sidadm, and start HANA manually.
4. Wait and check that HANA is running.
5. In case the cluster does not promote the HANA to primary, instruct the cluster to migrate the IP address to that node.
6. Wait and check that HANA gets promoted to primary by the cluster.
7. Remove the migration rule from the IP address.
8. You are done, for now.
9. Please bring back the other node and register that HANA as soon as possible. If the HANA primary stays alone for too long, the log area will fill up.

* The following shows the filter for log messages set to the defaults.

This property should only be set if requested by support engineers. The default is sufficient for normal operation.

property $id="SAPHanaSR" \
hana_SLE_glob_filter="ra-act-dec-lpa"

* Search for log entries of the resource agent, show errors only:

# grep "SAPHana.*RA.*rc=[1-7,9]" /var/log/messages

* Show and delete failcount for resource.

Resource is rsc_SAPHanaCon_HA1_HDB00, node is node22. Useful after a failure has been fixed and for testing. See also cluster properties migration-threshold, failure-timeout and SAPHanaController parameter PREFER_SITE_TAKEOVER.

# crm resource failcount rsc_SAPHanaCon_HA1_HDB00 show node22
# crm resource failcount rsc_SAPHanaCon_HA1_HDB00 delete node22

* Check for working NTP service on SLE-HA 15:

# chronyc sources

* Use of DUPLICATE_PRIMARY_TIMEOUT and Last Primary Timestamp (LPT) in case the primary node has crashed completely.

Typically, on each side where the RA detects a running primary, a time stamp is written to the node's attributes (last primary seen at time: lpt). If the time stamps ("last primary seen at") differ by less than DUPLICATE_PRIMARY_TIMEOUT, then the RA cannot automatically decide which of the two primaries is the better one.

1. nodeA is primary and has a current time stamp, nodeB is secondary and has a secondary marker set:
nodeA: 1479201695
nodeB: 30

2. Now nodeA crashes and nodeB takes over:
(nodeA: 1479201695)
nodeB: 1479201700

3. A bit later nodeA comes back into the cluster:
nodeA: 1479201695
nodeB: 1479202000
As you can see, while nodeA keeps its primary down, the old time stamp is kept. nodeB increases its time stamp on each monitor run.

4. After some more time (depending on the parameter DUPLICATE_PRIMARY_TIMEOUT):
nodeA: 1479201695
nodeB: 1479208895
Now the time stamps differ by >= DUPLICATE_PRIMARY_TIMEOUT. The algorithm now defines nodeA as "the loser" and, depending on AUTOMATED_REGISTER, nodeA will become the secondary.

5. NodeA would be registered:
nodeA: 10
nodeB: 1479208900

6. Some time later the secondary gets into sync:
nodeA: 30
nodeB: 1479209100
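
The time stamp comparison from steps 3 and 4 above can be sketched in shell. This is a simplified illustration only, not the resource agent's implementation: the function name lpt_decide and its output strings are hypothetical, and the sketch only handles real time stamps, not the small marker values (such as 10 or 30) shown for secondaries.

```shell
#!/bin/sh
# Hypothetical helper: compare two last primary time stamps (LPTs).
# Arguments: LPT of nodeA, LPT of nodeB, DUPLICATE_PRIMARY_TIMEOUT (seconds).
lpt_decide() {
    lpt_a="$1"
    lpt_b="$2"
    dpt="$3"
    # Absolute difference between both time stamps.
    if [ "$lpt_a" -ge "$lpt_b" ]; then
        diff=$((lpt_a - lpt_b))
    else
        diff=$((lpt_b - lpt_a))
    fi
    if [ "$diff" -lt "$dpt" ]; then
        echo "WAITING"          # too close, no automatic decision
    elif [ "$lpt_a" -lt "$lpt_b" ]; then
        echo "nodeA-loses"      # nodeA has the older time stamp
    else
        echo "nodeB-loses"
    fi
}

lpt_decide 1479201695 1479202000 7200   # step 3: difference is 305
lpt_decide 1479201695 1479208895 7200   # step 4: difference is 7200
```

With the values from step 3 the function prints WAITING. With the values from step 4 the difference equals DUPLICATE_PRIMARY_TIMEOUT, so nodeA is reported as the losing side. Whether nodeA is then actually registered depends on AUTOMATED_REGISTER.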

* Use of DUPLICATE_PRIMARY_TIMEOUT and Last Primary Timestamp (LPT) in case the database on the primary node has crashed, but the node is still alive.

Typically, on each side where the RA detects a running primary, a time stamp is written to the node's attributes (last primary seen at time: lpt). If the time stamps ("last primary seen at") differ by less than DUPLICATE_PRIMARY_TIMEOUT, then the RA cannot automatically decide which of the two primaries is the better one.

1. nodeA is primary and has a current time stamp, nodeB is secondary and has a secondary marker set:
nodeA: 1479201695
nodeB: 30

2. Now HANA on nodeA crashes and nodeB takes over:
nodeA: 1479201695
nodeB: 1479201700

3. As the cluster can be sure that it has properly stopped the HANA instance on nodeA, it *immediately* marks the old primary as a register candidate, if AUTOMATED_REGISTER is true:
nodeA: 10
nodeB: 1479201760

4. Depending on the AUTOMATED_REGISTER parameter, the RA will also immediately register the former primary to become the new secondary:
nodeA: 10
nodeB: 1479201820

5. And after a while the secondary gets in sync:
nodeA: 30
nodeB: 1479202132

FILES

/usr/lib/ocf/resource.d/suse/SAPHanaController
the resource agent itself
/usr/lib/ocf/resource.d/suse/SAPHanaTopology
the topology resource agent, which is also needed
/usr/sap/$SID/$InstanceName/exe
default path for DIR_EXECUTABLE
/usr/sap/$SID/SYS/profile
default path for DIR_PROFILE

REQUIREMENTS

For the current version of the SAPHanaController resource agent that comes with the software package SAPHanaSR-angi, the support is limited to the scenarios and parameters described in the respective manual page SAPHanaSR(7).

BUGS

In case of any problem, please use your favourite SAP support process to open a request for the component BC-OP-LNX-SUSE. Please report any other feedback and suggestions to feedback@suse.com.

SEE ALSO

ocf_suse_SAPHanaTopology(7), ocf_heartbeat_IPaddr2(7), ocf_heartbeat_SAPDatabase(7), susHanaSR.py(7), susCostOpt.py(7), susTkOver.py(7), susChkSrv.py(7), SAPHanaSR(7), SAPHanaSR_basic_cluster(7), SAPHanaSR-showAttr(8), ntp.conf(5), stonith(8), cs_clusterstate(8), crm(8),
https://www.suse.com/products/sles-for-sap/resource-library/sap-best-practices.html,
http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-ocf-return-codes.html,
http://scn.sap.com/community/hana-in-memory/blog/2014/04/04/fail-safe-operation-of-sap-hana-suse-extends-its-high-availability-solution,
http://scn.sap.com/community/hana-in-memory/blog/2015/12/14/sap-hana-sps-11-whats-new-ha-and-dr--by-the-sap-hana-academy,
https://wiki.scn.sap.com/wiki/display/ATopics/HOW+TO+SET+UP+SAPHanaSR+IN+THE+COST+OPTIMIZED+SAP+HANA+SR+SCENARIO+-+PART+I,
http://scn.sap.com/docs/DOC-47702,
http://www.saphana.com/docs/DOC-2775,
http://scn.sap.com/docs/DOC-60334,
http://scn.sap.com/docs/DOC-60337,
http://scn.sap.com/docs/DOC-65899

AUTHORS

F.Herschel, L.Pinne.

COPYRIGHT

(c) 2014 SUSE Linux Products GmbH, Germany.
(c) 2015-2017 SUSE Linux GmbH, Germany.
(c) 2018-2025 SUSE LLC
The resource agent SAPHanaController comes with ABSOLUTELY NO WARRANTY.
For details see the GNU General Public License at http://www.gnu.org/licenses/gpl.html

10 Mar 2025