ocf_suse_SAPHanaController(7) | OCF resource agents | ocf_suse_SAPHanaController(7)
NAME¶
SAPHanaController - Manages takeover between two SAP HANA databases with system replication (scale-up).
SYNOPSIS¶
SAPHanaController [start | stop | status | monitor | promote | demote | meta-data | validate-all | methods | usage ]
DESCRIPTION¶
SAPHanaController is a resource agent (RA) for SAP HANA databases. It manages takeover for a SAP HANA database with system replication in an OCF promotable clone configuration. This manual page explains SAP HANA scale-up scenarios. For scale-out, see SAPHanaController-scale-out(7).
System replication replicates the database data from one computer to another computer in order to compensate for database failures. In this mode of operation, internal SAP HANA high-availability (HA) mechanisms and the resource agent must work together. The SAPHanaController RA performs the actual checks of the SAP HANA database instances and is configured as a promotable clone resource. Managing the two SAP HANA instances means that the resource agent controls the start/stop of the instances. In addition the resource agent is able to monitor the SAP HANA databases on landscape host configuration level. For this monitoring the resource agent relies on interfaces provided by SAP.
A third task of the resource agent is to check the synchronisation status of the two SAP HANA databases. If the synchronisation status is not "SOK", the cluster avoids a takeover to the secondary site when the primary fails. This improves data consistency.
The resource agent uses the following five interfaces provided by SAP:
1. sapcontrol/sapstartsrv
The interface sapcontrol/sapstartsrv is used to start and stop a HANA database
instance/system.
2. landscapeHostConfiguration
The interface is used to monitor an entire HANA system. The python script is
named landscapeHostConfiguration.py. It provides detailed output about the
HANA system status and node roles. For our monitor the overall status is
relevant. This overall status is reported by the return code of the script:
0: Internal Fatal, 1: ERROR, 2: WARNING, 3: INFO, 4: OK.
The SAPHanaController resource agent interprets return code 0 as FATAL,
1 as NOT-RUNNING (or ERROR), and return codes 2, 3 and 4 as RUNNING.
3. hdbnsutil
The interface hdbnsutil is used to check the "topology" of the
system replication as well as the current configuration (primary/secondary)
of a SAP HANA database instance. A second task of the interface is the
possibility to run a system replication takeover (sr_takeover) or to register
a former primary to a newer one (sr_register).
4. systemReplicationStatus / hdbsql
SAP HANA 1.0 SPS 9 and later provide a python script
"systemReplicationStatus.py" for checking the system replication.
SAPHanaSR-angi uses this script instead of hdbsql. So, to manage recent
versions of SAP HANA with SAPHanaSR-angi, hdbsql is no longer
used.
5. saphostctrl
The interface saphostctrl uses the function ListInstances to figure out the
virtual host name of the SAP HANA instance. This is the hostname used during
the HANA installation.
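For reference, the five interfaces above are typically called as follows. This is an illustrative sketch, not the RA's exact invocations: it assumes SID "SLE", instance number 00, and hence the sidadm user "sleadm"; commands are shown as root prompt lines.

```shell
# 1. sapcontrol/sapstartsrv - start or stop an instance/system:
# sapcontrol -nr 00 -function StartSystem HDB
# sapcontrol -nr 00 -function StopSystem HDB
# 2. landscapeHostConfiguration - overall state reported via return code:
# su - sleadm -c "HDBSettings.sh landscapeHostConfiguration.py"; echo rc=$?
# 3. hdbnsutil - show the system replication topology and configuration:
# su - sleadm -c "hdbnsutil -sr_state"
# 4. systemReplicationStatus - replication status, also via return code:
# su - sleadm -c "HDBSettings.sh systemReplicationStatus.py"; echo rc=$?
# 5. saphostctrl - virtual host name via the ListInstances function:
# /usr/sap/hostctrl/exe/saphostctrl -function ListInstances
```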
To make configuring the cluster as simple as possible, the additional SAPHanaTopology resource agent runs on all nodes of a SAPHanaSR cluster and gathers information about the statuses and configurations of SAP HANA system replications. The SAPHanaTopology RA is designed as a normal (stateless) clone.
Please see also the REQUIREMENTS section below and manual page SAPHanaSR-angi-scenarios(7).
SUPPORTED PARAMETERS¶
This resource agent supports the following parameters:
SID
Required. The SAP System Identifier (SID) of the SAP HANA database, e.g. "SLE".
InstanceNumber
Required. The instance number of the SAP HANA database, e.g. "00".
DIR_EXECUTABLE
Optional, well known directories will be searched by default.
DIR_PROFILE
Optional, well known directories will be searched by default.
HANA_CALL_TIMEOUT
Optional. Default value: 60.
INSTANCE_PROFILE
Optional, well known directories will be searched by default.
PREFER_SITE_TAKEOVER
Optional. Default value: false.
DUPLICATE_PRIMARY_TIMEOUT
Optional. Default value: 7200.
AUTOMATED_REGISTER
Default value: false.
SAPHanaFilter
SUPPORTED PROPERTIES¶
hana_${sid}_glob_filter
SUPPORTED ACTIONS¶
This resource agent supports the following actions (operations):
start
stop
promote
demote
status
monitor (promoted role)
monitor (demoted role)
validate-all
meta-data
methods
RETURN CODES¶
The return codes are defined by the OCF cluster framework. Please
refer to the OCF definition on the website mentioned below.
In addition, log entries are written, which can be scanned by using a pattern
like "SAPHanaCon.*RA.*rc=[1-7,9]" for errors. Regular operations
might be found with "SAPHanaCon.*RA.*rc=0".
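The patterns above can be applied with grep. A minimal sketch, run here against an inline sample instead of the real system log (the log file location is distribution-dependent; on SLES it is typically /var/log/messages):

```shell
# Build a small sample log with one regular and one error entry:
cat >/tmp/saphana-sample.log <<'EOF'
Jan 01 12:00:00 node01 SAPHanaController: SAPHanaCon RA monitor rc=0
Jan 01 12:01:00 node01 SAPHanaController: SAPHanaCon RA monitor rc=7
EOF
# Errors only (pattern from the RETURN CODES section):
grep -E 'SAPHanaCon.*RA.*rc=[1-7,9]' /tmp/saphana-sample.log
# Regular operations:
grep -E 'SAPHanaCon.*RA.*rc=0' /tmp/saphana-sample.log
```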
EXAMPLES¶
* Below is an example configuration for a SAPHanaController multi-state resource in a performance-optimized scenario.
In addition, a SAPHanaTopology clone resource is needed to make
this work.
primitive rsc_SAPHanaCon_SLE_HDB00 ocf:suse:SAPHanaController \
op start interval="0" timeout="3600" \
op stop interval="0" timeout="3600" \
op promote interval="0" timeout="900" \
op demote interval="0" timeout="320" \
op monitor interval="60" role="Promoted" timeout="700" \
op monitor interval="61" role="Started" timeout="700" \
params SID="SLE" InstanceNumber="00" PREFER_SITE_TAKEOVER="true" \
DUPLICATE_PRIMARY_TIMEOUT="7200" AUTOMATED_REGISTER="false"
clone mst_SAPHanaCon_SLE_HDB00 rsc_SAPHanaCon_SLE_HDB00 \
meta clone-max="2" clone-node-max="1" \
interleave="true" promotable="true"
* Below is an example configuration for the two SAPHanaController resources in a cost-optimized scenario.
The first SAP HANA resource is a multi-state pair of production HANAs with a system replication (e.g. PRD), managed by the SAPHanaController RA. The second SAP HANA is a single test HANA (e.g. TST) running together with the productive HANA secondary on the same node. This second -single- HANA is managed as a primitive resource by the SAPInstance RA. Of course, a SAPHanaTopology clone resource is needed to make this work. It is also necessary to prepare an HANA HA/DR hook script for adjusting the secondary HANA's memory in case of sr_takeover. See manual page susCostOpt.py(7) and URLs below. Finally, the SAPHanaController primary gets a priority to allow priority fencing. See manual page SAPHanaSR_basic_cluster(7).
primitive rsc_SAPHanaCon_PRD_HDB10 ocf:suse:SAPHanaController \
op start interval="0" timeout="3600" \
op stop interval="0" timeout="3600" \
op promote interval="0" timeout="900" \
op demote interval="0" timeout="320" \
op monitor interval="60" role="Promoted" timeout="700" \
op monitor interval="61" role="Started" timeout="700" \
params SID="PRD" InstanceNumber="10" PREFER_SITE_TAKEOVER="false" \
DUPLICATE_PRIMARY_TIMEOUT="7200" AUTOMATED_REGISTER="false" \
meta priority=100
clone mst_SAPHanaCon_PRD_HDB10 rsc_SAPHanaCon_PRD_HDB10 \
meta clone-max="2" clone-node-max="1" \
interleave="true" promotable="true"
primitive rsc_SAPInstance_TST_HDB10 ocf:heartbeat:SAPInstance \
params InstanceName="TST_HDB10_node02" \
MONITOR_SERVICES="hdbindexserver|hdbnameserver" \
START_PROFILE="/usr/sap/TST/SYS/profile/TST_HDB10_node02" \
op start interval="0" timeout="600" \
op monitor interval="120" timeout="700" \
op stop interval="0" timeout="300"
location loc_TST_never_on_node01 rsc_SAPInstance_TST_HDB10 -inf: node01
colocation col_TST_never_with_PRD-ip -inf: \
rsc_SAPInstance_TST_HDB10:Started \
rsc_ip_PRD_HDB10
order ord_TST_stop_before_PRD-promote inf: \
rsc_SAPInstance_TST_HDB10:stop \
mst_SAPHanaCon_PRD_HDB10:promote
* Initiate an administrative takeover of the HANA primary from one node to the other one.
If the cluster should also register the former primary as
secondary, AUTOMATED_REGISTER="true" is needed. Before the
takeover is initiated, the status of the Linux cluster and the HANA
system replication have to be checked. The takeover should only be initiated
as forced migration. After the takeover has finished, the migration
rule has to be deleted.
Note: Older versions of the Linux cluster used the commands 'migrate' and
'unmigrate' instead of 'move' and 'clear'.
# SAPHanaSR-showAttr
# crm configure show | grep cli
# crm resource move mst_SAPHanaCon_SLE_HDB10 force
# cs_clusterstate -i
# SAPHanaSR-showAttr
# crm resource clear mst_SAPHanaCon_SLE_HDB10
* Manually start the HANA primary if only one node is available.
This might be necessary in case the cluster can not detect the status of both nodes.
1. Start the cluster.
2. Wait and check that the cluster is running and in status idle.
3. Become sidadm, and start HANA manually.
4. Wait and check that HANA is running.
5. In case the cluster does not promote the HANA to primary, instruct the cluster to migrate the IP address to that node.
6. Wait and check that HANA gets promoted to primary by the cluster.
7. Remove the migration rule from the IP address.
8. You are done, for now.
9. Please bring back the other node and register that HANA as soon as possible. If the HANA primary stays alone for too long, the log area will fill up.
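The steps above map to commands like the following sketch, shown as prompt lines. The sidadm user "sleadm" and the IP address resource "rsc_ip_SLE_HDB00" are illustrative assumptions; real names depend on the cluster configuration.

```shell
# 1. Start the cluster on the available node:
# crm cluster start
# 2. Wait and check for the cluster being idle:
# cs_clusterstate -i
# 3./4. Become sidadm, start HANA manually, and check it:
# su - sleadm -c "HDB start"
# su - sleadm -c "HDB info"
# 5. If needed, move the IP address resource to that node (forced):
# crm resource move rsc_ip_SLE_HDB00 force
# 6. Watch the promotion:
# SAPHanaSR-showAttr
# 7. Remove the migration rule:
# crm resource clear rsc_ip_SLE_HDB00
```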
* The following shows the filter for log messages set to the defaults.
This property should only be set if requested by support
engineers. The default is sufficient for normal operation.
hana_SLE_glob_filter="ra-act-dec-lpa"
* Search for log entries of the resource agent, show errors only:
# grep -E 'SAPHanaCon.*RA.*rc=[1-7,9]' /var/log/messages
* Show and delete failcount for resource.
Resource is rsc_SAPHanaCon_HA1_HDB00, node is node22. Useful after a failure has been fixed and for testing. See also cluster properties migration-threshold, failure-timeout and SAPHanaController parameter PREFER_SITE_TAKEOVER.
# crm resource failcount rsc_SAPHanaCon_HA1_HDB00 show node22
# crm resource failcount rsc_SAPHanaCon_HA1_HDB00 delete node22
* Check for working NTP service on SLE-HA 15:
# chronyc sources
* Use of DUPLICATE_PRIMARY_TIMEOUT and Last Primary Timestamp (LPT) in case the primary node has been crashed completely.
Typically, on each side where the RA detects a running primary, a time stamp is written to the node's attributes (last primary seen at time: lpt). If the timestamps ("last primary seen at") differ by less than DUPLICATE_PRIMARY_TIMEOUT, the RA cannot automatically decide which of the two primaries is the better one.
1. nodeA is primary and has a current time stamp, nodeB is
secondary and has a secondary marker set:
nodeA: 1479201695
nodeB: 30
2. Now nodeA crashes and nodeB takes over:
(nodeA: 1479201695)
nodeB: 1479201700
3. A bit later nodeA comes back into the cluster:
nodeA: 1479201695
nodeB: 1479202000
You see: while nodeA keeps its primary down, the old timestamp is kept. NodeB
increases its timestamp on each monitor run.
4. After some more time (depending on the parameter
DUPLICATE_PRIMARY_TIMEOUT)
nodeA: 1479201695
nodeB: 1479208895
Now the time stamps differ by >= DUPLICATE_PRIMARY_TIMEOUT. The algorithm
now defines nodeA as "the loser" and, depending on
AUTOMATED_REGISTER, nodeA will become the secondary.
5. NodeA would be registered:
nodeA: 10
nodeB: 1479208900
6. Some time later the secondary gets into sync
nodeA: 30
nodeB: 1479209100
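The decision in step 4 above is plain timestamp arithmetic. A minimal sketch using the LPT values from the walkthrough and the default timeout (variable names are illustrative; the real logic lives inside the RA):

```shell
# LPT values from step 4 of the walkthrough, default timeout of 7200 s:
LPT_NODEA=1479201695
LPT_NODEB=1479208895
DUPLICATE_PRIMARY_TIMEOUT=7200
# If the timestamps differ by at least DUPLICATE_PRIMARY_TIMEOUT, the RA
# can decide on its own: the node with the older LPT is the loser.
if [ $(( LPT_NODEB - LPT_NODEA )) -ge "$DUPLICATE_PRIMARY_TIMEOUT" ]; then
    DECISION="nodeA-is-loser"
else
    DECISION="ambiguous"
fi
echo "$DECISION"
```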
* Use of DUPLICATE_PRIMARY_TIMEOUT and Last Primary Timestamp (LPT) in case the database on the primary node has crashed, but the node is still alive.
Typically, on each side where the RA detects a running primary, a time stamp is written to the node's attributes (last primary seen at time: lpt). If the timestamps ("last primary seen at") differ by less than DUPLICATE_PRIMARY_TIMEOUT, the RA cannot automatically decide which of the two primaries is the better one.
1. nodeA is primary and has a current time stamp, nodeB is
secondary and has a secondary marker set:
nodeA: 1479201695
nodeB: 30
2. Now HANA on nodeA crashes and nodeB takes over:
nodeA: 1479201695
nodeB: 1479201700
3. As the cluster can be sure that the HANA instance at nodeA was
stopped properly, it *immediately* marks the old primary as a register
candidate, if AUTOMATED_REGISTER is true:
nodeA: 10
nodeB: 1479201760
4. Depending on the AUTOMATED_REGISTER parameter, the RA will also
immediately register the former primary to become the new secondary:
nodeA: 10
nodeB: 1479201820
5. And after a while the secondary gets in sync
nodeA: 30
nodeB: 1479202132
FILES¶
- /usr/lib/ocf/resource.d/suse/SAPHanaController
- the resource agent itself
- /usr/lib/ocf/resource.d/suse/SAPHanaTopology
- the required companion topology resource agent
- /usr/sap/$SID/$InstanceName/exe
- default path for DIR_EXECUTABLE
- /usr/sap/$SID/SYS/profile
- default path for DIR_PROFILE
REQUIREMENTS¶
For the current version of the SAPHanaController resource agent that comes with the software package SAPHanaSR-angi, support is limited to the scenarios and parameters described in the respective manual page SAPHanaSR(7).
BUGS¶
In case of any problem, please use your favourite SAP support process to open a request for the component BC-OP-LNX-SUSE. Please report any other feedback and suggestions to feedback@suse.com.
SEE ALSO¶
ocf_suse_SAPHanaTopology(7) ,
ocf_heartbeat_IPaddr2(7) , ocf_heartbeat_SAPDatabase(7) ,
susHanaSR.py(7) , susCostOpt.py(7) , susTkOver.py(7) ,
susChkSrv.py (7) , SAPHanaSR(7) ,
SAPHanaSR_basic_cluster(7) , SAPHanaSR-showAttr(8) ,
ntp.conf(5) , stonith(8) , cs_clusterstate(8) ,
crm(8) ,
https://www.suse.com/products/sles-for-sap/resource-library/sap-best-practices.html
,
http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-ocf-return-codes.html
,
http://scn.sap.com/community/hana-in-memory/blog/2014/04/04/fail-safe-operation-of-sap-hana-suse-extends-its-high-availability-solution
,
http://scn.sap.com/community/hana-in-memory/blog/2015/12/14/sap-hana-sps-11-whats-new-ha-and-dr--by-the-sap-hana-academy
,
https://wiki.scn.sap.com/wiki/display/ATopics/HOW+TO+SET+UP+SAPHanaSR+IN+THE+COST+OPTIMIZED+SAP+HANA+SR+SCENARIO+-+PART+I
,
http://scn.sap.com/docs/DOC-47702 ,
http://www.saphana.com/docs/DOC-2775 ,
http://scn.sap.com/docs/DOC-60334 ,
http://scn.sap.com/docs/DOC-60337 ,
http://scn.sap.com/docs/DOC-65899
AUTHORS¶
F.Herschel, L.Pinne.
COPYRIGHT¶
(c) 2014 SUSE Linux Products GmbH, Germany.
(c) 2015-2017 SUSE Linux GmbH, Germany.
(c) 2018-2025 SUSE LLC
The resource agent SAPHanaController comes with ABSOLUTELY NO WARRANTY.
For details see the GNU General Public License at
http://www.gnu.org/licenses/gpl.html
10 Mar 2025