
SAPHanaSR-ScaleOut(7) SAPHanaSR-angi SAPHanaSR-ScaleOut(7)

NAME

SAPHanaSR-ScaleOut - Automating SAP HANA system replication in scale-out setups.

DESCRIPTION

Overview

This manual page SAPHanaSR-ScaleOut provides information for setting up and managing automation of SAP HANA system replication (SR) in scale-out setups. For scale-up, please refer to SAPHanaSR(7), see also SAPHanaSR-angi(7).

System replication replicates the database data from one site to another site in order to compensate for database failures. With this mode of operation, the internal SAP HANA high-availability (HA) mechanisms and the Linux cluster have to work together.

A HANA scale-out setup already is, to some degree, an HA cluster of its own. HANA is able to replace failing nodes with standby nodes or to restart certain sub-systems on other nodes. As long as the HANA landscape status is not "ERROR", the Linux cluster will not act. The main purpose of the Linux cluster is to handle the takeover to the other site. Only if the HANA landscape status indicates that HANA cannot recover from the failure and the replication is in sync will the Linux cluster act. As an exception, the Linux cluster will react if HANA moves the master nameserver role to another candidate. SAPHanaController is also able to restart formerly failed worker nodes as standby. In addition to the SAPHanaTopology RA, the SAPHanaSR-ScaleOut solution uses an "HA/DR provider" API provided by HANA to get informed about the current state of the system replication.
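
As an illustration of that API, the HA/DR provider hook is registered in HANA's global.ini. A minimal sketch, assuming the susHanaSR.py(7) hook shipped with SAPHanaSR-angi and its default installation path (see susHanaSR.py(7) for the authoritative syntax and values):

[ha_dr_provider_sushanasr]
provider = susHanaSR
path = /usr/share/SAPHanaSR-angi/
execution_order = 1

[trace]
ha_dr_sushanasr = info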

Different workloads on HANA scale-out systems are possible:

* Online Analytical Processing (OLAP, aka SAP BW)
These systems have a high number of nodes, including standby nodes, e.g. 10 nodes with 2TB RAM each per site. HANA host auto-failover can handle failing nodes.
* Online Transaction Processing (OLTP, aka SAP ERP)
These systems have a small number of large nodes and are mission critical. Often another system replication site is connected to increase data redundancy, e.g. 2 nodes with 20TB RAM each per site. Specific HANA HA/DR providers can handle failing processes.

In a nutshell: OLAP systems have many nodes, OLTP systems have few large nodes. HANA can handle site-internal fail-over. The Linux cluster handles split-brain detection, node fencing on general failures, and the takeover from the primary to the secondary site, including the IP address.

Scenarios

In order to illustrate the meaning of the above overview, some important situations are described below. This is not a complete description of all situations.

1. Local start of standby node
The Linux cluster will react if HANA moves any worker node (including the master nameserver role) to another candidate. If the failed node or instance is available in the cluster and switched to the HANA standby role, the Linux cluster will restart the SAP HANA local framework so this node can be used for future failovers. This is one exception to the general rule that the Linux cluster does nothing as long as the HANA landscape status is not "ERROR".
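
For illustration, the landscape status and host roles which the resource agents evaluate can also be queried manually as sidadm (SID HA1 is an assumption, adapt to your installation):

# su - ha1adm -c "HDBSettings.sh landscapeHostConfiguration.py"; echo RC=$?

The return code reflects the landscape status, e.g. 4 (OK) or 1 (ERROR); see SAP documentation for the complete list.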

2. Prevention against dual-primary
A primary must never be started if the cluster does not know anything about the other site. On initial cluster start, the cluster needs to detect a valid SAP HANA system replication setup, including the system replication status (SOK) and the last primary timestamp (LPT). This is necessary to ensure data integrity.

The rationale behind this is shown in the following scenario:
1. site_A is primary, site_B is secondary - they are in sync.
2. site_A crashes (remember that this HANA is still marked primary).
3. site_B does the takeover and now runs as new primary.
4. Data gets changed on site_B by production.
5. The admin also stops the cluster on site_B (now there are two HANAs,
both down and both internally marked as primary).
6. What if the admin now restarts the cluster on site_A?
6.1 site_A would take its own CIB after waiting for the initial fencing
time for site_B.
6.2 It would "see" its own (cold) primary and the fact that there was a
secondary.
6.3 It would start HANA with the data from the point in time of step 1->2
(the crash), so all data changed in between would be lost.
This is why the Linux cluster needs to enforce a restart inhibit.

There are two options to bring both SAP HANA SR and the Linux cluster back into a fully functional state:
a) The admin starts both sites again.
b) In case site_B is still down, the admin starts the primary on site_A
manually.
The Linux cluster will follow this administrative decision. In both cases the administrator should register and start a secondary as soon as possible. This avoids a full log partition with the consequence of a stuck database.
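
An illustrative registration of a former primary as new secondary, run as sidadm on that site's master nameserver candidate (SID HA1, instance number 10, the remote host suse11 and the site name SITE_A are assumptions; see SAPHanaSR_maintenance_examples(7) for complete procedures):

# su - ha1adm -c "hdbnsutil -sr_register --remoteHost=suse11 --remoteInstance=10 --replicationMode=sync --operationMode=logreplay --name=SITE_A"

The registered secondary then needs to be started, either by the Linux cluster or manually, depending on the setup.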

3. Automatic registration as secondary after site failure and takeover
The cluster can be configured to register a former primary database automatically as secondary. If this option is set, the resource agent will register a former primary database as secondary during cluster/resource start.
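
This behaviour is controlled by the AUTOMATED_REGISTER parameter of ocf_suse_SAPHanaController(7). An illustrative check of the currently configured value:

# crm configure show | grep AUTOMATED_REGISTER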

4. Site takeover not preferred over local re-start
SAPHanaSR-ScaleOut allows you to configure whether you prefer a takeover to the secondary site after the primary landscape fails. The alternative is to restart the primary landscape if it fails, and to take over only when a local restart is no longer possible. This can be tuned by SAPHanaController(7) parameters.
Preferring takeover over local restart is usually chosen for performance-optimised HANA system replication setups. On the other hand, a local restart of the master instead of a takeover could be combined with HANA's persistent memory features.
The current implementation only allows a takeover if the landscape status reports 1 (ERROR). The cluster will not take over while SAP HANA is still trying to repair a local failure.
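
The relevant parameter is PREFER_SITE_TAKEOVER of ocf_suse_SAPHanaController(7). An illustrative check of the configured value:

# crm configure show | grep PREFER_SITE_TAKEOVER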

5. Recovering from failure of master nameserver
If the master nameserver of a HANA database system fails, HANA will start the nameserver on another node. For this purpose, usually up to two nodes are configured as additional nameserver candidates. At least one of them should be a standby node to optimise failover time. The Linux cluster will detect the change and move the IP address to the new active master nameserver. This case fits scale-out HANAs with a high number of nodes, e.g. BW systems.
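
Moving the IP address with the active master nameserver of the primary site is usually implemented by colocating an IPaddr2 resource with the promoted SAPHanaController instance. A minimal sketch in crm shell syntax (resource names, SID HA1, instance number 10 and the IP address are assumptions; role naming may differ between crmsh versions; see SAPHanaSR-ScaleOut_basic_cluster(7) and ocf_heartbeat_IPaddr2(7) for complete examples):

# crm configure primitive rsc_ip_HA1_HDB10 ocf:heartbeat:IPaddr2 params ip=192.168.178.100 op monitor interval=10
# crm configure colocation col_ip_with_primary 2000: rsc_ip_HA1_HDB10:Started mst_SAPHanaCon_HA1_HDB10:Promoted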

6. Local stop of orphaned worker node
If the HANA master nameserver fails, the HANA landscape status goes to "ERROR" and HANA cannot recover on its own, the orphaned HANA workers will be stopped by the Linux cluster. This case fits scale-out HANAs with a small number of nodes, e.g. ERP systems with two nodes.

Implementation

The two HANA database systems (primary and secondary site) are managed by the same single Linux cluster. The maximum number of nodes in that single Linux cluster is given by the Linux cluster limit. An odd number of nodes is needed to handle split-brain situations automatically in stretched clusters.

The HANA consists of two sites with the same number of nodes on each site. There are no HANA nodes outside the Linux cluster. An additional Linux cluster node is used as majority maker for split-brain situations. This dedicated node does not need to have HANA installed and must not run any SAP HANA resources for the same SID. Supported combinations of CPU architectures are given by the Linux cluster.
Note: A third HANA site can be added with another system replication (aka multi-target replication). This will be tolerated by the cluster. Nevertheless, these additional pieces are not managed by the cluster. Therefore, in multi-target setups the AUTOMATED_REGISTER feature needs the register_secondaries_on_takeover feature of HANA.
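
Since the majority maker must not run any SAP HANA resources for the SID, concrete setups usually pin the HANA resources away from that node by location constraints. A minimal sketch in crm shell syntax (resource names, SID HA1, instance number 10 and the node name majority-maker are assumptions; see SAPHanaSR-ScaleOut_basic_cluster(7) for complete examples):

# crm configure location SAPHanaCon_not_on_majority_maker mst_SAPHanaCon_HA1_HDB10 -inf: majority-maker
# crm configure location SAPHanaTop_not_on_majority_maker cln_SAPHanaTop_HA1_HDB10 -inf: majority-maker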

The HANA HA/DR provider is used for detecting system replication status changes.

A common STONITH mechanism is set up for all nodes across all the sites.

Since the IP address of the primary HANA database system is managed by the cluster, only that single IP address is needed to access any nameserver candidate.

Best Practice

* Use two independent corosync rings, at least one of them on a bonded network, resulting in at least three physical links. Unicast is preferred.

* Use Stonith Block Device (SBD) for node fencing, always together with a hardware watchdog. The SBD can be implemented disk-based, with shared LUNs across all nodes on all (three) sites, or as diskless SBD. See the example after this list.

* Align all timeouts in the Linux cluster with the timeouts of the underlying infrastructure - particularly network, storage and multipathing.

* Check the installation of OS and Linux cluster on all nodes before doing any functional tests.

* Carefully define, perform, and document tests for all failure scenarios that should be covered, as well as all maintenance procedures.

* Test HANA HA and SR features without Linux cluster before doing the overall cluster tests.

* Test basic Linux cluster features without HANA before doing the overall cluster tests.

* Be patient. For detecting the overall HANA status, the Linux cluster needs a certain amount of time, depending on the HANA and the configured intervals and timeouts.

* Before doing anything, always check the Linux cluster for idle status, left-over migration constraints, and resource failures, as well as the HANA landscape status and the HANA SR status. See the example after this list.

* Manually activating a HANA primary creates the risk of a dual-primary situation. The user is responsible for data integrity. See also susTkOver.py(7).
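
Regarding the SBD recommendation above, an illustrative check of the effective fencing configuration (the device path is a placeholder for your SBD LUN):

# grep -e SBD_DEVICE -e SBD_WATCHDOG /etc/sysconfig/sbd
# sbd -d /dev/disk/by-id/<SBD-LUN> dump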
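
Regarding the recommended pre-checks above, an illustrative sequence combining the Linux cluster view and the HANA view (SID HA1 is an assumption; cs_clusterstate is provided by the ClusterTools2 package):

# cs_clusterstate -i
# crm_mon -1r
# crm configure show | grep cli-
# SAPHanaSR-showAttr
# su - ha1adm -c "HDBSettings.sh systemReplicationStatus.py"; echo RC=$?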

REQUIREMENTS

For the current version of the package SAPHanaSR-angi, the scale-out capabilities are limited to the following scenarios and parameters:

1. HANA scale-out cluster with system replication. The HANA system replication secondary runs memory preload (aka performance-optimised scenario). The two HANA database systems (primary and secondary site) are managed by the same single Linux cluster. The maximum number of nodes in that single Linux cluster is given by the Linux cluster limit. An odd number of nodes is needed to handle split-brain situations automatically. A dedicated cluster node is used as majority maker. An odd number of nodes leads to a Linux cluster in either one site or across three sites.

2. Technical users and groups such as sidadm should be defined locally in the Linux system. If users are resolved by a remote service, local caching is necessary. Substitute user (su) to sidadm needs to work reliably and without customized actions or messages. The supported shell is bash. See the example after this list.

3. Strict time synchronization between the cluster nodes is required, e.g. via NTP. All nodes of the Linux cluster are configured with the same timezone.

4. For scale-out there is no other SAP HANA system (like QA) on the nodes which needs to be stopped during takeover. Both HANA database systems are running memory-preload. Also MCOS is currently not supported for scale-out.

5. Only one system replication between the two SAP HANA databases in the Linux cluster. Maximum one system replication to an HANA database outside the Linux cluster.

6. The replication mode is either sync or syncmem for the controlled replication. Replication mode async is not supported. The operation modes delta_datashipping, logreplay and logreplay_readaccess are supported. The operation mode logreplay is the default. See the example after this list.

7. Both SAP HANA database systems have the same SAP Identifier (SID) and Instance Number (INO).

8. Besides SAP HANA you need SAP hostagent installed and started on your systems. For SystemV style, the sapinit script needs to be active. For systemd style, the services saphostagent and SAP${SID}_${INO} can stay enabled. The systemd-enabled saphostagent and the instance's sapstartsrv are supported from SAPHanaSR-ScaleOut 0.181 onwards. Please refer to the OS documentation for the systemd version and to the SAP documentation for the SAP HANA version. Combining a systemd style hostagent with a SystemV style instance is allowed. However, all nodes in one Linux cluster have to use the same style. See the example after this list.

9. Automated start of SAP HANA database systems during system boot must be switched off.

10. For scale-out, the current resource agent supports SAP HANA in system replication beginning with HANA version 2.0 SPS05 rev.59.04.

11. For scale-out, if the shared storage is implemented with another cluster, that one does not interfere with the Linux cluster. All three clusters (HANA, storage, Linux) have to be aligned.

12. The RAs SAPHanaController and SAPHanaTopology need to be installed on all cluster nodes, even the majority maker.

13. Colocation constraints between the SAPHanaController RA and other resources are allowed only if they do not affect the RA's scoring. The location scoring finally depends on the system replication status and must not be over-ruled by additional constraints. Thus it is not allowed to define rules forcing a SAPHanaController promoted instance to follow another resource.

14. The Linux cluster needs to be up and running to allow HA/DR provider events being written into CIB attributes. The current HANA SR status might differ from CIB srHook attribute after cluster maintenance.

15. Once an HANA system replication site is known to the Linux cluster, that exact site name has to be used whenever the site is registered manually. At any time only one site is configured as primary replication source.

16. In two-node HANA scale-out systems only one master nameserver candidate is configured.

17. Reliable access to the /hana/shared/ filesystem is crucial for HANA and the Linux cluster.

18. HANA feature Secondary Time Travel is not supported.

19. The HANA scale-out system must have only one failover group. Tenant-specific takeover groups are not supported. Sharing standby nodes across sites is not supported.

20. In MDC configurations the HANA database is treated as a single system, including all database containers. Therefore, cluster takeover decisions are based on the complete status, independent of the status of individual containers.

21. If a third HANA site is connected by system replication, that HANA is not controlled by another SUSE HA cluster. If that third site should work as part of a fall-back HA cluster in DR case, that HA cluster needs to be in standby.

22. RA and srHook runtime almost completely depend on call-outs to controlled resources, the OS and the Linux cluster. The infrastructure needs to allow these call-outs to return in time.

23. The SAP HANA Fast Restart feature on RAM-tmpfs as well as HANA on persistent memory can be used, as long as they are transparent to SUSE HA.

24. The SAP HANA hostname or virtual hostname should follow RFC-952.

25. The SAP HANA site name is from 2 up to 32 characters long. It starts with a letter or digit. Subsequent characters may contain dash and underscore. However, underscore should be avoided.

26. The SAPHanaController RA, the SUSE HA cluster and several SAP components need read/write access and sufficient space in the Linux /tmp filesystem.

27. SAP HANA Native Storage Extension (NSE) is supported. The important point is that this feature does not change the HANA topology or interfaces. In contrast to Native Storage Extension, HANA Extension Nodes change the topology and thus are currently not supported. Please refer to SAP documentation for details.

28. No manual actions must be performed on the HANA database while it is controlled by the Linux cluster. All administrative actions need to be aligned with the cluster. See also SAPHanaSR_maintenance_examples(7).
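
Regarding requirement 2 above, an illustrative check whether su to sidadm works cleanly (SID HA1 is an assumption; the command should return without additional output and with exit code 0):

# su - ha1adm -c "true"; echo RC=$?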
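
Regarding requirement 6 above, the configured replication mode, operation mode and site mapping can be inspected on the current primary as sidadm (SID HA1 is an assumption):

# su - ha1adm -c "hdbnsutil -sr_state"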
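
Regarding requirement 8 above, an illustrative check whether hostagent and the instance's sapstartsrv run in systemd style (SID HA1 and instance number 10 are assumptions):

# systemctl status saphostagent
# systemctl status SAPHA1_10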

BUGS

In case of any problem, please use your favourite SAP support process to open a request for the component BC-OP-LNX-SUSE. Please report any other feedback and suggestions to feedback@suse.com.

SEE ALSO

SAPHanaSR-angi(7) , SAPHanaSR-angi-scenarios(7) , SAPHanaSR(7) , ocf_suse_SAPHanaTopology(7) , ocf_suse_SAPHanaController(7) , ocf_heartbeat_IPaddr2(7) , SAPHanaSR-ScaleOut_basic_cluster(7) , SAPHanaSR-showAttr(8) , susHanaSR.py(7) , susHanaSrMultiTarget.py(7) , susTkOver.py(7) , susChkSrv.py(7) , chrony.conf(5) , systemctl(1) , stonith(8) , sbd(8) , stonith_sbd(7) , stonith_admin(8) , crm(8) , corosync.conf(5) , crm_no_quorum_policy(7) , saptune(8) , cs_show_hana_info(8) , supportconfig(8) , ha_related_suse_tids(7) , ha_related_sap_notes(7) ,
https://documentation.suse.com/sbp/sap/ ,
https://documentation.suse.com/sles-sap/ ,
https://www.suse.com/releasenotes/ ,
https://www.susecon.com/archive-2020.html ,
https://www.susecon.com/archive-2021.html ,
https://archive.sap.com/documents/docs/DOC-60334 ,
http://scn.sap.com/community/hana-in-memory/blog/2015/12/14/sap-hana-sps-11-whats-new-ha-and-dr--by-the-sap-hana-academy ,
https://blogs.sap.com/2020/01/30/sap-hana-and-persistent-memory/ ,
https://www.rfc-editor.org/rfc/rfc952

AUTHORS

A.Briel, F.Herschel, L.Pinne.

COPYRIGHT

(c) 2015-2017 SUSE Linux GmbH, Germany.
(c) 2018-2025 SUSE LLC
The package SAPHanaSR-angi comes with ABSOLUTELY NO WARRANTY.
For details see the GNU General Public License at http://www.gnu.org/licenses/gpl.html

10 Jan 2025