SAPHanaSR(7) | SAPHanaSR_basic_cluster | SAPHanaSR(7) |
NAME
SAPHanaSR_basic_cluster - SAP HANA System Replication scale-up basic cluster configuration.
DESCRIPTION
The SAP HANA System Replication scale-up scenario needs a certain basic cluster configuration. Besides these necessary settings, some additional configurations might match specific needs. Adapting a few SAP HANA settings might be beneficial as well.
* CRM Basics
default-resource-stickiness = 1000
The crm basic parameter default-resource-stickiness defines the
'stickiness' score a resource gets on the node where it is currently
running. This prevents the cluster from moving resources around without an
urgent need during a cluster transition. The correct value depends on the number
of resources, colocation rules and resource groups. In particular, additional
groups colocated to the HANA primary master resource can affect cluster
decisions. A value that is too high might prevent not only unwanted but also
useful actions. This is because SAPHanaSR uses an internal scoring table for
placing the HANA roles on the right nodes.
Mandatory, default 1.
failure-timeout = 86400
The crm basic parameter failure-timeout defines how long failed
actions will be kept in the CIB. After that time the failure record will be
deleted. The time is measured in seconds. See also migration-threshold
below.
Optional, no default.
migration-threshold = 5000
The crm basic parameter migration-threshold defines how many
errors on a resource can be detected before this resource will be migrated
to another node. See also failure-timeout.
Mandatory, default 3.
record-pending = true
The op_default record-pending defines whether the intention of an
action upon the resource is recorded in the Cluster Information Base (CIB).
Setting this parameter to 'true' allows the user to see pending actions like
'starting' and 'stopping' in crm_mon and Hawk.
Optional, default false.
pcmk_delay_max = 30
The sbd stonith parameter pcmk_delay_max defines an upper limit
for waiting before a fencing/stonith request will be triggered. This
parameter should prevent the cluster from unwanted double fencing in case of
split-brain. A value around 30 seconds is required in two-node clusters,
unless priority fencing is used.
Mandatory, default 5.
priority-fencing-delay = 30
The optional crm property priority-fencing-delay specifies a delay
for fencing actions that target the lost nodes with the highest total
resource priority, in case the local cluster partition does not have the
majority of the nodes. This way the more significant nodes potentially win
any fencing match, which is especially meaningful under split-brain of a
2-node cluster. A promoted resource instance takes the base priority + 1 on
calculation if the base priority is not 0. Any delay that is introduced by
pcmk_delay_max configured for the corresponding fencing resources will be
added to this delay. A meta attribute priority=100 or alike for the
SAPHanaController primitive resource is needed to make this work. See
ocf_suse_SAPHanaController(7). The delay should be significantly greater
than pcmk_delay_max, or safely twice that value.
Optional, no default.
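The relation between priority-fencing-delay and pcmk_delay_max described above can be checked with a few lines of shell. This is a minimal sketch only: the function name check_fencing_delays is hypothetical, and it takes the "safely twice" reading of the rule as its threshold. The values 15 and 30 are taken from the priority fencing example in this manual page.

```shell
#!/bin/sh
# check_fencing_delays PCMK_DELAY_MAX PRIORITY_FENCING_DELAY
# Hypothetical helper, not part of SAPHanaSR. Prints "ok" if the priority
# fencing delay is at least twice pcmk_delay_max, otherwise "too-short".
check_fencing_delays() {
    pcmk_delay_max=$1
    priority_fencing_delay=$2
    if [ "$priority_fencing_delay" -ge $((2 * pcmk_delay_max)) ]; then
        echo "ok"
    else
        echo "too-short"
    fi
}

# Values from the priority fencing SBD example: 15 and 30 seconds.
check_fencing_delays 15 30
```

Both arguments are expected as plain integers in seconds. The threshold is a judgment call of this sketch; a site may prefer a larger margin.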
* systemd Basics
saphostagent.service enabled
SAP${SID}_${INO}.service enabled
In case systemd-style init is used for the HANA database, the services saphostagent and SAP${SID}_${INO} need to be enabled and running inside the SAP slice. The instance profile Autostart feature needs to be off. The service saptune is highly recommended, see manual page saptune(8).
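The unit names above follow from the SID and the instance number. The following sketch builds both names and prints the systemctl commands one might run to verify them. The function name sap_unit_names is hypothetical; SID HA1 and instance number 10 are example values.

```shell
#!/bin/sh
# sap_unit_names SID INO
# Hypothetical helper. Prints the two systemd unit names named above,
# one per line.
sap_unit_names() {
    sid=$1
    ino=$2
    echo "saphostagent.service"
    echo "SAP${sid}_${ino}.service"
}

# Print the check commands instead of running them, so the sketch is
# harmless on a machine without SAP HANA installed.
for unit in $(sap_unit_names HA1 10); do
    echo "systemctl is-enabled $unit"
    echo "systemctl is-active $unit"
done
```

On a cluster node the printed commands can be run as root. Both services are expected to report "enabled" and "active".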
* pacemaker service dependency to SAP instance service
[Unit]
Wants=SAP${SID}_${INO}.service
After=SAP${SID}_${INO}.service
In case systemd-style init is used for the HANA database, it might be desired to have the SAP instance service stopping after pacemaker at system shutdown. Therefore a drop-in file for the pacemaker service might help. See examples below.
* pacemaker service basics
PCMK_fail_fast = yes
The parameter PCMK_fail_fast in /etc/sysconfig/pacemaker specifies
how pacemaker reacts to failures of its subdaemons. The default "no"
means to restart failed subdaemons, while "yes" means fencing the
node. Setting "yes" might help to avoid undefined situations.
Optional, default no.
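To see which value is effective on a node, the sysconfig file can be inspected. Below is a minimal sketch that reads PCMK_fail_fast from a file and falls back to the default "no" described above. The function name get_fail_fast is hypothetical, and the parsing is deliberately simple: it ignores commented lines and strips quotes, but does not handle trailing comments on the assignment line.

```shell
#!/bin/sh
# get_fail_fast FILE
# Hypothetical helper. Prints the value of PCMK_fail_fast from FILE, or
# "no" if the variable is not set or the file cannot be read.
get_fail_fast() {
    file=$1
    value=""
    if [ -r "$file" ]; then
        # Take the last uncommented assignment and strip optional quotes.
        value=$(sed -n 's/^[[:space:]]*PCMK_fail_fast=//p' "$file" \
            | tail -n 1 | tr -d "\"'")
    fi
    echo "${value:-no}"
}

get_fail_fast /etc/sysconfig/pacemaker
```

A changed value in /etc/sysconfig/pacemaker is only picked up when the pacemaker service is started again.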
* SAP HANA Basics
/usr/sap/${SID}/SYS/global/hdb/custom/config/global.ini
[memorymanager]
final_memory_release_shutdown = [ auto | on | off ]
final_memory_release_crash = [ auto | on | off ]
Starting with SAP HANA 2.0 SPS06, the database shutdown can be accelerated by optimizing memory de-allocation. Please refer to SAP documentation before setting these parameters.
/usr/sap/${SID}/SYS/global/hdb/custom/config/daemon.ini
[daemon]
terminationtimeout = [ millisec ]
forcedterminationtimeout = [ millisec ]
The first parameter defines the timeout from sending SIGTERM to finally terminating child processes when HANA is shutting down by the STOP event. It is also used as the maximal delay in system restart if the 'restartterminationtimeout' parameter is not set. The second parameter defines the timeout from sending SIGTERM to finally terminating child processes when HANA is shutting down by the QUIT event. Please refer to SAP documentation before setting these parameters.
EXAMPLES
* crm basic configuration
Below is an example crm basic configuration for SAPHanaSR. Shown are specific parameters which are needed. Some general parameters are left out.
The following example is for 15 SP5 with disk-based SBD:
property cib-bootstrap-options: \
have-watchdog=true \
cluster-infrastructure=corosync \
cluster-name=hacluster \
stonith-enabled=true \
placement-strategy=balanced \
stonith-timeout=144
rsc_defaults rsc-options: \
resource-stickiness=1000 \
migration-threshold=3 \
failure-timeout=86400
op_defaults op-options: \
timeout=600 \
record-pending=true
* crm simple SBD stonith configuration
To complete the SBD setup, it is necessary to activate SBD as the STONITH/fencing mechanism in the CIB. SBD is normally used for SAPHanaSR scale-up instead of any other fencing/stonith mechanism. Example for a basic disk-based SBD resource:
primitive rsc_stonith_sbd stonith:external/sbd \
params pcmk_delay_max=30
* crm priority fencing SBD stonith configuration
Example for a priority fencing disk-based SBD resource.
primitive rsc_stonith_sbd stonith:external/sbd \
params pcmk_delay_max=15
property cib-bootstrap-options: \
priority-fencing-delay=30
* crm simple IP address resource configuration
Let the Linux cluster manage one IP address and move that address along with the HANA primary master nameserver. Interval and timeout are set for friendly conditions.
primitive rsc_ip_SLE_HDB00 IPaddr2 \
op monitor interval=10 timeout=20 \
params ip=192.168.178.188
colocation col_ip_with_SLE_HDB00 \
2000: rsc_ip_SLE_HDB00:Started mst_SAPHanaCon_SLE_HDB00:Promoted
* crm protective IP address resource configuration
Let the Linux cluster manage one IP address. The resource should tolerate short absence of (virtual) network cards. If a monitor and the resulting start fails, the node should get fenced. This protects against risk of HANA dual-primary. A colocation constraint between this IP address resource and the HANA primary master nameserver is needed as well, see respective examples.
primitive rsc_ip_SLE_HDB00 IPaddr2 \
op monitor interval=30 timeout=60 \
op start timeout=90 on-fail=fence \
params ip=192.168.178.188 cidr_netmask=32
This configuration might be used in public cloud environments where virtual NICs might be expected to disappear temporarily. Of course, it would be better to not let NICs disappear in production, for example by adding redundancy to the network or by setting the cluster into maintenance.
* crm IP address for active/active read-enabled resource configuration
Let the Linux cluster manage an additional IP address and move that address along with the HANA secondary master nameserver.
primitive rsc_ip_ro_SLE_HDB00 IPaddr2 \
op monitor interval=10 timeout=20 \
params ip=192.168.178.199
colocation col_ip_ro_with_secondary_SLE_HDB00 \
2000: rsc_ip_ro_SLE_HDB00:Started mst_SAPHanaCon_SLE_HDB00:Demoted
location loc_ip_ro_not_master_SLE_HDB00 \
rsc_ip_ro_SLE_HDB00 \
rule -inf: hana_sle_roles ne master1:master:worker:master
* crm grouped IP address resource configuration
Let the Linux cluster manage one IP address and move that address along with the HANA primary master nameserver. An auxiliary resource is needed for specific public cloud purpose.
You should not bind resources to the HANA master role. This would change the effective resource scoring and might prevent the cluster from taking expected actions. If, for any reason, you need to bind an additional resource to the HANA resource, you need to reduce that additional resource's stickiness to 1.
primitive rsc_ip_SLE_HDB00 IPaddr2 \
op monitor interval=10s timeout=20s \
params ip=192.168.178.188 cidr_netmask=32
primitive rsc_lb_SLE_HDB00 azure-lb \
params port=62502
group grp_ip_SLE_HDB00 rsc_lb_SLE_HDB00 rsc_ip_SLE_HDB00 \
meta resource-stickiness=1
colocation col_ip_with_SLE_HDB00 \
8000: grp_ip_SLE_HDB00:Started mst_SAPHanaCon_SLE_HDB00:Promoted
* crm MailTo resource configuration
The HANA landscape status is stored inside CIB as attribute hana_<sid>_roles. A healthy HANA master looks like "master1:master:worker:master". First field is the HANA landscape status. If that status goes to 3 or 2, something has happened to HANA, but the cluster will not perform a takeover. Status 1 will trigger a takeover, status 0 indicates an undefined fatal failure. See manual pages ocf_suse_SAPHanaController(7) and ocf_heartbeat_MailTo(7).
You could define a MailTo resource that informs you as soon as attribute hana_<sid>_roles deviates from the above ideal:
primitive rsc_mailto_HA1_HDB10 ocf:heartbeat:MailTo \
params email="root@localhost" subject="hana_ha1_roles changed" \
op monitor timeout=10 interval=30 depth=0
location loc_mailto_HA1_HDB10_with_sane rsc_mailto_HA1_HDB10 \
rule 90000: hana_ha1_roles eq master1:master:worker:master
colocation col_mailto_HA1_HDB10_with_prim \
3000: rsc_mailto_HA1_HDB10:Started mst_SAPHanaCon_HA1_HDB10:Promoted
* check how resource stickiness affects promotion scoring
SAPHanaSR uses an internal scoring table. The promotion scores for HANA primary and secondary master are in a certain range. The scores used by the Linux cluster should be in the same range.
# SAPHanaSR-showAttr | grep master.:master
# crm_simulate -Ls | grep promotion
* clean up SBD stonith resource after write failure
In rare cases the SBD stonith resource fails writing to the block device. After the root cause has been found and fixed, the failure message can be cleaned.
# stonith_admin --cleanup --history=<originator_node>
* check saphostagent and show SAP instances
Basic check for the saphostagent.
# /usr/sap/hostctrl/exe/saphostctrl -function ListInstances
* check systemd services for the HANA database
In case systemd-style init is used for the HANA database, the services can be checked. Example SID is HA1, instance number is 10.
# systemctl list-unit-files | grep -i sap
# systemctl status SAPHA1_10.service
# systemd-cgls -u SAP.slice
# systemd-cgls -u SAPHA1_10.service
# systemctl show SAPHA1_10.service
* show pacemaker service drop-in file
In case systemd-style init is used for the HANA database, it might be desired to have the SAP instance service stopping after pacemaker at system shutdown. A drop-in file might help. Example SID is S07, instance number is 00.
[Unit]
Description=pacemaker needs SAP instance service
Documentation=man:SAPHanaSR_basic_cluster(7)
Wants=SAPS07_00.service
After=SAPS07_00.service
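The drop-in content above can be generated for any SID and instance number. The following is a minimal sketch: the function name write_pacemaker_dropin and the file name 00-pacemaker.conf are assumptions of this example. It writes into a scratch directory so that the result can be inspected before anything is copied to the system.

```shell
#!/bin/sh
# write_pacemaker_dropin DIR SID INO
# Hypothetical helper. Writes the drop-in content shown above into
# DIR/00-pacemaker.conf. The file name is an assumption of this sketch.
write_pacemaker_dropin() {
    dir=$1
    unit="SAP${2}_${3}.service"
    mkdir -p "$dir" || return 1
    cat > "$dir/00-pacemaker.conf" <<EOF
[Unit]
Description=pacemaker needs SAP instance service
Documentation=man:SAPHanaSR_basic_cluster(7)
Wants=$unit
After=$unit
EOF
}

# Write into a scratch directory first and inspect the result.
scratch=$(mktemp -d)
write_pacemaker_dropin "$scratch" S07 00
cat "$scratch/00-pacemaker.conf"
```

On a real system, systemd reads drop-ins for the pacemaker service from /etc/systemd/system/pacemaker.service.d/, and a "systemctl daemon-reload" is needed after the file has been placed there.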
* check for pacemaker dependency to SAP instance service
Example SID is S07, instance number is 00.
# systemd-delta | grep pacemaker
# systemd-analyze dot | grep "pacemaker.*SAPS07_00"
BUGS
In case of any problem, please use your favourite SAP support process to open a request for the component BC-OP-LNX-SUSE. Please report any other feedback and suggestions to feedback@suse.com.
SEE ALSO
ocf_suse_SAPHanaTopology(7), ocf_suse_SAPHanaController(7),
ocf_suse_SAPHanaFilesystem(7), ocf_heartbeat_IPaddr2(7),
ocf_heartbeat_MailTo(7), sbd(8), stonith_sbd(7), stonith_admin(8),
crm_no_quorum_policy(7), crm(8), crm_simulate(8),
SAPHanaSR(7), SAPHanaSR-showAttr(7), corosync.conf(5),
votequorum(5), nfs(5), mount(8),
systemctl(1), systemd-cgls(1), systemd-analyze(1),
systemd-delta(1), ha_related_suse_tids(7),
ha_related_sap_notes(7),
https://documentation.suse.com/sbp/sap/,
https://documentation.suse.com/sles-sap/,
https://www.suse.com/support/kb/,
https://www.clusterlabs.org
AUTHORS
A.Briel, F.Herschel, L.Pinne.
COPYRIGHT
(c) 2018 SUSE Linux GmbH, Germany.
(c) 2019-2024 SUSE LLC
For details see the GNU General Public License at
http://www.gnu.org/licenses/gpl.html
14 Nov 2024 |