SAPStartSrv_basic_cluster(7)    SAPStartSrv    SAPStartSrv_basic_cluster(7)
NAME
SAPStartSrv_basic_cluster - basic settings to make SAPStartSrv work
DESCRIPTION
The SAP Enqueue Standalone 2 (ENSA2) scenario needs a certain basic cluster configuration. Besides these necessary settings, some additional configurations might match specific needs.
* Operating System Basics
systemd services
The services sapinit, sapping and sappong are needed for this cluster.
For SystemV style saphostagent and sapstartsrv, the sapinit script needs to be enabled. For systemd style saphostagent and sapstartsrv, the service saphostagent needs to be enabled and running, instance services SAP${SID}_${INO} need to be disabled. See also REQUIREMENTS in man page ocf_suse_SAPStartSrv(7).
tcp_retries2 = 9
The OS network parameter tcp_retries2 influences the timeout of an
established TCP connection when retransmissions remain unacknowledged. The SAP
application servers (PAS/AAS) and central services (ASCS/ERS) rely on
TCP timeouts for detecting lost connections. On the other hand, SAP session
timeouts and enqueue replication timeouts are defined on application level.
Tuning tcp_retries2 helps SAP sessions and enqueue replication survive
cluster actions.
A value of 9 (for HZ=250) should let Linux TCP connections time out fast enough
for the default SAP application server and central services configuration.
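The relation between tcp_retries2 and the resulting timeout is not linear, because the retransmission timeout (RTO) doubles with each retry. The following POSIX shell sketch reproduces the hypothetical timeout model described in the kernel's ip-sysctl documentation, assuming an initial RTO of TCP_RTO_MIN (200 ms) and a cap of TCP_RTO_MAX (120 s). The function name tcp_timeout_ms is made up for illustration. The computed value is a lower bound; the connection effectively times out at the first retransmission after it.

```shell
#!/bin/sh
# Estimate the hypothetical TCP timeout (in ms) that Linux derives from
# net.ipv4.tcp_retries2. Assumption: RTO starts at TCP_RTO_MIN (200 ms) and
# doubles per retransmission until it is capped at TCP_RTO_MAX (120 s).
tcp_timeout_ms() {
    retries=$1
    rto_min=200      # TCP_RTO_MIN in milliseconds
    rto_max=120000   # TCP_RTO_MAX in milliseconds

    # Number of doublings before the cap is reached: ilog2(rto_max / rto_min)
    thresh=0
    ratio=$((rto_max / rto_min))
    while [ "$ratio" -gt 1 ]; do
        ratio=$((ratio / 2))
        thresh=$((thresh + 1))
    done

    if [ "$retries" -le "$thresh" ]; then
        # Pure exponential backoff: 200 + 400 + ... + 200 * 2^retries
        echo $(( ((2 << retries) - 1) * rto_min ))
    else
        # Exponential part up to the cap, then one TCP_RTO_MAX per extra retry
        echo $(( ((2 << thresh) - 1) * rto_min + (retries - thresh) * rto_max ))
    fi
}

tcp_timeout_ms 9    # 204600 ms, about 3.4 minutes
tcp_timeout_ms 15   # 924600 ms, the kernel default
```

The current setting can be shown with sysctl net.ipv4.tcp_retries2. To make the value persistent, a line net.ipv4.tcp_retries2 = 9 can be put into a file below /etc/sysctl.d/ (the file name is up to the administrator).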
Users and groups
Technical users and groups, such as <sid>adm, are defined locally in the Linux system. Furthermore, the user <sid>adm needs to be a member of group haclient. See man pages passwd(5) and usermod(8).
Hostnames
Name resolution of the cluster nodes and the virtual IP address must be done locally on all cluster nodes. See man page hosts(5).
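Whether all cluster node names and the virtual hostname are present in the local hosts file can be checked with a small helper like the one below. It is a sketch assuming hosts(5) syntax; the function in_hosts_file and the names node1, node2 and sapen2as are hypothetical. The result of the effective lookup can be shown with getent hosts <name>.

```shell
#!/bin/sh
# Succeed if NAME appears as hostname or alias in a hosts(5) style file
# (default /etc/hosts). Comments are ignored, matching is case-insensitive.
in_hosts_file() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v h="$name" '
        { sub(/#.*/, "") }    # strip comments, fields are re-split
        { for (i = 2; i <= NF; i++) if (tolower($i) == tolower(h)) found = 1 }
        END { exit !found }
    ' "$file"
}

# Hypothetical node names and virtual hostname of the ASCS
for h in node1 node2 sapen2as; do
    in_hosts_file "$h" || echo "missing in /etc/hosts: $h" >&2
done
```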
Time synchronization
Strict time synchronization between the cluster nodes is mandatory, e.g. per NTP. See man page chrony.conf(5).
NFS mounted filesystems
The shared filesystems /sapmnt/$SID/ and /usr/sap/$SID/ can be statically mounted on all cluster nodes. See man page fstab(5) and example below.
* CRM Basics
stonith-enabled = true
The CIB bootstrap option stonith-enabled is crucial for any
reliable pacemaker cluster.
The value 'true' is one prerequisite for a supported cluster.
resource-stickiness = 1
The crm rsc_default resource-stickiness defines the 'stickiness' score a resource gets on the node where it is currently running. This prevents the cluster from moving resources around without an urgent need during a cluster transition. The correct value depends on the number of resources, colocation rules and resource groups. In particular, additional resources colocated with the ASCS resource can affect cluster decisions. A too high value might prevent not only unwanted but also useful actions.
migration-threshold = 1
The crm rsc_default parameter migration-threshold defines how many failures of a resource on a node are tolerated before the resource is moved to another node. For ENSA1, the migration-threshold always needs to be 1. For ENSA2, the value could be higher. See also failure-timeout.
record-pending = true
The crm op_default record-pending defines whether the intention of an action upon the resource is recorded in the Cluster Information Base (CIB). Setting this parameter to 'true' allows the user to see pending actions like 'starting' and 'stopping' in crm_mon. The sap_suse_cluster_connector interface also uses this information.
failure-timeout = 86400
The crm rsc_default failure-timeout defines how long failed actions
will be kept in the CIB. After that time, the failure record will be deleted.
The time unit is seconds. See also migration-threshold.
The value '86400' means failure records will be cleaned up automatically after
one day (24 x 60 x 60 seconds).
priority-fencing-delay = 30
The optional crm property priority-fencing-delay specifies a delay for fencing actions that target the lost nodes with the highest total resource priority, in case the local cluster partition does not hold the majority of the nodes. This way, the more significant nodes potentially win any fencing match, which is especially meaningful in a split-brain situation of a two-node cluster. A promoted resource instance takes the base priority + 1 in the calculation, if the base priority is not 0. Any delay introduced by pcmk_delay_max configured for the corresponding fencing resources is added to this delay. A meta attribute priority=100 or similar for the ASCS resource is needed to make this work. See ocf_suse_SAPStartSrv(7).
The delay should be significantly greater than, or safely twice, pcmk_delay_max.
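The rule of thumb above can be expressed as a simple check. This sketch uses the stricter "twice" variant; the function check_fencing_delay is hypothetical, and the sample values 30 and 15 are those of the two-node example in EXAMPLES.

```shell
#!/bin/sh
# Check that priority-fencing-delay is at least twice pcmk_delay_max of the
# fencing resource (both values in seconds). Returns non-zero otherwise.
check_fencing_delay() {
    pfd=$1        # priority-fencing-delay
    delay_max=$2  # pcmk_delay_max
    if [ "$pfd" -ge $((2 * delay_max)) ]; then
        echo "ok: priority-fencing-delay=$pfd, pcmk_delay_max=$delay_max"
    else
        echo "too small: priority-fencing-delay=$pfd < 2 * pcmk_delay_max=$delay_max" >&2
        return 1
    fi
}

check_fencing_delay 30 15
```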
EXAMPLES
* crm basic configuration
Below are examples of crm basic configuration for ENSA2 clusters.
Only the specific parameters which are needed are shown; some general
parameters are left out.
This example has been taken from an ENSA2 three-node cluster SLE-HA 15 GA with
diskless SBD:
property cib-bootstrap-options: \
    expected-quorum-votes=3 \
    no-quorum-policy=suicide \
    dc-deadtime=20 \
    have-watchdog=true \
    cluster-infrastructure=corosync \
    cluster-name=hacluster \
    stonith-enabled=true \
    stonith-watchdog-timeout=10 \
    placement-strategy=balanced
rsc_defaults rsc-options: \
    resource-stickiness=1 \
    migration-threshold=3 \
    failure-timeout=86400
op_defaults op-options: \
    timeout=600 \
    record-pending=true
This example has been taken from an ENSA2 two-node cluster SLE-HA 15 GA with disk-based SBD. Optional priority fencing is configured, and the pcmk_delay_max of the SBD fencing resource has been reduced:
    params pcmk_delay_max=15
property cib-bootstrap-options: \
    dc-deadtime=20 \
    cluster-infrastructure=corosync \
    cluster-name=hacluster \
    stonith-enabled=true \
    stonith-timeout=150 \
    placement-strategy=balanced \
    priority-fencing-delay=30
rsc_defaults rsc-options: \
    resource-stickiness=1 \
    migration-threshold=3 \
    failure-timeout=86400
op_defaults op-options: \
    timeout=600 \
    record-pending=true
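To compare a running cluster with the settings above, the output of crm configure show can be saved to a file and searched for the relevant name=value pairs. The helper crm_value below is a hypothetical sketch; it prints every occurrence of the given name and does not handle quoted values containing blanks, such as host_list.

```shell
#!/bin/sh
# Print the value of every NAME=VALUE token found in a saved crm
# configuration dump. Quoted values with blanks are not supported.
crm_value() {
    name=$1
    file=$2
    awk -v n="$name" '
        { for (i = 1; i <= NF; i++)
              if (index($i, n "=") == 1) print substr($i, length(n) + 2) }
    ' "$file"
}
```

For example, after crm configure show > /tmp/crm.txt (the file name is arbitrary), crm_value priority-fencing-delay /tmp/crm.txt should print 30 for the two-node example.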
* NFS shares for SAP instance filesystems
Below is an fstab example for filesystems needed by the ASCS/ERS pair. The filesystems are statically mounted on all nodes of the cluster for SAP system EN2. The SAP instance name is used consistently to prepare for optional multi-SID setups. The parent directory /usr/sap/ resides locally on each node. The file sapservices must not be shared between nodes. The correct mount options depend on the NFS server.
nfs1:/s/EN2/usrsap /usr/sap/EN2 nfs rw,hard,intr,nolock,actimeo=1,proto=tcp 0 0
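Since these shares are mounted statically by the OS and not managed by the cluster, it is worth verifying on each node that they are really NFS mounts. The sketch below reads the mount table in /proc/mounts; the function is_nfs_mounted is hypothetical and the paths follow the EN2 example.

```shell
#!/bin/sh
# Succeed if MOUNTPOINT is currently mounted with filesystem type nfs or nfs4,
# according to /proc/mounts (or another file in the same format).
is_nfs_mounted() {
    mnt=$1
    table=${2:-/proc/mounts}
    awk -v m="$mnt" '$2 == m && $3 ~ /^nfs4?$/ { found = 1 } END { exit !found }' "$table"
}

for m in /sapmnt/EN2 /usr/sap/EN2; do
    if is_nfs_mounted "$m"; then
        echo "$m: NFS mounted"
    else
        echo "$m: not NFS mounted"
    fi
done
```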
* ping resource for checking connectivity
Below is an example of an optional ping resource for checking
connectivity to the outside world. If the nodes have only one network
interface, shared between HA cluster and application, this measure does not
improve availability.
ASCS should run on a node from which more ping targets can be reached than
from the others. If all nodes are equal in this respect, ASCS stays where it is.
Three vital infrastructure servers outside the datacenter are chosen as ping
targets. If at least two targets are reachable, the current node is preferred
for running the ASCS. The maximum time for detecting connectivity changes is
approx. 180 seconds.
primitive rsc_ping ocf:pacemaker:ping \
    op monitor interval=120 timeout=60 start-delay=10 on-fail=ignore \
    params name=ping_ok host_list="proxy1 proxy2 proxy3"
clone cln_ping rsc_ping
location ASCS00_connected grp_EN2_ASCS00 \
    rule 90000: ping_ok gt 1
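The effect of the location rule can be illustrated by evaluating it for different numbers of reachable targets. The sketch assumes that ocf:pacemaker:ping sets ping_ok to the number of reachable hosts times its multiplier parameter, which defaults to 1; the function ping_rule_score is hypothetical. The detection time of approx. 180 seconds presumably results from the monitor interval (120 s) plus the monitor timeout (60 s).

```shell
#!/bin/sh
# Model of the location rule "rule 90000: ping_ok gt 1".
# Assumption: ping_ok = reachable targets * multiplier (default multiplier 1).
ping_rule_score() {
    reachable=$1
    multiplier=${2:-1}
    ping_ok=$((reachable * multiplier))
    if [ "$ping_ok" -gt 1 ]; then
        echo 90000    # rule matches: the node is preferred for grp_EN2_ASCS00
    else
        echo 0        # rule does not match: no extra preference from this rule
    fi
}

for n in 0 1 2 3; do
    echo "reachable targets: $n -> score $(ping_rule_score "$n")"
done
```

If a multiplier other than 1 is configured, the threshold in the rule needs to be adapted; otherwise a single reachable target already satisfies ping_ok gt 1.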
* systemd services for the SAP instance
In case systemd style init is used for the SAP instance: saphostagent needs to be enabled and running, instance services need to be disabled. Example SID is HA1, instance number is 10.
# systemctl list-unit-files | grep -i sap
# systemctl status SAPHA1_10.service
# systemd-cgls -u SAP.slice
# systemd-cgls -u SAPHA1_10.service
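The unit name of an SAP instance follows the pattern SAP<SID>_<NR>.service. The sketch below builds that name and checks the expected states with systemctl is-enabled and systemctl is-active; the function names sap_unit_name and check_sap_systemd are hypothetical, and the instance number is passed as the two-digit string used by SAP.

```shell
#!/bin/sh
# Build the systemd unit name SAP<SID>_<NR>.service. The instance number is
# kept as a string so that leading zeros (e.g. "00", "08") are preserved.
sap_unit_name() {
    printf 'SAP%s_%s.service\n' "$1" "$2"
}

# Report deviations from the expected systemd setup for one SAP instance:
# saphostagent enabled and running, instance unit disabled.
check_sap_systemd() {
    unit=$(sap_unit_name "$1" "$2")
    systemctl is-enabled --quiet saphostagent.service ||
        echo "saphostagent.service is not enabled" >&2
    systemctl is-active --quiet saphostagent.service ||
        echo "saphostagent.service is not running" >&2
    if systemctl is-enabled --quiet "$unit"; then
        echo "$unit is enabled, but should be disabled" >&2
    fi
}

sap_unit_name HA1 10   # prints SAPHA1_10.service
```

For the example above, check_sap_systemd HA1 10 prints nothing if saphostagent.service is enabled and running and SAPHA1_10.service is disabled.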
* check saphostagent and show SAP instances
Basic check for the saphostagent.
# /usr/sap/hostctrl/exe/saphostctrl -function ListInstances
* SAP instance profile
Check the instance profile for HA specific settings. Example SID is EN2, instance number is 10.
# su - en2adm
~> sapcontrol -nr 10 -function GetStartProfile |\
grep -e art_Program_ -e Autostart -e halib
~> exit
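Instead of filtering the sapcontrol output, a single parameter can also be read from the instance profile file. The sketch below assumes the usual name = value profile syntax with # comment lines; the function profile_get is hypothetical. Profile variables such as $(DIR_EXECUTABLE) are printed unresolved.

```shell
#!/bin/sh
# Print the value of PARAM from an SAP instance profile file, assuming
# "name = value" lines and "#" comment lines.
profile_get() {
    param=$1
    file=$2
    awk -v p="$param" '
        /^[ \t]*#/ { next }                  # skip comment lines
        {
            eq = index($0, "=")
            if (eq == 0) next
            name = substr($0, 1, eq - 1)
            value = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim blanks around name
            gsub(/^[ \t]+|[ \t]+$/, "", value)  # trim blanks around value
            if (name == p) print value
        }
    ' "$file"
}
```

For example, profile_get service/halib /sapmnt/EN2/profile/EN2_ASCS10_sapen2as would print the configured HA library, if any; the profile file name is an assumption and depends on SID, instance name and virtual hostname.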
* sidadm group membership
Check whether the sidadm user is a member of the HA-specific haclient group. Example SID is EN2.
# groups en2adm
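The group membership can also be tested in a script. The sketch below relies on id -nG, which lists primary and supplementary groups; the function in_group is hypothetical. A missing membership can be added with usermod -a -G haclient en2adm, which takes effect at the next login of en2adm.

```shell
#!/bin/sh
# Succeed if USER is a member of GROUP (primary or supplementary),
# based on the group list reported by id(1).
in_group() {
    user=$1
    group=$2
    id -nG "$user" 2>/dev/null | tr ' ' '\n' | grep -qx "$group"
}

in_group en2adm haclient || echo "en2adm is not a member of haclient" >&2
```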
FILES
- /etc/passwd
- the local user database
- /etc/group
- the local group database
- /etc/hosts
- the local hostname resolution database
- /etc/chrony.conf
- basic config for time synchronisation
- /etc/sysctl.d/*.conf
- OS kernel parameters, e.g. TCP tunables
- /etc/fstab
- filesystem table, for statically mounted NFS shares
- /etc/systemd/system/SAP<SID>_<NR>.service
- systemd unit file for SAP instance
BUGS
In case of any problem, please use your favourite SAP support process to open a request for the component BC-OP-LNX-SUSE. Please report feedback and suggestions to feedback@suse.com.
SEE ALSO
ocf_suse_SAPStartSrv(7), sap_suse_cluster_connector(8), ocf_pacemaker_ping(7),
ocf_heartbeat_ethmonitor(7), attrd_updater(8), sbd(8), stonith_sbd(8), crm(8),
corosync.conf(5), votequorum(5), hosts(5), fstab(5), passwd(5), groups(1),
usermod(8), chrony.conf(5), systemctl(1), systemd-cgls(1), systemd-analyze(1),
systemd-delta(1), ha_related_suse_tids(7), ha_related_sap_notes(7),
https://documentation.suse.com/sbp/all/?context=sles-sap ,
https://documentation.suse.com/sles-sap/ ,
https://www.suse.com/support/kb/ ,
https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
AUTHORS
F.Herschel, L.Pinne
COPYRIGHT
(c) 2020-2023 SUSE LLC
SAPStartSrv comes with ABSOLUTELY NO WARRANTY.
For details see the GNU General Public License at
http://www.gnu.org/licenses/gpl.html
30 Jan 2023