| SAPCMControlZone_basic_cluster(7) | SAPCMControlZone | SAPCMControlZone_basic_cluster(7) |
NAME¶
SAPCMControlZone_basic_cluster - basic settings to make SAPCMControlZone work.
DESCRIPTION¶
The Convergent Mediation (CM) ControlZone needs a certain basic cluster configuration. Besides necessary settings, additional configuration might be needed to match specific needs.
* Operating System Basics
Users and groups
Technical users and groups, such as "mzadmin" are defined locally in the Linux system. See manual page passwd(5) and usermod(8). The mzadmin user needs certain environment variables set in ~/.bashrc. See manual page ocf_suse_SAPCMControlZone(7) for details.
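The user and group setup above can be sketched with standard OS commands. This is a sketch only; UID/GID, shell and comment are assumptions that must be adapted to the actual installation:

```shell
# Sketch: create a local group and technical user "mzadmin".
# All values besides the names are assumptions - adapt to your environment.
groupadd mzadmin
useradd -g mzadmin -m -s /bin/bash -c "CM ControlZone admin" mzadmin
```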
Hostnames
Name resolution of the cluster nodes and the virtual IP address must be done locally on all cluster nodes. See manual page hosts(5).
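For illustration, an /etc/hosts fragment for two cluster nodes and the virtual IP address might look like below. All hostnames and addresses are examples only:

```
192.168.1.10   node1.example.com   node1
192.168.1.11   node2.example.com   node2
192.168.1.20   sapcm.example.com   sapcm   # virtual IP address of the ControlZone service
```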
Time synchronization
Strict time synchronization between the cluster nodes is mandatory, e.g. by NTP. See manual page chrony.conf(5). Further, the nodes should be configured with the same timezone.
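A minimal chrony.conf fragment could look like the following; the NTP server name is an example:

```
# /etc/chrony.conf fragment - the server name is an example
server ntp.example.com iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
```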
mzadmin's ~/.bashrc
Certain values for environment variables JAVA_HOME, MZ_HOME and MZ_PLATFORM are needed, depending on the specific setup. See manual page ocf_suse_SAPCMControlZone(7) for details.
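A sketch of the relevant ~/.bashrc entries is shown below. All values are examples and depend on the specific setup; see manual page ocf_suse_SAPCMControlZone(7) for the authoritative details:

```
# ~/.bashrc of user mzadmin - all values are examples only
export JAVA_HOME=/usr/lib64/jvm/jre-17-openjdk
export MZ_HOME=/opt/cm/C11
export MZ_PLATFORM=http://localhost:9000
```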
NFS mounted filesystem
A shared filesystem for ControlZone data can be statically mounted on both cluster nodes. This filesystem holds work directories, e.g. for batch processing. It must not be confused with the ControlZone application itself. The application is copied from NFS to both cluster nodes into local filesystems. Client-side write caching has to be disabled for the NFS shares containing customer data. See manual page fstab(5) and example below.
* CRM Basics
stonith-enabled = true
The cib bootstrap option stonith-enabled is crucial for any reliable pacemaker cluster. The value 'true' is one prerequisite for having a cluster supported.
migration-threshold = 3
The crm rsc_default parameter migration-threshold defines how many failures may be detected on a resource before the resource is moved to another node. A value greater than 1 is needed for the resource monitor option on-fail=restart. See also failure-timeout.
record-pending = true
The crm op_default record-pending defines whether the intention of an action on a resource is recorded in the Cluster Information Base (CIB). Setting this parameter to 'true' allows the user to see pending actions like 'starting' and 'stopping' in crm_mon.
failure-timeout = 86400
The crm op_default failure-timeout defines how long failed actions will be kept in the CIB. After that time the failure record will be deleted. The time unit is seconds. The value '86400' means failure records will be cleaned automatically after one day. See also migration-threshold.
priority-fencing-delay = 30
The optional crm property priority-fencing-delay specifies a delay for fencing actions targeting the lost nodes with the highest total resource priority, in case our cluster partition does not hold the majority of nodes. This way the more significant node potentially wins a fencing match, which is especially meaningful under split-brain conditions in a two-node cluster. A promoted resource instance takes the base priority plus 1 into the calculation if the base priority is not 0. Any delay introduced by pcmk_delay_max configured for the corresponding fencing resources will be added to this delay. A meta attribute priority=100 or alike for the ControlZone resource is needed to make this work. See ocf_suse_SAPCMControlZone(7). The delay should be significantly greater than, or safely twice, pcmk_delay_max.
EXAMPLES¶
* CRM basic configuration.
This example has been taken from a two-node SLE-HA 15 SP4 cluster with disk-based SBD. Priority fencing is configured, and the SBD pcmk_delay_max has been reduced accordingly. The stonith-timeout is adjusted to the SBD on-disk msgwait. The migration-threshold is set to 3, the failure-timeout to 86400.
primitive rsc_stonith_sbd stonith:external/sbd \
 params pcmk_delay_max=15
property cib-bootstrap-options: \
cluster-infrastructure=corosync \
placement-strategy=balanced \
dc-deadtime=20 \
stonith-enabled=true \
stonith-timeout=150 \
stonith-action=reboot \
have-watchdog=true \
priority-fencing-delay=30
rsc_defaults rsc-options: \
resource-stickiness=1 \
migration-threshold=3 \
failure-timeout=86400
op_defaults op-options: \
timeout=120 \
record-pending=true
* Statically mounted NFS share for ControlZone platform data.
Below is an fstab example for a shared filesystem holding application data. The filesystem is statically mounted on all nodes of the cluster. The correct mount options depend on the NFS server. However, client-side write caching has to be disabled in any case.
Note: The NFS share might be monitored, but not mounted/umounted by the HA cluster. See ocf_suse_SAPCMControlZone(7) for details.
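A sketch of such an fstab entry is shown below. Server, export path and mount point are examples; the options noac and sync disable client-side write caching:

```
# /etc/fstab fragment - server, export and mount point are examples
nfs1.example.com:/export/C11/platform  /usr/sap/C11/platform  nfs4  rw,noac,sync,_netdev  0 0
```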
* Ping cluster resource for checking connectivity.
Below is an example of an optional ping resource for checking connectivity to the outer world. If the nodes have only one network interface, shared between HA cluster and application, this measure may not improve availability.
ControlZone should run on a node from which more ping targets can be reached than from others. If all nodes are equal, the ControlZone application resource group (e.g. grp_C11) stays where it is. Three vital infrastructure servers outside the datacenter are chosen as ping targets. If at least two targets are reachable, the current node is preferred for running ControlZone. The maximum time for detecting connectivity changes is ca. 180 seconds.
primitive rsc_ping ocf:heartbeat:ping \
 params name=ping-okay host_list="proxy1 proxy2 proxy3" \
op monitor interval=120 timeout=60 start-delay=10 on-fail=ignore
clone cln_ping rsc_ping
location loc_connect_C11 grp_C11 \
rule 90000: ping-okay gt 1
FILES¶
- /etc/passwd
- the local user database
- /etc/group
- the local group database
- /etc/hosts
- the local hostname resolution database
- /etc/chrony.conf
- the basic configuration for time synchronisation
- /etc/sysctl.d/*.conf
- the OS kernel parameters, e.g. TCP tunables
- /etc/fstab
- the filesystem table, for statically mounted NFS shares
- ~/.bashrc
- the mzadmin's ~/.bashrc, defining JAVA_HOME, MZ_HOME and MZ_PLATFORM
- /usr/lib64/jvm/jre-17-openjdk/
- the Java 17 runtime environment shipped with the OS, potentially $JAVA_HOME
BUGS¶
In case of any problem, please use your favourite SAP support
process to open a request for the component BC-OP-LNX-SUSE.
Please report feedback and suggestions to feedback@suse.com.
SEE ALSO¶
ocf_suse_SAPCMControlZone(7), ocf_heartbeat_ping(7), crm(8), passwd(5), usermod(8), hosts(5), fstab(5), nfs(5), mount(8), chrony.conf(5), ha_related_suse_tids(7), ha_related_sap_notes(7),
https://documentation.suse.com/sbp/sap/ ,
https://documentation.suse.com/#sle-ha ,
https://www.suse.com/support/kb/doc/?id=000019514 ,
https://www.suse.com/support/kb/doc/?id=000019722 ,
https://launchpad.support.sap.com/#/notes/1552925 ,
https://launchpad.support.sap.com/#/notes/3079845
AUTHORS¶
F.Herschel, L.Pinne
COPYRIGHT¶
(c) 2023-2024 SUSE LLC
SAPCMControlZone comes with ABSOLUTELY NO WARRANTY.
For details see the GNU General Public License at
http://www.gnu.org/licenses/gpl.html
| 18 Mar 2024 |