ocf_suse_SAPHanaFilesystem(7) OCF resource agents ocf_suse_SAPHanaFilesystem(7)

NAME

SAPHanaFilesystem - Monitors mounted SAP HANA filesystems.

SYNOPSIS

SAPHanaFilesystem [ start | stop | status | monitor | meta-data | validate-all | reload | methods | usage ]

DESCRIPTION

SAPHanaFilesystem is a resource agent (RA) that monitors mounted SAP HANA filesystems by checking read/write access. This RA neither mounts nor unmounts any filesystem. In case the filesystem monitor fails, the RA decides how to proceed based on the HANA system replication status. The monitor and action timeouts can be significantly shorter than the SAPHanaController resource timeouts. This results in faster takeover actions.

* Behaviour on HANA primary sites

srHook=SOK: In case of monitor failure, the Linux cluster tries to stop and restart the SAPHanaFilesystem resource (not the real filesystem). If that stop fails and the HANA system replication is in sync, the node gets fenced. In consequence, a HANA sr_takeover will be triggered.

srHook=SFAIL: In case of monitor failure, the Linux cluster tries to stop and restart the SAPHanaFilesystem resource (not the real filesystem). If the HANA system replication is not in sync, this is repeated until it succeeds or migration-threshold is reached.

* Behaviour on HANA secondary sites

In case of monitor failure, the SAPHanaFilesystem resource agent does not inform the Linux cluster.

* Background information

For HANA scale-out systems, the directory /hana/shared/$SID/ is provided as an NFS share to all nodes of a site. The directory contains binaries, tools and other components needed for running and monitoring the HANA database. The NFS share for /hana/shared/$SID/ is mounted by the OS as usual. In case of NFS failure, HANA might stop working, but the Linux cluster might not take action in reasonable time. Since NFS is obligatory for the directory /hana/shared/$SID/, scale-out systems are affected more often than scale-up systems. The SAPHanaFilesystem RA can be used on local filesystems as well, which might be useful for scale-up systems. Even though SAPHanaFilesystem improves the Linux cluster's reaction to failed filesystems, reliable filesystems remain a cornerstone of SAP HANA database availability.

SAPHanaFilesystem relies on cluster attributes set by SAPHanaTopology and susHanaSR.py, particularly hana_$SID_site_srHook_$SITE. See manual pages ocf_suse_SAPHanaTopology(7), susHanaSR.py(7) and SAPHanaSR-showAttr(8).
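
For example, the srHook attribute for a given site can be queried from the CIB (a sketch; SID "sle" and site name "SITEA" are placeholders to be adapted):

# crm_attribute --type crm_config --name hana_sle_site_srHook_SITEA --query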

Please see also the REQUIREMENTS section below.

SUPPORTED PARAMETERS

This resource agent supports the following parameters:

SID

SAP System Identifier. Has to be the same on both instances.
Mandatory. Example: "SID=SLE".

InstanceNumber

Instance Number of the SAP HANA database. For system replication, Instance Number+1 is blocked as well.
Mandatory. Example: "InstanceNumber=00".

DIRECTORY

Path to the directory to be monitored. The RA creates data in its own subdirectory ".suse_SAPHanaFilesystem". Do not touch this hidden subdirectory.
Optional. Default: "DIRECTORY=/hana/shared/$SID/".

ON_FAIL_ACTION

Internal RA decision in case of monitor failure. Values: [ ignore | fence ].
- ignore: do nothing, just report failure into logs.
- fence: trigger stop failure and node fencing, if conditions are met.
Optional. Default: "ON_FAIL_ACTION=fence".

SUPPORTED PROPERTIES

hana_${sid}_glob_filter

Global cluster property hana_${sid}_glob_filter. This property defines which messages are logged by the RA. It should only be set if requested by support engineers. The default is sufficient for normal operation.
Message Types: [ act | dbg | dec | flow | top ]
ACT: Action. Start, stop, sr_takeover and others. See also section SUPPORTED ACTIONS.
DBG: Debugging info. Usually not needed at customer site. See SUSE TID 7022678 for maximum RA tracing.
DEC: Decision taken by the RA.
FLOW: Function calls and the respective return codes.
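
Whether the property is currently set can be checked by inspecting the CIB (a sketch, assuming SID "sle"):

# crm configure show | grep hana_sle_glob_filter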

SUPPORTED ACTIONS

This resource agent supports the following actions (operations):

start

Sets the status of the clone to "started". No filesystem action is done. Suggested minimum timeout: 10.

stop

Sets the status of the clone to "stopped". No filesystem action is done. Suggested minimum timeout: 20.

status

Reports whether the SAPHanaFilesystem resource (not the filesystem) is running. Suggested minimum timeout: 120.

monitor

Checks access to the path specified in parameter DIRECTORY. The check is done by creating a sub-directory and writing a file. Suggested minimum timeout: 120. Suggested interval: 120.
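
For troubleshooting on a test system, the monitor action can be invoked manually, following the usual OCF calling convention (a sketch; the SID and InstanceNumber values are examples, do not run this on production systems):

# OCF_ROOT=/usr/lib/ocf OCF_RESKEY_SID=SLE OCF_RESKEY_InstanceNumber=00 /usr/lib/ocf/resource.d/suse/SAPHanaFilesystem monitor; echo rc=$?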

validate-all

Reports whether the parameters are valid. Suggested minimum timeout: 5.

meta-data

Retrieves resource agent metadata (internal use only). Suggested minimum timeout: 5.

methods

Reports which methods (operations) the resource agent supports. Suggested minimum timeout: 5.

reload

Changes parameters without forcing a recovery of the resource. Suggested minimum timeout: 5.
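
The actions and parameters supported by the installed RA version can also be listed with crmsh (a sketch):

# crm ra info ocf:suse:SAPHanaFilesystem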

RETURN CODES

The return codes are defined by the OCF cluster framework. Please refer to the OCF definition on the website mentioned below.
In addition, the internal return code 124 is logged if the timeout has been exceeded.
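
Timeout-related log entries can be searched like this (a sketch, following the grep examples in the EXAMPLES section below):

# grep "SAPHanaFilesystem.*rc=124" /var/log/messages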

EXAMPLES

* Example configuration for a SAPHanaFilesystem resource on HANA scale-up.

Might be useful if NFS is used for the /hana/shared/ filesystem instead of classical block devices. One NFS share is used per node. The NFS is not shared across sites. On each cluster node the NFS share is mounted statically. SID is SLE, instance number is 00.

primitive rsc_SAPHanaFil_SLE_HDB00 ocf:suse:SAPHanaFilesystem \
op start interval="0" timeout="10" \
op stop interval="0" timeout="20" \
op monitor interval="120" timeout="120" \
params SID="SLE" InstanceNumber="00"

clone cln_SAPHanaFil_SLE_HDB00 rsc_SAPHanaFil_SLE_HDB00 \
meta clone-node-max="1" interleave="true"
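
Such a configuration snippet could be stored in a text file and loaded into the CIB with crmsh (a sketch; the file name crm-fs.txt is arbitrary):

# crm configure load update crm-fs.txt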

* Example configuration for a SAPHanaFilesystem resource on HANA scale-up that does nothing.

Might be useful for logging access issues with the /hana/shared/ filesystem. The RA does nothing except logging monitor failures. SID is SLE, instance number is 00. See also the example below on showing monitor failures in the system logs.

primitive rsc_SAPHanaFil_SLE_HDB00 ocf:suse:SAPHanaFilesystem \
op start interval="0" timeout="10" \
op stop interval="0" timeout="20" \
op monitor interval="120" timeout="120" \
params SID="SLE" InstanceNumber="00" ON_FAIL_ACTION="ignore"

clone cln_SAPHanaFil_SLE_HDB00 rsc_SAPHanaFil_SLE_HDB00 \
meta clone-node-max="1" interleave="true"

* Example configuration for a SAPHanaFilesystem resource for HANA scale-out.

The HANA system consists of two sites with several nodes each. An additional cluster node is used as majority maker for split-brain situations. One /hana/shared/ filesystem is used per site. This filesystem is provided by an NFS server and shared among all cluster nodes of that site. The NFS is not shared across sites. On each cluster node the NFS share is mounted statically. SID is SLE, instance number is 00.

primitive rsc_SAPHanaFil_SLE_HDB00 ocf:suse:SAPHanaFilesystem \
op start interval="0" timeout="10" \
op stop interval="0" timeout="20" on-fail="fence" \
op monitor interval="120" timeout="180" \
params SID="SLE" InstanceNumber="00"

clone cln_SAPHanaFil_SLE_HDB00 rsc_SAPHanaFil_SLE_HDB00 \
meta clone-node-max="1" interleave="true"

location SAPHanaFil_not_on_majority_maker cln_SAPHanaFil_SLE_HDB00 -inf: vm-majority
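
Whether the clone is running on the intended nodes, and not on the majority maker, can be checked with crm_mon (a sketch):

# crm_mon -1r | grep SAPHanaFil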

* Example on showing the current SAPHanaFilesystem resource configuration on scale-out.

The primitive is "rsc_SAPHanaFil_SLE_HDB00" and the clone is "cln_SAPHanaFil_SLE_HDB00". The constraint names start with "SAPHanaFil".

# crm configure show | grep SAPHanaFil_
# crm configure show rsc_SAPHanaFil_SLE_HDB00
# crm configure show cln_SAPHanaFil_SLE_HDB00
# crm configure show SAPHanaFil_not_on_majority_maker

* Search for log entries of the resource agent. Show errors only.

# grep "SAPHanaFilesystem.*RA.*rc=[1-7,9]" /var/log/messages

* Search for log entries of the resource agent. Show date, time, return code, runtime.

# grep "SAPHanaFilesystem.*end.action.monitor_clone.*rc=" /var/log/messages | awk '{print $1,$11,$13}' | colrm 20 32 | tr -d "=()rsc" | tr "T" " "

* Search for log entries of the resource agent. Show poison pill only.

# grep "SAPHanaFilesystem.*RA.*poison.pill.detected" /var/log/messages

* Search for node fence actions caused by resource stop failure.

# grep "Stop.of.failed.*is.fenced" /var/log/messages

* Show and delete failcount for resource.

Resource is rsc_SAPHanaFil_HA1_HDB00, node is node22. Useful after a failure has been fixed and for testing. See also cluster properties migration-threshold and failure-timeout.

# crm resource failcount rsc_SAPHanaFil_HA1_HDB00 show node22
# crm resource failcount rsc_SAPHanaFil_HA1_HDB00 delete node22
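
Alternatively, crmsh can clean up the resource status, including failcounts (a sketch):

# crm resource cleanup rsc_SAPHanaFil_HA1_HDB00 node22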

* Example for static NFS mount.

This is an example line in /etc/fstab. NFS server is nfs1, SID is SLE. The NFS share will be mounted at OS boot time. The shown export path and mount options need to be adjusted for the NFS server in use. See manual pages nfs(5) and fstab(5) for details.

nfs1:/export/SLE/shared/ /hana/shared/SLE/ nfs defaults,rw,hard,proto=tcp,intr,noatime,vers=4,lock 0 0
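
Whether the share is mounted with the expected options can be verified like this (a sketch):

# findmnt /hana/shared/SLE/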

* Example for temporarily blocking HANA access to local filesystems.

This could be done for testing the SAPHanaFilesystem RA integration. Blocking the HANA filesystem is dangerous. This test should not be done on production systems. SID is SLE. See also manual page fsfreeze(8).
Note: Understand the impact before trying.

1. Check HANA and Linux cluster for clean idle state.

2. On secondary, block /hana/shared/SLE/ filesystem.

# sync /hana/shared/SLE/
# fsfreeze --freeze /hana/shared/SLE/

3. Check the system log for SAPHanaFilesystem entries.

4. On secondary, unblock /hana/shared/SLE/ filesystem.

# fsfreeze --unfreeze /hana/shared/SLE/

5. Check HANA and Linux cluster for clean idle state.

* Example for temporarily blocking HANA access to NFS filesystems.

This could be done for testing the SAPHanaFilesystem RA integration. Blocking the HANA filesystem is dangerous. This test should not be done on production systems. Used TCP port is 2049. See also SUSE TID 7000524.
Note: Understand the impact before trying.

1. Check HANA and Linux cluster for clean idle state.

2. On secondary, block /hana/shared/SLE/ filesystem.

# sync /hana/shared/SLE/
# iptables -I OUTPUT -p tcp -m multiport --ports 2049 -j ACCEPT
Note: The ACCEPT target needs to be replaced by an appropriate action, e.g. DROP (see the verification sketch after this list).

3. Check the system log for SAPHanaFilesystem entries.

4. On secondary, unblock /hana/shared/SLE/ filesystem.

# iptables -D OUTPUT -p tcp -m multiport --ports 2049 -j DROP

5. Check HANA and Linux cluster for clean idle state.
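
During step 2, the inserted blocking rule can be verified like this (a sketch):

# iptables -L OUTPUT -n --line-numbers | grep 2049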

FILES

/usr/lib/ocf/resource.d/suse/SAPHanaController
the controller resource agent
/usr/lib/ocf/resource.d/suse/SAPHanaTopology
the topology resource agent
/usr/lib/ocf/resource.d/suse/SAPHanaFilesystem
the filesystem monitoring resource agent
/usr/lib/SAPHanaSR-angi/
the directory with function libraries
$DIRECTORY/
the directory to be monitored, default DIRECTORY=/hana/shared/$SID/
$DIRECTORY/.suse_SAPHanaFilesystem/
the RA's subdirectory, do not touch this
$HA_RSCTMP/
the directory with resource status files, do not touch this
/dev/shm/poison_pill_$SID
the resource poison pill file, do not touch this
/etc/fstab
the static information about the filesystems

REQUIREMENTS

For the current version of the SAPHanaFilesystem resource agent that comes with the software package SAPHanaSR-angi, the support is limited to the scenarios and parameters described in the respective manual page SAPHanaSR-angi(7) and its references.

1. A Linux cluster STONITH method for all nodes is needed.
2. on-fail=fence is set for the stop action of SAPHanaFilesystem.
3. User root needs read/write access to the monitored directory.
4. SAPHanaTopology is working.
5. Each site has its own filesystems. The filesystems are not shared across sites.
6. SAP HANA host auto-failover is currently not supported.
7. For HANA scale-out, the SAPHanaSR-alert-fencing should be configured. See manual page SAPHanaSR-alert-fencing(8) for details.

BUGS

In case of any problem, please use your favourite SAP support process to open a request for the component BC-OP-LNX-SUSE. Please report any other feedback and suggestions to feedback@suse.com.

SEE ALSO

ocf_suse_SAPHanaController(7) , ocf_suse_SAPHanaTopology(7) , susHanaSR.py(7) , SAPHanaSR-showAttr(8) , SAPHanaSR-alert-fencing(8) , SAPHanaSR-angi(7) , SAPHanaSR(7) , SAPHanaSR-ScaleOut(7) , fstab(5) , mount(8) , nfs(5) ,
https://documentation.suse.com/sbp/sap/ ,
https://www.suse.com/support/kb/doc/?id=000019904 ,
https://www.suse.com/support/kb/doc/?id=000016649

AUTHORS

F.Herschel, L.Pinne.

COPYRIGHT

(c) 2023-2024 SUSE LLC
SAPHanaFilesystem comes with ABSOLUTELY NO WARRANTY.
For details see the GNU General Public License at http://www.gnu.org/licenses/gpl.html

24 Jun 2024