NAME
SAPHanaSR_maintenance_examples - maintenance examples for SAPHanaController.
DESCRIPTION
Maintenance examples for SAPHanaController. Please see ocf_suse_SAPHanaController(7), susHanaSR.py(7), susHanaSrMultiTarget.py(7), susTkOver.py(7), susChkSrv.py(7) and SAPHanaSR-manageAttr(8) for more examples, and read the REQUIREMENTS section below.
EXAMPLES
* Check status of Linux cluster and HANA system replication pair.
These steps should be performed before doing anything with the cluster, and again after something has been done. See also cs_show_saphanasr_status(8) and section REQUIREMENTS below.
# crm_mon -1r
# crm configure show | grep cli-
# SAPHanaSR-showAttr
# cs_clusterstate -i
* Watch status of HANA cluster resources and system replication.
This might be convenient when performing administrative actions or cluster tests. It does not replace the aforementioned checks. See also cs_show_saphanasr_status(8). A sketch is given below.
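A minimal sketch, assuming the tools from the previous example are installed; the interval and selection of commands are illustrative, not prescriptive:
# watch -n 30 "crm_mon -1r; SAPHanaSR-showAttr"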
* Overview on stopping the HANA database at one site.
This procedure works for scale-up and scale-out. No takeover
will be done. This procedure should be used when it is necessary to stop
the HANA database. Stopping the HANA database should not be done by just
stopping the Linux cluster or shutting down the OS. This particularly
applies to scale-out systems. It might be good to define upfront which HANA
site needs to be stopped. In case both sites need to be stopped, it might be
good to define the order. Stopping the primary first should keep system
replication in sync.
How long a stop will take depends on database size, performance of the underlying
infrastructure, SAP HANA version and configuration. Please refer to the SAP HANA
documentation for details on tuning and stopping an HANA database.
1. Checking status of Linux cluster and HANA, see above.
2. Setting SAPHana or SAPHanaController multi-state resource into maintenance (see the sketch after this list).
3. Stopping HANA database at the given site by using "sapcontrol -nr <nr> -function StopSystem".
4. Checking that HANA is stopped.
Note: Do not forget to end the resource maintenance after you have re-started the HANA database.
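A minimal sketch of steps 2 to 4, assuming the multi-state resource name mst_SAPHanaCon_SLE_HDB10 and instance number 10 from the takeover example below. The sapcontrol commands are run as sidadm on the site to be stopped; GetSystemInstanceList should finally report all instances as GRAY:
# crm resource maintenance mst_SAPHanaCon_SLE_HDB10
~> sapcontrol -nr 10 -function StopSystem
~> sapcontrol -nr 10 -function GetSystemInstanceList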
* Initiate an administrative takeover of the HANA primary from one node to the other by using the Linux cluster.
This procedure does not work for scale-out. On scale-up, it will
stop the HANA primary. This might take a while. If you want to avoid waiting
for the stopped primary, use the procedure below which suspends the primary.
If the cluster should also register the former primary as secondary,
AUTOMATED_REGISTER="true" is needed. Before the takeover will be
initiated, the status of the Linux cluster and the HANA system replication
has to be checked. The takeover should be initiated as forced migration of
the multi-state SAPHanaController resource.
The following will not work: regular migration, migration of the IP address,
migration of the primitive SAPHanaController resource, or setting the primary
node to standby.
After the takeover of the primary has finished, the migration rule has to be
deleted. If AUTOMATED_REGISTER="true" is set, the former primary will finally
be registered as secondary once the migration rule has been deleted.
# SAPHanaSR-showAttr
# crm configure show | grep cli-
# cs_clusterstate -i
# crm resource move mst_SAPHanaCon_SLE_HDB10 force
# cs_clusterstate -i
# SAPHanaSR-showAttr
# crm resource clear mst_SAPHanaCon_SLE_HDB10
# SAPHanaSR-showAttr
# cs_clusterstate -i
Note: Former versions of the Linux cluster used "migrate" instead of "move" and "unmigrate" instead of "clear".
* Perform an SAP HANA takeover by using SAP tools.
The procedure is described here for scale-out. It works for scale-up as well. The procedure will stop the HANA primary. This might take a while. If you want to avoid waiting for the stopped primary, use the procedure below which suspends the primary. The status of HANA databases, system replication and Linux cluster has to be checked. The SAP HANA resources are set into maintenance, an sr_takeover is performed, and the old primary is registered as new secondary. For this, the correct secondary site name has to be used, see the later example. Finally the SAP HANA resources are given back to the Linux cluster. See also section REQUIREMENTS below and the later example on determining the correct site name.
# SAPHanaSR-showAttr
# crm configure show | grep cli-
# cs_clusterstate -i
If everything looks fine, proceed.
# crm resource maintenance mst_SAPHanaCon_SLE_HDB10
# crm_mon -1r
~> sapcontrol -nr 10 -function StopSystem HDB
~> sapcontrol -nr 10 -function GetSystemInstanceList
~> hdbnsutil -sr_takeover
~> cdpy; python3 ./systemReplicationStatus.py; echo RC:$?
~> cdpy; python3 ./landscapeHostConfiguration.py; echo RC:$?
If everything looks fine, proceed.
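At this point the old primary gets registered as new secondary, as described above. A sketch with hypothetical placeholders for host and site name; the replication and operation modes are assumptions and must match the existing setup:
~> hdbnsutil -sr_register --remoteHost=<host> --remoteInstance=10 --replicationMode=sync --operationMode=logreplay --name=<site>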
~> sapcontrol -nr 10 -function StartSystem HDB
~> exit
~> cdpy; python3 ./systemReplicationStatus.py; echo RC:$?
~> cdpy; python3 ./landscapeHostConfiguration.py; echo RC:$?
~> exit
If everything looks fine, proceed.
# cs_clusterstate -i
# crm resource refresh mst_SAPHanaCon_SLE_HDB10
# crm resource maintenance mst_SAPHanaCon_SLE_HDB10 off
# SAPHanaSR-showAttr
# crm_mon -1r
# cs_clusterstate -i
* Overview on SAP HANA takeover using SAP tools and suspend primary feature.
The procedure works for scale-up and scale-out. The status of HANA databases, system replication and Linux cluster has to be checked. The SAP HANA resources are set into maintenance, an sr_takeover is performed with suspending the primary, and the old primary is registered as new secondary. For this, the correct secondary site name has to be used. Finally the SAP HANA resources are given back to the Linux cluster. See also section REQUIREMENTS below, the later example on determining the correct site name, and the sketch after the list below.
1. Check status of Linux cluster and HANA, see above.
2. Set SAPHanaController multi-state resource into maintenance.
3. Perform the takeover, make sure to use the suspend primary feature: "hdbnsutil -sr_takeover --suspendPrimary".
4. Check that the takeover succeeded and the old primary is suspended.
5. Stop suspended old primary.
6. Register old primary as new secondary, make sure to use the correct site name.
7. Start the new secondary.
8. Check new secondary and its system replication.
9. Refresh SAPHanaController multi-state resource.
10. Set SAPHanaController multi-state resource to managed.
11. Finally check status of Linux cluster and HANA.
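A condensed sketch of steps 2 to 10, assuming the multi-state resource name mst_SAPHanaCon_SLE_HDB10, instance number 10, and hypothetical placeholders for host and site name. The hdbnsutil and sapcontrol commands are run as sidadm: the takeover on the secondary, the stop and registration on the old primary. Replication and operation modes must match the existing setup:
# crm resource maintenance mst_SAPHanaCon_SLE_HDB10
~> hdbnsutil -sr_takeover --suspendPrimary
~> sapcontrol -nr 10 -function StopSystem
~> hdbnsutil -sr_register --remoteHost=<host> --remoteInstance=10 --replicationMode=sync --operationMode=logreplay --name=<site>
~> sapcontrol -nr 10 -function StartSystem
# crm resource refresh mst_SAPHanaCon_SLE_HDB10
# crm resource maintenance mst_SAPHanaCon_SLE_HDB10 off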
* Check the two site names that are known to the Linux cluster.
This is useful in case AUTOMATED_REGISTER is not yet set. In that
case a former primary needs to be registered manually with the former site
name as new secondary. The point is finding the site name that already is in
use by the Linux cluster. That exact site name has to be used for
registration of the new secondary. See also REQUIREMENTS of SAPHanaSR(7) and
SAPHanaSR-ScaleOut(7).
In this example, node suse11 is on the future secondary site to be registered.
The remote HANA master nameserver is suse21 on the current primary site.
The lowercase SID is ha1.
# crm configure show SAPHanaSR | grep hana_ha1_site_mns
# ssh suse21
# su - ha1adm -c 'hdbnsutil -sr_state; echo rc: $?'
# exit
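The site name found this way would then be used when registering the new secondary. A sketch with the values from this example; the instance number, replication mode and operation mode are hypothetical placeholders and must match the existing setup:
~> hdbnsutil -sr_register --remoteHost=suse21 --remoteInstance=<nr> --replicationMode=sync --operationMode=logreplay --name=<site-name-from-cluster>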
* Manually start the HANA primary if only one site is available.
This might be necessary in case the cluster cannot detect the status of both sites. This is an advanced task.
Before doing this, make sure HANA is not primary on the other site!
1. Start Linux cluster on the remaining node.
2. Wait and check that the cluster is running and in status idle.
3. Become sidadm, and start HANA manually.
4. Wait and check that HANA is running.
5. In case the cluster does not promote the HANA to primary, instruct the cluster to migrate the IP address to that node (see the sketch after this list).
6. Wait and check that HANA has been promoted to primary by the cluster.
7. Remove the migration rule from the IP address.
8. Check if cluster is in status idle.
9. You are done, for now.
10. Please bring back the other node and register that HANA as soon as possible. If the HANA primary stays alone for too long, the log area will fill up.
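A sketch of steps 5 and 7, assuming a hypothetical IP address resource rsc_ip_SLE_HDB10 and node suse11:
# crm resource move rsc_ip_SLE_HDB10 suse11
# cs_clusterstate -i
# crm resource clear rsc_ip_SLE_HDB10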
* Start Linux cluster after node has been fenced.
It is recommended to not configure the Linux cluster for always starting automatically on boot. It is better to start automatically only if the cluster and/or node have been stopped cleanly. If the node has been rebooted by STONITH, the cluster should not start automatically. If the cluster is configured that way, some steps are needed to start the cluster after a node has been rebooted by STONITH. STONITH via SBD is used in this example.
# cs_show_sbd_devices
# crm cluster start
# crm_mon -r
* Overview on maintenance procedure for Linux, HANA remains running, on pacemaker-2.0.
It is necessary to wait for each step to complete and to check the result. It also is necessary to test and document the whole procedure before applying it in production. See also section REQUIREMENTS below and the example on checking the status of HANA and cluster above.
1. Check status of Linux cluster and HANA, see above.
2. Set HANA multistate resource into maintenance mode.
3. Stop Linux cluster on all nodes.
4. Check that the cluster is stopped and HANA keeps running.
5. Perform Linux maintenance.
6. Start Linux cluster on all nodes. Make sure to do that on all nodes.
7. Refresh HANA multistate resource, then set it back from maintenance to managed.
8. Finally check status of Linux cluster and HANA, see above.
A sketch of the cluster-related steps follows this list.
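A sketch of the cluster-related steps, assuming the multi-state resource name mst_SAPHanaCon_SLE_HDB10; the cluster stop and start commands are run on each node:
# crm resource maintenance mst_SAPHanaCon_SLE_HDB10
# crm cluster stop
# crm cluster start
# crm resource refresh mst_SAPHanaCon_SLE_HDB10
# crm resource maintenance mst_SAPHanaCon_SLE_HDB10 off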
* Overview on simple procedure for stopping and temporarily disabling the Linux cluster, HANA gets fully stopped.
This procedure can be used to update HANA, OS or hardware. HANA roles and resource status remain unchanged. It is necessary to wait for each step to complete and to check the result. It also is necessary to test and document the whole procedure before applying it in production.
1. disabling pacemaker on HANA primary
2. disabling pacemaker on HANA secondary
3. stopping cluster on HANA secondary
- system replication goes SFAIL
4. stopping cluster on HANA primary
5. performing the maintenance, e.g. updating HANA, OS or hardware
6. enabling pacemaker on HANA primary
7. enabling pacemaker on HANA secondary
8. starting cluster on HANA primary
9. starting cluster on HANA secondary
- system replication recovers to SOK
Note: HANA is not available from step 4 to step 9. A sketch of steps 1 to 4 follows this list.
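A sketch of steps 1 to 4, assuming pacemaker is managed by systemd; the commands are run per node, first on the secondary, then on the primary. Steps 6 to 9 reverse this with "systemctl enable pacemaker" and "crm cluster start":
# systemctl disable pacemaker
# crm cluster stop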
* Overview on update procedure for the SAPHanaSR-angi package.
This procedure can be used to update RAs, HANA HADR provider hook scripts and related tools while HANA and Linux cluster stay online. See also SAPHanaSR-manageAttr(8) for details on reloading the HANA HADR provider.
1. Check status of Linux cluster and HANA, see above.
2. Set resources SAPHanaController and SAPHanaTopology to maintenance (steps 2, 3, 5 and 6 are sketched after this list).
3. Update RPM on all cluster nodes.
4. Reload HANA HADR provider hook script on both sites.
5. Refresh resources SAPHanaController and SAPHanaTopology.
6. Set resources SAPHanaController and SAPHanaTopology from maintenance to managed.
7. Check status of Linux cluster and HANA, see above.
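A sketch of steps 2, 3, 5 and 6, assuming the resource names cln_SAPHanaTop_HA1_HDB00 and mst_SAPHanaCon_HA1_HDB00 and an update via zypper; see SAPHanaSR-manageAttr(8) for step 4:
# crm resource maintenance cln_SAPHanaTop_HA1_HDB00
# crm resource maintenance mst_SAPHanaCon_HA1_HDB00
# zypper update SAPHanaSR-angi
# crm resource refresh cln_SAPHanaTop_HA1_HDB00
# crm resource refresh mst_SAPHanaCon_HA1_HDB00
# crm resource maintenance mst_SAPHanaCon_HA1_HDB00 off
# crm resource maintenance cln_SAPHanaTop_HA1_HDB00 off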
* Remove left-over maintenance attribute from overall Linux cluster.
This could be done to avoid confusion caused by different maintenance procedures. See above overview on maintenance procedures with running Linux cluster. Before doing so, check for cluster attribute maintenance-mode="false".
# crm_attribute --query -t crm_config -n maintenance-mode
# crm_attribute --delete -t crm_config -n maintenance-mode
# SAPHanaSR-showAttr
* Remove left-over standby attribute from Linux cluster nodes.
This could be done to avoid confusion caused by different maintenance procedures. See above overview on maintenance procedures with running Linux cluster. Before doing so for all nodes, check for node attribute standby="off" on all nodes.
# crm_attribute --query -t nodes -N node1 -n standby
# crm_attribute --delete -t nodes -N node1 -n standby
# SAPHanaSR-showAttr
* Remove left-over maintenance attribute from resource.
This should usually not be needed. See above overview on maintenance procedures with running Linux cluster.
# crm_resource --resource cln_SAPHanaTop_HA1_HDB00 --delete-parameter maintenance --meta
# SAPHanaSR-showAttr
* Manually update global site attribute.
In rare cases the global site attribute hana_<sid>_glob_prim or hana_<sid>_glob_sec is not updated automatically after successful takeover, while all other attributes are updated correctly. The global site attribute stays outdated even after the Linux cluster has been idle for a while. In this case, that site attribute could be updated manually. Make sure everything else is fine and just the global site attribute has not been updated. Updating hana_<sid>_glob_sec for SID HA1 with site name VOLKACH:
# crm_attribute --type crm_config --set-name SAPHanaSR --name hana_ha1_glob_sec --update VOLKACH
# crm configure show SAPHanaSR
* Upgrade scale-out srHook attribute from old-style to multi-target.
As the final result of this upgrade, the RAs and hook script are
upgraded from old-style to multi-target. Further, the Linux cluster's
old-style global srHook attribute hana_${sid}_glob_srHook is replaced by
the site-aware attributes hana_${sid}_site_srHook_${SITE}. New auxiliary
attributes are introduced. The complete procedure and related requirements
are described in detail in manual page SAPHanaSR-manageAttr(8).
The procedure at a glance:
a. Initially check if everything looks fine.
b. Set Linux cluster resources SAPHanaController and SAPHanaTopology into maintenance.
c. Install multi-target aware SAPHanaSR-ScaleOut package on all nodes.
d. Adapt sudoers permission on all nodes.
e. Replace HANA HADR provider configuration on both sites.
f. Reload HANA HADR provider hook script on both sites.
g. Check Linux cluster and HANA HADR provider for matching defined upgrade entry state.
h. Migrate srHook attribute from old-style to multi-target.
i. Check Linux cluster for matching defined upgrade target state.
j. Set Linux cluster resources SAPHanaController and SAPHanaTopology from maintenance to managed.
k. Optionally connect third HANA site via system replication outside of the Linux cluster.
l. Finally check if everything looks fine.
FILES
REQUIREMENTS
* For the current version of the resource agents that come with the software package SAPHanaSR-angi, the support is limited to the scenarios and parameters described in the respective manual pages SAPHanaSR-angi(7), SAPHanaSR(7) and SAPHanaSR-ScaleOut(7).
* Be patient. For detecting the overall HANA status, the Linux cluster needs a certain amount of time, depending on the HANA database and the configured intervals and timeouts.
* Before doing anything, always check for the Linux cluster's idle status, left-over migration constraints, and resource failures as well as the HANA landscape status, and the HANA SR status.
* Maintenance attributes for cluster, nodes and resources must not be mixed.
* The Linux cluster needs to be up and running to allow HA/DR provider events being written into CIB attributes. The current HANA SR status might differ from CIB srHook attribute after Linux cluster maintenance.
* Manually activating an HANA primary, like starting an HANA primary or performing a takeover outside the Linux cluster, creates the risk of a duplicate-primary situation. The user is responsible for data integrity, particularly when activating an HANA primary. See also susTkOver.py(7).
* When manually disabling or unregistering HANA system replication that is controlled by the Linux cluster, the SAPHanaController resource needs to be in maintenance mode. The user is responsible for data integrity.
* HANA site names are discovered automatically when the RAs are activated the very first time. Those exact site names have to be used later for all manual tasks.
* Just shutting down the cluster or OS while HANA is running is not a valid maintenance procedure. This is known to yield undesired results, particularly in scale-out clusters.
BUGS
In case of any problem, please use your favourite SAP support process to open a request for the component BC-OP-LNX-SUSE. Please report any other feedback and suggestions to feedback@suse.com.
SEE ALSO
ocf_suse_SAPHanaTopology(7), ocf_suse_SAPHanaController(7), susHanaSR.py(7),
susHanaSrMultiTarget.py(7), susCostOpt.py(7), susTkOver.py(7), susChkSrv.py(7),
SAPHanaSR-showAttr(8), SAPHanaSR(7), SAPHanaSR-ScaleOut(7),
SAPHanaSR-manageAttr(8), SAPHanaSR-manageProvider(8), cs_clusterstate(8),
cs_show_saphanasr_status(8), cs_wait_for_idle(8),
crm(8), crm_simulate(8), crm_mon(8), crm_attribute(8),
https://documentation.suse.com/sbp/sap/ ,
https://www.suse.com/support/kb/doc/?id=000019253 ,
https://www.suse.com/support/kb/doc/?id=000019207 ,
https://www.suse.com/support/kb/doc/?id=000019142 ,
https://www.suse.com/c/how-to-upgrade-your-suse-sap-hana-cluster-in-an-easy-way/ ,
https://www.suse.com/c/tag/towardszerodowntime/ ,
https://help.sap.com/doc/eb75509ab0fd1014a2c6ba9b6d252832/1.0.12/en-US/SAP_HANA_Administration_Guide_en.pdf
AUTHORS
F.Herschel, L.Pinne.
COPYRIGHT
(c) 2017-2018 SUSE Linux GmbH, Germany.
(c) 2019-2025 SUSE LLC
These maintenance examples come with ABSOLUTELY NO WARRANTY.
For details see the GNU General Public License at
http://www.gnu.org/licenses/gpl.html