
SANLOCK(8) System Manager's Manual SANLOCK(8)

NAME

sanlock - shared storage lock manager

SYNOPSIS

sanlock [COMMAND] [ACTION] ...

DESCRIPTION

sanlock is a lock manager for shared storage environments. It allows applications running on multiple hosts to coordinate access to shared resources, such as data objects on shared storage systems like a SAN, preventing data corruption and ensuring data integrity.

OPTIONS

COMMAND can be one of three primary top-level choices:

sanlock daemon start daemon
sanlock client send request to daemon (default command if none given)
sanlock direct access storage directly (no coordination with daemon)

Daemon Command

sanlock daemon [options]

-D no fork and print all logging to stderr

-Q 0|1 quiet error messages for common lock contention

-R 0|1 renewal debugging, log debug info for each renewal

-L pri write logging at priority level and up to logfile (-1 none)

-S pri write logging at priority level and up to syslog (-1 none)

-U uid user id

-G gid group id

-H num renewal history size

-t num max worker threads

-g sec seconds for graceful recovery

-w 0|1 use watchdog through wdmd

-o sec io timeout

-h 0|1 use high priority (RR) scheduling

-l num use mlockall (0 none, 1 current, 2 current and future)

-b sec seconds a host id bit will remain set in delta lease bitmap

-e str unique local host name used in delta leases as host_id owner

Client Command

sanlock client action [options]

sanlock client status

Print processes, lockspaces, and resources being managed by the sanlock daemon. Add -D to show extra internal daemon status for debugging. Add -o p to show resources by pid, or -o s to show resources by lockspace.

sanlock client host_status

Print state of host_id leases read during the last renewal. State of all lockspaces is shown (use -s to select one). Add -D to show extra internal daemon status for debugging.

sanlock client gets

Print lockspaces being managed by the sanlock daemon. The LOCKSPACE string will be followed by ADD or REM if the lockspace is currently being added or removed. Add -h 1 to also show hosts in each lockspace.

sanlock client renewal -s LOCKSPACE

Print a history of renewals with timing details. See the Renewal history section below.

sanlock client log_dump

Print the sanlock daemon internal debug log.

sanlock client shutdown

Ask the sanlock daemon to exit. Without the force option (-f 0), the command will be ignored if any lockspaces exist. With the force option (-f 1), any registered processes will be killed, their resource leases released, and lockspaces removed. With the wait option (-w 1), the command will wait for a result from the daemon indicating that it has shut down and is exiting, or cannot shut down because lockspaces exist (command fails).

sanlock client init -s LOCKSPACE

Tell the sanlock daemon to initialize a lockspace on disk. The -o option can be used to specify the io timeout to be written in the host_id leases. The -Z and -A options can be used to specify the sector size and align size, and both should be set together. Use -N 1 to include the NO_TIMEOUT flag in the newly formatted leases. Use -C 1 to request the use of CAW leases if supported, or -C 0 to not use CAW leases. (Also see sanlock direct init.)

sanlock client init -r RESOURCE

Tell the sanlock daemon to initialize a resource lease on disk. The -Z and -A options can be used to specify the sector size and align size, and both should be set together. Use -C 1 to request the use of CAW leases if supported, or -C 0 to not use CAW leases. (Also see sanlock direct init.)

sanlock client read -s LOCKSPACE

Tell the sanlock daemon to read a lockspace from disk. Only the LOCKSPACE path and offset are required. If host_id is zero, the first record at offset (host_id 1) is used. The complete LOCKSPACE is printed. Add -D to print other details. (Also see sanlock direct read_leader.)

sanlock client read -r RESOURCE

Tell the sanlock daemon to read a resource lease from disk. Only the RESOURCE path and offset are required. The complete RESOURCE is printed. Add -D to print other details. (Also see sanlock direct read_leader.)

sanlock client init_host -s LOCKSPACE

Tell the sanlock daemon to initialize a single host_id lease on disk. The host_id specified in the -s arg will be used, and written as the lease owner. Optionally specify host name with -e, generation with -g, and timestamp with -t. Use -Z to specify sector size. Use -N 1 to include the NO_TIMEOUT flag in the reformatted lease. Use -C 1 to request the use of CAW leases if supported, or -C 0 to not use CAW leases. (Also see sanlock direct init_host for more information.)

sanlock client add_lockspace -s LOCKSPACE

Tell the sanlock daemon to acquire the host_id lease for the host_id specified in LOCKSPACE. This is also referred to as "joining" the lockspace. With a host_id lease held for the lockspace, the host is then able to acquire resource locks in the lockspace. Use -o <sec> to specify the io timeout of the acquiring host, which will be written in the host_id lease.

sanlock client inq_lockspace -s LOCKSPACE

Inquire about the state of the lockspace in the sanlock daemon, whether it is being added or removed, or is joined.

sanlock client rem_lockspace -s LOCKSPACE

Tell the sanlock daemon to release the specified host_id in the lockspace. Any processes holding resource leases in this lockspace will be killed, and their resource leases will not be released.

sanlock client command -r RESOURCE -c path args

Register with the sanlock daemon, acquire the specified resource lease, and exec the command at path with args. When the command exits, the sanlock daemon will release the lease. -c must be the final option.

sanlock client spawn -r RESOURCE -c COUNT CMD [ARG...] [-c COUNT CMD [ARG...]]...

Register with the sanlock daemon, acquire the specified resource lease, fork and exec each command specified by -c sequentially as separate processes, checking the exit status of each process, and only moving to the next process on success. After all processes are successfully executed, or at the first failure, the lease is released explicitly before exiting. Use -P 1 for persistent locks that will not be dropped if the spawn process dies while a child process is running. Use -h 1 to report a conflicting lock owner. Use -O 1 to acquire an orphan lock. Use -d 1 to skip the on-disk resource lease update when releasing the resource lease after all commands complete successfully (useful when the commands have removed the lease storage.)
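The sequencing rule for spawn (run each command in turn, stop at the first failure, always release the lease) can be modeled in a few lines. The following is a minimal Python sketch of that control flow only. acquire_lease and release_lease are hypothetical callbacks standing in for the daemon interaction; they are not part of any sanlock API, and the -P, -h, -O and -d options are not modeled.

```python
import subprocess
import sys
from typing import Callable, List, Sequence


def run_spawn_sequence(
    commands: Sequence[List[str]],
    acquire_lease: Callable[[], None],
    release_lease: Callable[[], None],
) -> int:
    """Hypothetical model of `sanlock client spawn` sequencing.

    Acquire the lease, run each command as a separate process in order,
    stop at the first non-zero exit status, and release the lease
    explicitly whether the sequence succeeded or failed.
    Returns the exit status of the last command that was run.
    """
    acquire_lease()
    status = 0
    try:
        for argv in commands:
            status = subprocess.run(argv).returncode
            if status != 0:
                break  # first failure: later commands are never started
    finally:
        release_lease()  # explicit release after success or first failure
    return status


if __name__ == "__main__":
    events: List[str] = []
    rc = run_spawn_sequence(
        [[sys.executable, "-c", "pass"],
         [sys.executable, "-c", "raise SystemExit(3)"],
         [sys.executable, "-c", "print('never runs')"]],
        acquire_lease=lambda: events.append("acquire"),
        release_lease=lambda: events.append("release"),
    )
    print(rc, events)  # 3 ['acquire', 'release']
```

The try/finally placement is the design point: it guarantees the release step runs on the failure path as well as the success path, matching the description above.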

sanlock client acquire -r RESOURCE -p pid
sanlock client release -r RESOURCE -p pid

Tell the sanlock daemon to acquire or release the specified resource lease for the given pid. The pid must be registered with the sanlock daemon. acquire can optionally take a versioned RESOURCE string RESOURCE:lver, where lver is the version of the lease that must be acquired, or fail. Use -C in place of -p to specify client_id.

sanlock client convert -r RESOURCE -p pid

Tell the sanlock daemon to convert the mode of the specified resource lease for the given pid. If the existing mode is exclusive (default), the mode of the lease can be converted to shared with RESOURCE:SH. If the existing mode is shared, the mode of the lease can be converted to exclusive with RESOURCE (no :SH suffix). Use -C in place of -p to specify client_id.

sanlock client inquire -p pid

Print the resource leases held by the given pid. The format is a versioned RESOURCE string "RESOURCE:lver" where lver is the version of the lease held. Use -C in place of -p to specify client_id.

sanlock client request -r RESOURCE -f force_mode

Request that the owner of a resource do something specified by force_mode. A versioned RESOURCE:lver string must be used with a greater version than is presently held. A zero lver with a zero force_mode clears the request.

sanlock client examine -r RESOURCE

Examine the request record for the currently held resource lease and carry out the action specified by the requested force_mode.

sanlock client examine -s LOCKSPACE

Examine requests for all resource leases currently held in the named lockspace. Only lockspace_name is used from the LOCKSPACE argument.

sanlock client set_event -s LOCKSPACE -i host_id -g gen -e num -d num

Set an event for another host. When the sanlock daemon next renews its host_id lease for the lockspace it will: set the bit for the host_id in its bitmap, and set the generation, event and data values in its own host_id lease. An application that has registered for events from this lockspace on the destination host will get the event that has been set when the destination sees the event during its next host_id lease renewal.

sanlock client set_config -s LOCKSPACE

Set a configuration value for a lockspace. Only lockspace_name is used from the LOCKSPACE argument. The USED flag has the same effect on a lockspace as a process holding a resource lease that will not exit. The USED_BY_ORPHANS flag means that an orphan resource lease will have the same effect as the USED flag. The -o <sec> option can be used to update the lockspace's io timeout.
-u 0|1 Set (1) or clear (0) the USED flag.
-O 0|1 Set (1) or clear (0) the USED_BY_ORPHANS flag.

sanlock client set_host -s LOCKSPACE -i host_id -g gen -F flag_name

When flag_name is DEAD_EXT, the DEAD_EXT flag is set in the host_id lease for the specified host_id. If the current host_id lease generation does not match the specified generation, then the command will fail. With DEAD_EXT set, the host_id+generation will be considered dead, and resource locks held by the specified host_id+generation will be free for other hosts to acquire. DEAD_EXT should only be set for a host if that host can no longer modify the shared resources that were protected by the resource locks in the lockspace.

Direct Command

sanlock direct action [options]

-o sec io timeout in seconds

sanlock direct init -s LOCKSPACE
sanlock direct init -r RESOURCE

Initialize storage for a lockspace or resource. Use the -Z and -A flags to specify the sector size and align size. The max hosts that can use the lockspace/resource (and the max possible host_id) is determined by the sector/align size combination. Possible combinations are: 512/1M, 4096/1M, 4096/2M, 4096/4M, 4096/8M. Lockspaces and resources both use the same amount of space (align_size) for each combination. When initializing a lockspace, sanlock initializes host_id leases (delta leases) for max_hosts in the given space. When initializing a resource, sanlock initializes a single resource lock (paxos lease) in the space. With -s, the -o option specifies the io timeout to be written in the host_id leases. With -r, the -z 1 option invalidates the resource lease on disk so it cannot be used until reinitialized normally. Use -N 1 to include NO_TIMEOUT in newly formatted lockspace host_id leases. Use -C 1 to request the use of COMPARE AND WRITE (CAW) leases if supported, or -C 0 to not use CAW leases.
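The sector/align arithmetic can be sketched as follows. This Python sketch assumes that each host_id delta lease occupies one sector (the leader record is the single sector of a delta lease) and that host_id N is stored at offset + (N - 1) * sector_size, since host_id 1 is the first record at the lockspace offset. The host limit it computes is only the number of sectors in one align_size area capped at 2000 (the host_id range of the delta lease bitmap); sanlock's own max_hosts for a combination may be lower, so treat it as an upper bound. All names are hypothetical.

```python
MiB = 1024 * 1024

# Sector/align combinations accepted by `sanlock direct init` (-Z/-A).
VALID_COMBINATIONS = {
    (512, 1 * MiB),
    (4096, 1 * MiB),
    (4096, 2 * MiB),
    (4096, 4 * MiB),
    (4096, 8 * MiB),
}

MAX_HOST_ID = 2000  # host_ids 1-2000, per the delta lease bitmap


def check_combination(sector_size: int, align_size: int) -> None:
    """Reject sector/align pairs not in the documented list."""
    if (sector_size, align_size) not in VALID_COMBINATIONS:
        raise ValueError(f"unsupported sector/align: {sector_size}/{align_size}")


def host_limit_upper_bound(sector_size: int, align_size: int) -> int:
    """Sectors in one align_size area, capped at MAX_HOST_ID (upper bound only)."""
    check_combination(sector_size, align_size)
    return min(align_size // sector_size, MAX_HOST_ID)


def delta_lease_offset(lockspace_offset: int, host_id: int, sector_size: int) -> int:
    """Assumed byte offset of a host_id's delta lease: one sector per host_id."""
    if host_id < 1:
        raise ValueError("host_id must be >= 1")
    return lockspace_offset + (host_id - 1) * sector_size
```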

sanlock direct init_host -s LOCKSPACE

Initialize a single host_id lease. The host_id specified in the -s arg will be used, and written as the lease owner (leader.owner_id). Optionally specify host name (leader.resource_name) with -e, generation number (leader.owner_generation) with -g, and timestamp (leader.timestamp) with -t (timestamp value 1 is special, and causes the current time to be written in the timestamp field. A timestamp value of 0 means the host_id lease is free, as usual.) The -Z and -o options apply as with direct init. Use -N 1 to include NO_TIMEOUT in the reformatted host_id lease. Use -C 1 to request the use of CAW leases if supported, or -C 0 to not use CAW leases.

sanlock direct read_leader -s LOCKSPACE
sanlock direct read_leader -r RESOURCE

Read a leader record from disk and print the fields. The leader record is the single sector of a delta lease, or the first sector of a paxos lease.

sanlock direct read -s LOCKSPACE
sanlock direct read -r RESOURCE

Read a complete lockspace or resource from disk and print it.

sanlock direct dump path[:offset[:size]]

Read disk sectors and print leader records for delta or paxos leases. Add -f 1 to also print the request record values for paxos leases and the host_ids set in delta lease bitmaps.

LOCKSPACE option string

-s lockspace_name:host_id:path:offset

lockspace_name name of lockspace
host_id local host identifier in lockspace
path path to storage to use for leases
offset offset on path (bytes)
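A LOCKSPACE string can be built or split with a small helper. This is a minimal Python sketch; the Lockspace class is hypothetical, and it assumes that neither the lockspace name nor the path contains a colon (any escaping sanlock may support is not modeled).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lockspace:
    """Hypothetical holder for the fields of a LOCKSPACE option string."""
    name: str
    host_id: int
    path: str
    offset: int  # bytes

    def to_arg(self) -> str:
        """Format as the -s argument: lockspace_name:host_id:path:offset."""
        return f"{self.name}:{self.host_id}:{self.path}:{self.offset}"

    @classmethod
    def parse(cls, arg: str) -> "Lockspace":
        # Exactly four fields are expected; unpacking raises ValueError otherwise.
        name, host_id, path, offset = arg.split(":")
        return cls(name, int(host_id), path, int(offset))
```

For example, Lockspace("LS", 1, "/dev/vg/leases", 0).to_arg() yields "LS:1:/dev/vg/leases:0" (hypothetical name and path), suitable for -s.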

RESOURCE option string

-r lockspace_name:resource_name:path:offset

lockspace_name name of lockspace
resource_name name of resource
path path to storage to use for leases
offset offset on path (bytes)

RESOURCE option string with suffix

-r lockspace_name:resource_name:path:offset:lver

lver leader version

-r lockspace_name:resource_name:path:offset:SH

SH indicates shared mode
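The optional suffix can be distinguished by field count and value. This is a minimal Python sketch with a hypothetical Resource class; it handles one suffix at a time, as in the two forms above, and like a plain split it assumes no colons inside names or paths.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resource:
    """Hypothetical holder for the fields of a RESOURCE option string."""
    lockspace: str
    name: str
    path: str
    offset: int                 # bytes
    lver: Optional[int] = None  # leader version suffix, if given
    shared: bool = False        # True when the SH suffix is given


def parse_resource(arg: str) -> Resource:
    fields = arg.split(":")
    if len(fields) == 4:
        lockspace, name, path, offset = fields
        return Resource(lockspace, name, path, int(offset))
    if len(fields) == 5:
        lockspace, name, path, offset, suffix = fields
        if suffix == "SH":
            return Resource(lockspace, name, path, int(offset), shared=True)
        # Any other suffix is taken as a numeric lver.
        return Resource(lockspace, name, path, int(offset), lver=int(suffix))
    raise ValueError(f"unexpected RESOURCE string: {arg!r}")
```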

Defaults

sanlock help shows the default values for the options above.

sanlock version shows the build version.

OTHER

Request/Examine

The first part of making a request for a resource is writing the request record of the resource (the sector following the leader record). To make a successful request:

  • RESOURCE:lver must be greater than the lver presently held by the other host. This implies the leader record must be read to discover the lver, prior to making a request.
  • RESOURCE:lver must be greater than or equal to the lver presently written to the request record. Two hosts may write a new request at the same time for the same lver, in which case both would succeed, but the force_mode from the last write would win.
  • The force_mode must be greater than zero.
  • To unconditionally clear the request record (set both lver and force_mode to 0), make a request with RESOURCE:0 and force_mode 0.
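The rules above reduce to a single predicate. The following is a hypothetical Python helper, assuming the caller has already read the lver from the leader record (held_lver) and from the request record (recorded_lver):

```python
def request_is_valid(req_lver: int, force_mode: int,
                     held_lver: int, recorded_lver: int) -> bool:
    """Check a request against the request record rules (hypothetical helper).

    req_lver:      lver given in RESOURCE:lver
    held_lver:     lver presently held by the other host (leader record)
    recorded_lver: lver presently written in the request record
    """
    if req_lver == 0 and force_mode == 0:
        return True  # unconditional clear of the request record
    return (req_lver > held_lver
            and req_lver >= recorded_lver
            and force_mode > 0)
```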

The owner of the requested resource will not know of the request unless it is explicitly told to examine its resources via the "examine" api/command, or otherwise notified.

The second part of making a request is notifying the resource lease owner that it should examine the request records of its resource leases. The notification will cause the lease owner to automatically run the equivalent of "sanlock client examine -s LOCKSPACE" for the lockspace of the requested resource.

The notification is made using a bitmap in each host_id lease. Each bit represents one of the possible host_ids (1-2000). If host A wants to notify host B to examine its resources, A sets the bit in its own bitmap that corresponds to the host_id of B. When B next renews its host_id lease, it reads the host_id leases for all hosts and checks each bitmap to see if its own host_id has been set. It finds the bit for its own host_id set in A's bitmap, and examines its resource request records. (The bit remains set in A's bitmap for set_bitmap_seconds.)
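The bitmap check performed during renewal can be sketched in Python. The bit layout is an assumption for illustration (host_id N maps to bit N - 1, least significant bit first within each byte); the on-disk format is not described here, and the helper names are hypothetical.

```python
from typing import Dict, List

MAX_HOST_ID = 2000
BITMAP_BYTES = (MAX_HOST_ID + 7) // 8  # 250 bytes cover host_ids 1-2000


def _bit_position(host_id: int):
    # Assumed mapping: host_id N -> bit index N - 1.
    if not 1 <= host_id <= MAX_HOST_ID:
        raise ValueError(f"host_id out of range: {host_id}")
    index = host_id - 1
    return index // 8, 1 << (index % 8)


def set_host_bit(bitmap: bytearray, host_id: int) -> None:
    """What host A does to notify host_id: set the bit in A's own bitmap."""
    byte, mask = _bit_position(host_id)
    bitmap[byte] |= mask


def host_bit_is_set(bitmap: bytes, host_id: int) -> bool:
    byte, mask = _bit_position(host_id)
    return bool(bitmap[byte] & mask)


def hosts_notifying(my_host_id: int, bitmaps: Dict[int, bytes]) -> List[int]:
    """What host B does on renewal: find hosts whose bitmap has B's bit set."""
    return sorted(h for h, bm in bitmaps.items() if host_bit_is_set(bm, my_host_id))
```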

force_mode determines the action the resource lease owner should take:

FORCE (1): kill the process holding the resource lease. When the process has exited, the resource lease will be released, and can then be acquired by anyone. The kill signal is SIGKILL (or SIGTERM if SIGKILL is restricted.)

GRACEFUL (2): run the program configured by sanlock_killpath against the process holding the resource lease. If no killpath is defined, then FORCE is used.
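The owner's choice of action can be summarized as a small dispatch. The following is a hypothetical Python sketch that returns a description of the action rather than signaling any process; sigkill_restricted is an illustrative parameter for the "SIGKILL is restricted" case.

```python
from enum import IntEnum
from typing import Optional


class ForceMode(IntEnum):
    FORCE = 1
    GRACEFUL = 2


def owner_action(force_mode: int, killpath: Optional[str] = None,
                 sigkill_restricted: bool = False) -> str:
    """Describe the owner's response to a request (illustrative only)."""
    mode = ForceMode(force_mode)  # raises ValueError for unknown modes
    if mode is ForceMode.GRACEFUL and killpath:
        return f"killpath:{killpath}"
    # FORCE, or GRACEFUL with no killpath defined, falls back to a kill signal.
    return "SIGTERM" if sigkill_restricted else "SIGKILL"
```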

Persistent and orphan resource leases

A resource lease can be acquired with the PERSISTENT flag (-P 1). If the process holding the lease exits, the lease will not be released, but kept on an orphan list. Another local process can acquire an orphan lease using the ORPHAN flag (-O 1), or release the orphan lease using the ORPHAN flag (-O 1). All orphan leases can be released by setting the lockspace name (-s lockspace_name) with no resource name.
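The lifecycle of persistent and orphan leases can be modeled as an in-memory table. This Python sketch is a hypothetical model of the behavior described above, not daemon code; keys are (lockspace_name, resource_name) pairs and processes are identified by pid.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

Key = Tuple[str, str]  # (lockspace_name, resource_name)


@dataclass
class LeaseTable:
    """Hypothetical model of held, persistent and orphan resource leases."""
    held: Dict[Key, Tuple[int, bool]] = field(default_factory=dict)  # key -> (pid, persistent)
    orphans: Set[Key] = field(default_factory=set)

    def acquire(self, key: Key, pid: int, persistent: bool = False,
                orphan: bool = False) -> None:
        if orphan:
            self.orphans.remove(key)  # KeyError if there is no such orphan
        elif key in self.held or key in self.orphans:
            raise RuntimeError("lease busy")
        self.held[key] = (pid, persistent)

    def process_exit(self, pid: int) -> None:
        for key, (owner, persistent) in list(self.held.items()):
            if owner == pid:
                del self.held[key]
                if persistent:
                    self.orphans.add(key)  # kept on the orphan list, not released

    def release_orphans(self, lockspace: str, resource: Optional[str] = None) -> None:
        """Release one orphan, or every orphan in the lockspace if resource is None."""
        self.orphans = {k for k in self.orphans
                        if not (k[0] == lockspace and (resource is None or k[1] == resource))}
```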

Renewal history

sanlock saves a limited history of lease renewal information in each lockspace. See sanlock.conf renewal_history_size to set the amount of history or to disable (set to 0).

IO times are measured during delta lease renewal (each delta lease renewal includes one read and one write).

For each successful renewal, a record is saved that includes:

the timestamp written in the delta lease by the renewal
the time in milliseconds taken by the delta lease read
the time in milliseconds taken by the delta lease write

Also counted and recorded are the number of io timeouts and other io errors that occur between successful renewals.

Two consecutive successful renewals would be recorded as:

timestamp=5332 read_ms=482 write_ms=5525 next_timeouts=0 next_errors=0
timestamp=5353 read_ms=99 write_ms=3161 next_timeouts=0 next_errors=0

Those fields are:

timestamp is the value written into the delta lease during that renewal.

read_ms/write_ms are the milliseconds taken for the renewal read/write ios.

next_timeouts are the number of io timeouts that occurred after the renewal recorded on that line, and before the next successful renewal on the following line.

next_errors are the number of io errors (not timeouts) that occurred after the renewal recorded on that line, and before the next successful renewal on the following line.

The command 'sanlock client renewal -s lockspace_name' reports the full history of renewals saved by sanlock, which by default is 180 records, about 1 hour of history when using a 20 second renewal interval for a 10 second io timeout.
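History lines in the format shown above can be parsed into records for analysis, and the retention estimate is simple arithmetic. This is a minimal Python sketch. It assumes the key=value layout of the two example lines (the actual renewal command output may include additional text), and it assumes a renewal interval of 2 * io_timeout, matching the 20 second interval for a 10 second io timeout. The function names are hypothetical.

```python
import re
from typing import Dict, List

FIELD_RE = re.compile(r"(\w+)=(\d+)")


def parse_renewal_history(text: str) -> List[Dict[str, int]]:
    """Parse lines like 'timestamp=5332 read_ms=482 ...' into dicts of ints."""
    records = []
    for line in text.splitlines():
        fields = {k: int(v) for k, v in FIELD_RE.findall(line)}
        if fields:
            records.append(fields)
    return records


def history_span_seconds(records: int, io_timeout: int) -> int:
    # Assumed renewal interval: 2 * io_timeout (20 s for io_timeout=10).
    return records * 2 * io_timeout


sample = """\
timestamp=5332 read_ms=482 write_ms=5525 next_timeouts=0 next_errors=0
timestamp=5353 read_ms=99 write_ms=3161 next_timeouts=0 next_errors=0
"""
history = parse_renewal_history(sample)
print(max(r["write_ms"] for r in history))                 # 5525
print(history[1]["timestamp"] - history[0]["timestamp"])   # 21
print(history_span_seconds(180, 10) / 3600)                # 1.0
```

With the defaults, 180 records at 20 seconds each give 3600 seconds, the one hour of history mentioned above.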

Configurable watchdog timeout

Watchdog devices usually have a 60 second timeout, but some devices have a configurable timeout. To use a different watchdog timeout, set sanlock.conf watchdog_fire_timeout (in seconds) to a value supported by the device. The same watchdog_fire_timeout must be configured on all hosts (so all hosts must have watchdog devices that support the same timeout). Mismatched values will invalidate the lease protection provided by the watchdog.

watchdog_fire_timeout and io_timeout should usually be configured together. By default, sanlock uses watchdog_fire_timeout=60 with io_timeout=10. Other combinations to consider are:
watchdog_fire_timeout=30 with io_timeout=5
watchdog_fire_timeout=10 with io_timeout=2

Smaller values make it more likely that a host will be reset by the watchdog while waiting for slow io to complete or for temporary io failures to be resolved. Spurious watchdog resets will also become more likely due to independent, overlapping lockspace outages, each of which would be inconsequential by itself.
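A configuration check for the pairing advice above can be sketched in Python. It assumes a simple "key = value" sanlock.conf syntax with "#" comments, uses the documented defaults (watchdog_fire_timeout=60, io_timeout=10) when a key is absent, and all function names are hypothetical. Matching one of the suggested pairs is advice; the same watchdog_fire_timeout on every host is the actual requirement.

```python
from typing import Dict, List, Tuple

# Combinations suggested in this section: watchdog_fire_timeout -> io_timeout.
SUGGESTED_PAIRS = {60: 10, 30: 5, 10: 2}


def parse_conf(text: str) -> Dict[str, str]:
    """Minimal 'key = value' reader; ignores blanks and '#' comments (assumed syntax)."""
    conf: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            conf[key] = value  # a repeated key keeps its last value
    return conf


def timeout_pair(conf: Dict[str, str]) -> Tuple[int, int]:
    wd = int(conf.get("watchdog_fire_timeout", 60))  # documented default
    io = int(conf.get("io_timeout", 10))             # documented default
    return wd, io


def is_suggested_pair(conf: Dict[str, str]) -> bool:
    wd, io = timeout_pair(conf)
    return SUGGESTED_PAIRS.get(wd) == io


def watchdog_values_match(confs: List[Dict[str, str]]) -> bool:
    """All hosts must use the same watchdog_fire_timeout."""
    return len({timeout_pair(c)[0] for c in confs}) <= 1
```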

FILES

/etc/sanlock/sanlock.conf

The current settings in use by the sanlock daemon can be seen in the output of 'sanlock status -D'.

quiet_fail = 1
See -Q

debug_renew = 0
See -R

logfile_priority = 4
See -L

logfile_use_utc = 0
Use UTC instead of local time in log messages.

syslog_priority = 3
See -S

names_log_priority = 6
Log resource names at this priority level (uses syslog priority numbers). If this number is less than or equal to logfile_priority, each requested resource name and location is recorded in sanlock.log.

use_watchdog = 1
See -w

high_priority = 1
See -h

mlock_level = 1
See -l

sh_retries = 8
When acquiring a shared lease, the number of times to try acquiring the paxos lease if it is held by another host that is also acquiring a shared lease.

uname = sanlock
The sanlock daemon will attempt to use this user id. Ignored when use_compare_and_write is enabled, which requires running as root.

gname = sanlock
The sanlock daemon will attempt to use this group id. Ignored when use_compare_and_write is enabled, which requires running as root.

our_host_name = <str>
A unique name that a host uses to ensure exclusive ownership of a lockspace host_id (delta lease owner.) The maximum length is 48 characters. If no value is provided in sanlock.conf or on the command line (-e), sanlock attempts to set our_host_name from /sys/devices/virtual/dmi/id/product_uuid. If that is not available, sanlock generates a random uuid to use as our_host_name. Using a fixed our_host_name value will reduce delays when using a lockspace. Using product_uuid will reduce delays further.
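The fallback order for our_host_name can be sketched in Python. The function is hypothetical; it mirrors the documented order (configured value or -e, then product_uuid, then a random uuid) and the 48 character limit. Rejecting an overlong value rather than truncating it is an assumption.

```python
import uuid
from pathlib import Path
from typing import Optional

MAX_HOST_NAME_LEN = 48
PRODUCT_UUID = Path("/sys/devices/virtual/dmi/id/product_uuid")


def choose_host_name(configured: Optional[str] = None,
                     product_uuid_path: Path = PRODUCT_UUID) -> str:
    """Pick our_host_name: configured value, then product_uuid, then random uuid."""
    if configured:
        if len(configured) > MAX_HOST_NAME_LEN:
            raise ValueError("our_host_name is limited to 48 characters")
        return configured
    try:
        value = product_uuid_path.read_text().strip()
        if value:
            return value
    except OSError:
        pass  # file absent, or not readable by this user
    return str(uuid.uuid4())  # random fallback; changes on every call
```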

renewal_read_extend_sec = <seconds>
If a renewal read i/o times out, wait this many additional seconds for that read to complete at the start of the subsequent renewal attempt. When not configured, sanlock waits for an additional io_timeout seconds for a previous timed out read to complete.

renewal_history_size = 180
See -H

paxos_debug_all = 0
Include all details in the paxos debug logging.

debug_io = <str>
Add debug logging for each i/o. "submit" (no quotes) produces debug output at submission time, "complete" produces debug output at completion time, and "submit,complete" (no space) produces both.

max_sectors_kb = <str>|<num>
Set to "ignore" (no quotes) to prevent sanlock from checking or changing max_sectors_kb for the lockspace disk when starting a lockspace. Set to "align" (no quotes) to set max_sectors_kb for the lockspace disk to the align size of the lockspace. Set to a number to set a specific number of KB for all lockspace disks. A larger existing max_sectors_kb value will not be reduced by this setting.
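The three forms of max_sectors_kb can be expressed as one decision. The following is a hypothetical Python helper returning the value that would be written, or None to leave the disk setting alone; the behavior when max_sectors_kb is not configured is not modeled.

```python
from typing import Optional, Union


def target_max_sectors_kb(setting: Union[str, int], current_kb: int,
                          align_size_bytes: int) -> Optional[int]:
    """Return the new max_sectors_kb to write, or None to leave it unchanged.

    setting: "ignore", "align", or a number of KB.
    """
    if setting == "ignore":
        return None  # neither checked nor changed
    wanted = align_size_bytes // 1024 if setting == "align" else int(setting)
    if current_kb >= wanted:
        return None  # a larger (or equal) existing value is not reduced
    return wanted
```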

debug_clients = 0
Enable or disable debug logging for all client connections to the sanlock daemon.

debug_cmd = +|-<name>
Enable (+name) or disable (-name) debug logging at the command processing level for specifically named commands, e.g. "debug_cmd = +acquire", or "debug_cmd = -inq_lockspace". Repeat this line for each command name. Use a plus prefix before the name to enable and a minus prefix to disable. By default sanlock disables some command level debugging for commands that are often repetitive and fill the in memory debug buffer. This only affects debug logging, not errors or warnings, and disabling command level debugging for a command does not disable lower level debugging for that command. Special values +all and -all can be used to enable or disable all commands, and can be used before or after other debug_cmd lines.
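The +name/-name rules can be sketched as set operations. This Python sketch assumes debug_cmd lines are applied in the order they appear, so that "-all" followed by "+acquire" leaves only acquire enabled. The command list and the default enabled set are supplied by the caller, since the actual defaults are not listed here; the function name is hypothetical.

```python
from typing import Iterable, Set


def apply_debug_cmd(lines: Iterable[str], defaults_enabled: Set[str],
                    all_commands: Set[str]) -> Set[str]:
    """Return the command names with command-level debugging enabled."""
    enabled = set(defaults_enabled)
    for value in lines:
        if len(value) < 2 or value[0] not in "+-":
            raise ValueError(f"bad debug_cmd value: {value!r}")
        sign, name = value[0], value[1:]
        targets = set(all_commands) if name == "all" else {name}
        if sign == "+":
            enabled |= targets
        else:
            enabled -= targets
    return enabled
```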

debug_hosts = 1
Log information about other host_id lease renewals. When set to 1 (the default), messages are logged when a host_id lease is observed reaching the failed and dead states. When set to 2, messages are logged when any update (e.g. renewal) is observed for another host_id lease. When set to 0, neither kind of message is logged.

write_init_io_timeout = <seconds>
The io timeout to use when initializing ondisk lease structures for a lockspace or resource. This timeout is not used as a part of either lease algorithm (as the standard io_timeout is.)

max_worker_threads = <num>
See -t

io_timeout = <seconds>
The io timeout for disk operations, most notably delta lease renewals. This value is the basis for calculating most other timeout values. (Some special cases may use a different io timeout.) Tune this value with caution; it can substantially alter the overall sanlock behavior.

watchdog_fire_timeout = <seconds>
The watchdog device timeout. The watchdog device must support the specified value. It is critical that all hosts use the same value. Not doing so will invalidate the lease protection provided by sanlock. The io_timeout should usually be tuned along with this value, e.g. watchdog_fire_timeout = 30 with io_timeout = 5.

use_hugepages = <str>
Set to "all" to use transparent hugepages (2MB via MADV_HUGEPAGE.) This should minimize, or prevent, splitting read io's on lease areas. 2MB is allocated for 1MB lease areas, causing some extra memory usage. Set to "none" to disable.

use_compare_and_write = <str>
Set to "yes" to use CAW leases when supported by storage, unless the user/API requests that non-CAW leases be used. (opt-out) Set to "no" to not use CAW leases, even if requested by the user. Set to "allow" to use CAW if supported by storage and requested by the user/API. (opt-in)
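The three settings combine with storage support and the user's -C choice as follows. This is a hypothetical Python sketch; user_request stands for -C 1 (True), -C 0 (False), or no -C option (None), and the behavior when use_compare_and_write is not set at all is not modeled.

```python
from typing import Optional


def use_caw(setting: str, storage_supports_caw: bool,
            user_request: Optional[bool]) -> bool:
    """Decide whether a newly formatted lease uses COMPARE AND WRITE."""
    if setting not in ("yes", "no", "allow"):
        raise ValueError(f"unknown use_compare_and_write value: {setting!r}")
    if setting == "no" or not storage_supports_caw:
        return False  # "no" overrides any user request
    if setting == "yes":
        return user_request is not False  # opt-out: only -C 0 disables
    return user_request is True  # "allow", opt-in: only -C 1 enables
```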

create_old_delta_disk_version = <X.Y>
Format delta leases using the specified old version number, e.g. 3.4. This allows new versions to remain compatible with old versions. Some new features will not be usable with the old version.

SEE ALSO

wdmd(8)

2026-02-27