BOM(1)

General Commands Manual

NAME

bom — Decode Unicode byte order mark

SYNOPSIS

bom --strip [--expect types] [--lenient] [--prefer32] [--utf8] [file]

bom --detect [--expect types] [--prefer32] [file]

bom --print type

bom --list

bom --help

bom --version

DESCRIPTION

bom decodes, verifies, reports, and/or strips the byte order mark (BOM), if any, at the start of the specified file.

When no file is specified, or when file is -, bom reads standard input.

OPTIONS

-d, --detect

Report the detected BOM type to standard output and then exit.

See SUPPORTED BOM TYPES for possible values.

-e, --expect types

Expect to find one of the specified BOM types, otherwise exit with an error.

Multiple types may be specified, separated by commas.

Specifying NONE is acceptable and matches when the file has no (supported) BOM.

-h, --help

Output command line usage help.

-l, --lenient

Silently ignore any illegal byte sequences encountered when converting the remainder of the file to UTF-8.

Without this flag, bom will exit immediately with an error if an illegal byte sequence is encountered.

This flag has no effect unless the --utf8 flag is given.

--list

List the supported BOM types and exit.

-p, --print type

Output the byte sequence of the byte order mark for the specified type.

--prefer32

Used to disambiguate the byte sequence FF FE 00 00, which can be either a UTF-32LE BOM or a UTF-16LE BOM followed by a NUL character.

Without this flag, UTF-16LE is assumed; with this flag, UTF-32LE is assumed.
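
The ambiguity can be seen by decoding the same bytes both ways. The following is a minimal Python sketch, not part of bom itself; the sample bytes and variable names are illustrative assumptions.

```python
# FF FE 00 00 followed by the 4 bytes 41 00 00 00.
data = b"\xff\xfe\x00\x00\x41\x00\x00\x00"

# Reading 1: a 2-byte UTF-16LE BOM, then UTF-16LE text that starts with NUL.
as_utf16 = data[2:].decode("utf-16-le")

# Reading 2: a 4-byte UTF-32LE BOM, then UTF-32LE text.
as_utf32 = data[4:].decode("utf-32-le")

print(repr(as_utf16))  # NUL, 'A', NUL
print(repr(as_utf32))  # 'A'
```

Both readings are valid, so the bytes alone cannot settle the question; a caller has to state which one is wanted.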

-s, --strip

Strip the BOM, if any, from the beginning of the file and output the remainder of the file.

-u, --utf8

Convert the remainder of the file to UTF-8, assuming the character encoding implied by the detected BOM.

For files with no (supported) BOM, this flag has no effect and the remainder of the file is copied unmodified.

For files with a UTF-8 BOM, the identity transformation is still applied, so (for example) illegal byte sequences will be detected.
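
The combined effect of --strip, --utf8, and --lenient can be approximated in Python as shown below. This is a hedged sketch, not bom's implementation: the function name strip_to_utf8 and the CODECS table are hypothetical, only three BOM types are covered, and Python's "ignore" error handler is assumed to be a reasonable stand-in for the lenient behavior.

```python
# BOM signature -> Python codec name (an assumption of this sketch).
CODECS = {
    b"\xef\xbb\xbf": "utf-8",
    b"\xfe\xff": "utf-16-be",
    b"\xff\xfe": "utf-16-le",
}


def strip_to_utf8(data: bytes, lenient: bool = False) -> bytes:
    """Strip a leading BOM and re-encode the remainder as UTF-8."""
    errors = "ignore" if lenient else "strict"
    for signature, codec in CODECS.items():
        if data.startswith(signature):
            # Decoding validates the input, even when it is already UTF-8.
            text = data[len(signature):].decode(codec, errors=errors)
            return text.encode("utf-8")
    # No supported BOM: copy the input unmodified.
    return data
```

In strict mode, invalid input raises UnicodeDecodeError, which loosely corresponds to bom exiting with an error; with lenient=True the offending bytes are dropped.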

-v, --version

Output program version and exit.

SUPPORTED BOM TYPES

The supported BOM types are:

NONE: No supported BOM was detected.
UTF-7: A UTF-7 BOM was detected.
UTF-8: A UTF-8 BOM was detected.
UTF-16BE: A UTF-16 (Big Endian) BOM was detected.
UTF-16LE: A UTF-16 (Little Endian) BOM was detected.
UTF-32BE: A UTF-32 (Big Endian) BOM was detected.
UTF-32LE: A UTF-32 (Little Endian) BOM was detected.
GB18030: A GB18030 (Chinese National Standard) BOM was detected.
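
For reference, the well-known byte signatures of these BOM types can be put in a lookup table, with detection reduced to a prefix match. The Python sketch below is illustrative only: BOM_SIGNATURES and detect_bom are hypothetical names, and the UTF-7 entry matches only the fixed three-byte prefix, which may differ from what bom actually checks.

```python
# Well-known BOM byte signatures for the types listed above.
BOM_SIGNATURES = {
    "UTF-7": b"\x2b\x2f\x76",  # fixed prefix only; a variable byte follows
    "UTF-8": b"\xef\xbb\xbf",
    "UTF-16BE": b"\xfe\xff",
    "UTF-16LE": b"\xff\xfe",
    "UTF-32BE": b"\x00\x00\xfe\xff",
    "UTF-32LE": b"\xff\xfe\x00\x00",
    "GB18030": b"\x84\x31\x95\x33",
}


def detect_bom(data: bytes, prefer32: bool = False) -> str:
    """Return the BOM type name at the start of data, or "NONE"."""
    # FF FE 00 00 is ambiguous; follow the documented default (UTF-16LE)
    # unless the caller asks for UTF-32LE.
    if data.startswith(BOM_SIGNATURES["UTF-32LE"]):
        return "UTF-32LE" if prefer32 else "UTF-16LE"
    for name, signature in BOM_SIGNATURES.items():
        if name != "UTF-32LE" and data.startswith(signature):
            return name
    return "NONE"
```

For example, detect_bom(b"\xef\xbb\xbfhello") returns "UTF-8", and input without any of these prefixes returns "NONE".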

EXAMPLES

To tell what kind of byte order mark a file has:

$ bom --detect file

To normalize files with byte order marks into UTF-8, and pass other files through unchanged:

$ bom --strip --utf8 file

Same as the previous example, but discard illegal byte sequences instead of generating an error:

$ bom --strip --utf8 --lenient file

To verify that a file is properly encoded UTF-8 or UTF-16 with a byte order mark, and output it as UTF-8:

$ bom --strip --utf8 --expect UTF-8,UTF-16LE,UTF-16BE file

To just remove any byte order mark and get on with your life:

$ bom --strip file

RETURN VALUES

bom exits with one of the following values:

0: Success.
1: A general error occurred.
2: The --expect flag was given but the detected BOM did not match.
3: An illegal byte sequence was detected (and --lenient was not specified).
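
A calling script can branch on these values. The Python sketch below shows one possible wrapper; the names EXIT_MEANINGS, explain_exit, and run_bom are hypothetical, and it assumes bom is installed and on the PATH. Only flags documented above are used.

```python
import subprocess

# Exit values as documented above.
EXIT_MEANINGS = {
    0: "success",
    1: "general error",
    2: "detected BOM did not match --expect",
    3: "illegal byte sequence detected",
}


def explain_exit(code: int) -> str:
    """Map a bom exit value to a short description."""
    return EXIT_MEANINGS.get(code, f"unexpected exit value {code}")


def run_bom(path: str, expected: list[str]) -> tuple[str, bytes]:
    """Run bom --strip --utf8 --expect on path; return (status, output)."""
    result = subprocess.run(
        ["bom", "--strip", "--utf8", "--expect", ",".join(expected), path],
        capture_output=True,
    )
    return explain_exit(result.returncode), result.stdout
```

A caller might invoke run_bom("file", ["UTF-8", "UTF-16LE", "UTF-16BE"]) and treat any status other than "success" as a reason to reject the file.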

AUTHOR

Archie L. Cobbs ⟨archie.cobbs@gmail.com⟩

October 14, 2021

Linux 5.14.21-150500.55.52-default

Source file:	bom.1.en.gz (from bom 1.0.1-1.11)
Source last updated:	2021-11-09T19:48:37Z