BOM(1) General Commands Manual BOM(1)

NAME

bom - Decode Unicode byte order mark

SYNOPSIS

bom --strip [--expect types] [--lenient] [--prefer32] [--utf8] [file]

bom --detect [--expect types] [--prefer32] [file]

bom --print type

bom --list

bom --help

bom --version

DESCRIPTION

bom decodes, verifies, reports, and/or strips the byte order mark (BOM) at the start of the specified file, if any.

When no file is specified, or when file is -, read standard input.

OPTIONS

--detect
Report the detected BOM type to standard output and then exit.

See SUPPORTED BOM TYPES for possible values.

--expect types
Expect to find one of the specified BOM types, otherwise exit with an error.

Multiple types may be specified, separated by commas.

Specifying NONE is acceptable and matches when the file has no (supported) BOM.

--help
Output command line usage help.

--lenient
Silently ignore any illegal byte sequences encountered when converting the remainder of the file to UTF-8.

Without this flag, bom will exit immediately with an error if an illegal byte sequence is encountered.

This flag has no effect unless the --utf8 flag is given.

--list
List the supported BOM types and exit.

--print type
Output the byte sequence corresponding to the byte order mark of the specified type.

--prefer32
Used to disambiguate the byte sequence FF FE 00 00, which can be either a UTF-32LE BOM or a UTF-16LE BOM followed by a NUL character.

Without this flag, UTF-16LE is assumed; with this flag, UTF-32LE is assumed.

--strip
Strip the BOM, if any, from the beginning of the file and output the remainder of the file.

--utf8
Convert the remainder of the file to UTF-8, assuming the character encoding implied by the detected BOM.

For files with no (supported) BOM, this flag has no effect and the remainder of the file is copied unmodified.

For files with a UTF-8 BOM, the identity transformation is still applied, so (for example) illegal byte sequences will be detected.

--version
Output program version and exit.
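
The interaction between --strip, --utf8, and --lenient can be approximated with a short Python sketch. It is not how bom itself is implemented: the helper name strip_and_convert is hypothetical, only three BOM types are handled for brevity, and Python's "ignore" error handler is assumed to be a reasonable stand-in for --lenient.

```python
import codecs

# Hypothetical sketch of `bom --strip --utf8 [--lenient]`; only three BOM types.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def strip_and_convert(data: bytes, lenient: bool = False) -> bytes:
    """Strip a leading BOM and re-encode the remainder as UTF-8."""
    for mark, encoding in BOMS:
        if data.startswith(mark):
            # Lenient mode drops illegal byte sequences; strict mode raises.
            errors = "ignore" if lenient else "strict"
            text = data[len(mark):].decode(encoding, errors=errors)
            return text.encode("utf-8")
    # No supported BOM: the input is copied through unmodified.
    return data
```

In this sketch, input that starts with a UTF-8 BOM is still decoded and re-encoded, so an illegal byte sequence raises UnicodeDecodeError unless lenient is set, while input with no BOM is never inspected.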

SUPPORTED BOM TYPES

The supported BOM types are:

NONE
No supported BOM was detected.
UTF-7
A UTF-7 BOM was detected.
UTF-8
A UTF-8 BOM was detected.
UTF-16BE
A UTF-16 (Big Endian) BOM was detected.
UTF-16LE
A UTF-16 (Little Endian) BOM was detected.
UTF-32BE
A UTF-32 (Big Endian) BOM was detected.
UTF-32LE
A UTF-32 (Little Endian) BOM was detected.
GB18030
A GB18030 (Chinese National Standard) BOM was detected.
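
As an illustration of how these types map to leading bytes, the following Python sketch lists the commonly documented byte sequences and applies the FF FE 00 00 rule described under --prefer32. The names BOM_BYTES and detect_bom are hypothetical, UTF-7 is left out because its mark has several variants, and bom's actual detection logic may differ.

```python
# Commonly documented BOM byte sequences (UTF-7 omitted from this sketch).
BOM_BYTES = {
    "UTF-8": b"\xef\xbb\xbf",
    "UTF-16BE": b"\xfe\xff",
    "UTF-16LE": b"\xff\xfe",
    "UTF-32BE": b"\x00\x00\xfe\xff",
    "UTF-32LE": b"\xff\xfe\x00\x00",
    "GB18030": b"\x84\x31\x95\x33",
}


def detect_bom(data: bytes, prefer32: bool = False) -> str:
    """Return the name of the BOM type at the start of data, or NONE."""
    # FF FE 00 00 is ambiguous: a UTF-32LE BOM, or a UTF-16LE BOM plus NUL.
    if data.startswith(BOM_BYTES["UTF-32LE"]):
        return "UTF-32LE" if prefer32 else "UTF-16LE"
    for name in ("UTF-8", "UTF-32BE", "UTF-16BE", "UTF-16LE", "GB18030"):
        if data.startswith(BOM_BYTES[name]):
            return name
    return "NONE"
```

The ambiguous prefix is checked first so that the prefer32 argument alone decides between the two little-endian readings.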

EXAMPLES

To tell what kind of byte order mark a file has:

$ bom --detect file

To normalize files with byte order marks into UTF-8, and pass other files through unchanged:

$ bom --strip --utf8 file

Same as previous example, but discard illegal byte sequences instead of generating an error:

$ bom --strip --utf8 --lenient file

To verify that a file is properly encoded UTF-8 or UTF-16 with a byte order mark, and output it as UTF-8:

$ bom --strip --utf8 --expect UTF-8,UTF-16LE,UTF-16BE file

To just remove any byte order mark and get on with your life:

$ bom --strip file

RETURN VALUES

bom exits with one of the following values:

0
Success.
1
A general error occurred.
2
The --expect flag was given but the detected BOM did not match.
3
An illegal byte sequence was detected (and --lenient was not specified).
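
A calling script can branch on these values. The Python sketch below is one possible wrapper, assuming a bom executable is on PATH; the names EXIT_MEANINGS, describe_exit, and run_bom are hypothetical and not part of bom.

```python
import subprocess

# Exit statuses as listed in RETURN VALUES.
EXIT_MEANINGS = {
    0: "success",
    1: "general error",
    2: "detected BOM did not match --expect",
    3: "illegal byte sequence (no --lenient)",
}


def describe_exit(status: int) -> str:
    """Translate a bom exit status into a short message."""
    return EXIT_MEANINGS.get(status, f"unexpected exit status {status}")


def run_bom(args: list) -> tuple:
    """Run bom with the given arguments; return (exit status, stdout bytes)."""
    # Assumes `bom` is installed and on PATH.
    proc = subprocess.run(["bom", *args], capture_output=True)
    return proc.returncode, proc.stdout
```

For example, run_bom(["--strip", "--utf8", "--expect", "UTF-8", "file"]) would return status 2 when the file carries some other BOM, and describe_exit turns that status into a readable message.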

SEE ALSO

iconv(1)

bom: Decode Unicode byte order mark, https://github.com/archiecobbs/bom.

Byte order mark (Wikipedia), https://en.wikipedia.org/wiki/Byte_order_mark.

AUTHOR

Archie L. Cobbs ⟨archie.cobbs@gmail.com⟩

October 14, 2021 Linux 5.14.21-150500.55.52-default