BOM(1) | General Commands Manual | BOM(1) |
NAME
bom - Decode Unicode byte order mark
SYNOPSIS
bom --strip [--expect types] [--lenient] [--prefer32] [--utf8] [file]
bom --detect [--expect types] [--prefer32] [file]
bom --print type
bom --list
bom --help
bom --version
DESCRIPTION
bom decodes, verifies, reports, and/or strips the byte order mark (BOM) at the start of the specified file, if any.
When no file is specified, or when file is -, read standard input.
OPTIONS
-d, --detect
- Report the detected BOM type to standard output and then exit. See SUPPORTED BOM TYPES for possible values.
-e, --expect types
- Expect to find one of the specified BOM types, otherwise exit with an error. Multiple types may be specified, separated by commas. Specifying NONE is acceptable and matches when the file has no (supported) BOM.
-h, --help
- Output command line usage help.
-l, --lenient
- Silently ignore any illegal byte sequences encountered when converting the remainder of the file to UTF-8. Without this flag, bom will exit immediately with an error if an illegal byte sequence is encountered. This flag has no effect unless the --utf8 flag is given.
--list
- List the supported BOM types and exit.
-p, --print type
- Output the byte sequence of the byte order mark for the given type.
--prefer32
- Used to disambiguate the byte sequence FF FE 00 00, which can be either a UTF-32LE BOM or a UTF-16LE BOM followed by a NUL character. Without this flag, UTF-16LE is assumed; with this flag, UTF-32LE is assumed.
-s, --strip
- Strip the BOM, if any, from the beginning of the file and output the remainder of the file.
-u, --utf8
- Convert the remainder of the file to UTF-8, assuming the character encoding implied by the detected BOM. For files with no (supported) BOM, this flag has no effect and the remainder of the file is copied unmodified. For files with a UTF-8 BOM, the identity transformation is still applied, so (for example) illegal byte sequences will be detected.
-v, --version
- Output program version and exit.
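The ambiguity that --prefer32 resolves can be seen by decoding the same four bytes both ways. The following is a minimal Python sketch for illustration only; the helper name interpret is hypothetical, Python's codecs stand in for whatever bom uses internally, and the input is assumed to start with FF FE.

```python
# The four bytes that --prefer32 is about.
AMBIGUOUS = bytes([0xFF, 0xFE, 0x00, 0x00])


def interpret(data, prefer32=False):
    """Return (bom_type, decoded_text) for data that starts with FF FE.

    Hypothetical helper for illustration; not taken from bom itself.
    """
    if prefer32 and data.startswith(AMBIGUOUS):
        # Read all four bytes as a UTF-32LE BOM; the text starts after them.
        return "UTF-32LE", data[4:].decode("utf-32-le")
    # Default reading: FF FE is a UTF-16LE BOM, so a following 00 00
    # is an ordinary NUL character in the text.
    return "UTF-16LE", data[2:].decode("utf-16-le")


print(interpret(AMBIGUOUS))                 # ('UTF-16LE', '\x00')
print(interpret(AMBIGUOUS, prefer32=True))  # ('UTF-32LE', '')
```

Both readings are valid, which is why the choice has to be made by a flag rather than by inspecting the bytes.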
SUPPORTED BOM TYPES
The supported BOM types are:
- NONE
- No supported BOM was detected.
- UTF-7
- A UTF-7 BOM was detected.
- UTF-8
- A UTF-8 BOM was detected.
- UTF-16BE
- A UTF-16 (Big Endian) BOM was detected.
- UTF-16LE
- A UTF-16 (Little Endian) BOM was detected.
- UTF-32BE
- A UTF-32 (Big Endian) BOM was detected.
- UTF-32LE
- A UTF-32 (Little Endian) BOM was detected.
- GB18030
- A GB18030 (Chinese National Standard) BOM was detected.
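As a rough illustration of how detection by leading bytes might work, here is a Python sketch that takes the signatures from Python's standard codecs module. It is an assumption-laden sketch, not bom's implementation: the function name detect_bom and the prefer32 parameter are hypothetical, and UTF-7 is left out because its signature depends on the bytes that follow it.

```python
import codecs

# Signatures taken from Python's codecs module; longer ones come first so a
# 4-byte signature is tested before the 2-byte one it begins with.
SIGNATURES = [
    ("UTF-32BE", codecs.BOM_UTF32_BE),
    ("UTF-32LE", codecs.BOM_UTF32_LE),
    ("GB18030", "\ufeff".encode("gb18030")),  # U+FEFF encoded as GB18030
    ("UTF-8", codecs.BOM_UTF8),
    ("UTF-16BE", codecs.BOM_UTF16_BE),
    ("UTF-16LE", codecs.BOM_UTF16_LE),
]


def detect_bom(data, prefer32=False):
    """Return the BOM type name for the start of data, or "NONE".

    Hypothetical sketch; UTF-7 is deliberately not handled here.
    """
    for name, signature in SIGNATURES:
        if data.startswith(signature):
            if name == "UTF-32LE" and not prefer32:
                # FF FE 00 00 is read as UTF-16LE plus NUL by default.
                return "UTF-16LE"
            return name
    return "NONE"


print(detect_bom(codecs.BOM_UTF8 + b"hello"))  # UTF-8
print(detect_bom(b"hello"))                    # NONE
```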
EXAMPLES
To tell what kind of byte order mark a file has:
$ bom --detect file
To normalize files with byte order marks into UTF-8, and pass other files through unchanged:
$ bom --strip --utf8 file
Same as previous example, but discard illegal byte sequences instead of generating an error:
$ bom --strip --utf8 --lenient file
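The difference between the strict and lenient conversions above can be approximated in Python. This is only a sketch under stated assumptions: the function to_utf8 and the CODECS table are hypothetical, Python's "strict" and "ignore" error handlers stand in for bom's behavior, and exactly which bytes bom discards in lenient mode may differ.

```python
# Map BOM type names to Python codec names (an assumption for this sketch).
CODECS = {
    "UTF-7": "utf-7",
    "UTF-8": "utf-8",
    "UTF-16BE": "utf-16-be",
    "UTF-16LE": "utf-16-le",
    "UTF-32BE": "utf-32-be",
    "UTF-32LE": "utf-32-le",
    "GB18030": "gb18030",
}


def to_utf8(body, bom_type, lenient=False):
    """Convert body (the bytes after the BOM) to UTF-8.

    Hypothetical helper: raises UnicodeDecodeError on illegal input unless
    lenient is set, in which case undecodable bytes are dropped.
    """
    codec = CODECS.get(bom_type)
    if codec is None:
        # No supported BOM: the data is passed through unmodified.
        return body
    errors = "ignore" if lenient else "strict"
    return body.decode(codec, errors=errors).encode("utf-8")


bad = b"ok\xffok"  # 0xFF can never appear in valid UTF-8
print(to_utf8(bad, "UTF-8", lenient=True))  # b'okok'
try:
    to_utf8(bad, "UTF-8")
except UnicodeDecodeError as exc:
    print("strict mode rejected the input:", exc.reason)
```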
To verify a properly encoded UTF-8 or UTF-16 file with a byte order mark and output it as UTF-8:
$ bom --strip --utf8 --expect UTF-8,UTF-16LE,UTF-16BE file
To just remove any byte order mark and get on with your life:
$ bom --strip file
RETURN VALUES
bom exits with one of the following values:
- 0
- Success.
- 1
- A general error occurred.
- 2
- The --expect flag was given but the detected BOM did not match.
- 3
- An illegal byte sequence was detected (and --lenient was not specified).
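A script that calls bom can branch on these values. The Python sketch below is one possible wrapper, not part of bom: the names EXIT_MEANINGS, describe_exit, and check_bom are hypothetical, check_bom assumes a bom executable on PATH, and what bom prints when --expect fails is not specified here, so the captured output is returned as-is.

```python
import subprocess

# Exit statuses as listed in RETURN VALUES.
EXIT_MEANINGS = {
    0: "success",
    1: "general error",
    2: "detected BOM did not match --expect",
    3: "illegal byte sequence (and --lenient not given)",
}


def describe_exit(code):
    """Translate a bom exit status into a short message."""
    return EXIT_MEANINGS.get(code, "unexpected exit status %d" % code)


def check_bom(path, expected_types):
    """Run `bom --detect --expect TYPES PATH` and return (status, stdout).

    Hypothetical wrapper; requires bom to be installed and on PATH.
    """
    proc = subprocess.run(
        ["bom", "--detect", "--expect", ",".join(expected_types), path],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout.strip()


print(describe_exit(2))  # detected BOM did not match --expect
```

For example, check_bom("file", ["UTF-8", "NONE"]) would be expected to return status 2 for a file that starts with a UTF-16LE BOM.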
SEE ALSO
bom: Decode Unicode byte order mark, https://github.com/archiecobbs/bom.
Byte order mark (Wikipedia), https://en.wikipedia.org/wiki/Byte_order_mark.
AUTHOR
Archie L. Cobbs ⟨archie.cobbs@gmail.com⟩
October 14, 2021 | Linux 6.13.6-1-default |