- Tumbleweed 1.3.4-1.1
- Leap-16.0
- Leap-15.6
| CSVPRINTF(1) | General Commands Manual | CSVPRINTF(1) |
NAME¶
csvprintf — CSV
file parser
SYNOPSIS¶
csvprintf |
[options] format |
csvprintf |
-b [options] |
csvprintf |
-j [options] |
csvprintf |
-x [options] |
csvprintf |
-X [options] |
xml2csv |
[file.xml] |
DESCRIPTION¶
csvprintf is a simple UNIX command line
utility for parsing CSV files.
In the first form, csvprintf works like
the printf(1) command line utility: you supply a
printf(1) format string on the command line, and each row
of the CSV file is split into arguments and formatted accordingly.
Normal Mode¶
In the first form shown above, the format specifiers in the format string specify which which CSV column to format. The column accessor within each format specifier takes one of three forms: numeric, symbolic, or column name.
A numeric column accessor is a sequence of decimal digits followed by the $ character (the same accessor format supported by printf(1)). So for example, "%3$-10d" would format the third CSV column as a decimal value, left justified, in a 10 column wide field. Columns are numbered starting at one; the "%0$d" specifier will print the number of columns in the record.
When the -i or -n
flag is given, the first row is assumed to contain column names and is not
output. This flag also enables support for symbolic and column name column
accessors.
A symbolic column accessor uses the column's name, as specified in the first row, enclosed in curly braces. For example, if the first row is FirstName,Lastname,IdNum then the format string "%{IdNum}04d: %{LastName}s, %{FirstName}s" would be equivalent to the format string "%3$04d: %2$s, %1$s".
Specifying a column name that does not appear in exactly one column in the first row generates an error, so the use of symbolic column accessors adds an extra consistency check.
A column name column accessor just outputs the name of the column; the row's value in that column is ignored. This permits column names in the output even when they are not known ahead of time. A column name column accessor is specified by putting the column's numeric index in square brackets. For example, the following format string would output the first three columns in each row, labeled by their names, on three lines:
"%[1]s: %1$s\n%[2]s: %2$s\n%[3]s: %3$s\n".
If the specified column index is zero or greater than the number of columns in the first row, an error occurs.
XML Mode¶
With -x, the entire file is converted into
an XML document.
The document element is <csv>.
Each CSV row becomes a <row> element containing its individual column values as sub-elements.
The column value sub-elements are named
<col1>, <col2>,
etc.; with -i, the sub-elements use the column names
read from the first row (with illegal characters replaced by
underscores).
Specifying -X is equivalent to specifying
both -x and -i.
In XML mode, a character encoding must be assumed; see
-e.
The xml2csv command can convert XML
documents generated by csvprintf -x back into
CSV.
JSON Mode¶
With -j, each row is converted into a JSON
document.
This form is described by RFC 7464 and consists of concatenated
JSON documents framed by ASCII RS and LF control characters, which is
compatible with the jq(1) utility's
--seq flag.
Normally each row is written as a string array; with
-i, each row is written as an object, using column
names for fields. An error occurs if two columns to be output have the same
name. Use the -c flag to whitelist the set of fields
that are output.
In JSON mode, a character encoding must be assumed; see
-e.
Bash Mode¶
With -b, each row is converted into
bash(1) variable assignment(s) which may be applied with
the eval(1) command.
Normally the output just assigns ROW as an array of values. The resulting output can be used like this:
cat input.csv | csvprintf -b | while read -r LINE; do
eval "${LINE}"
echo "The first column is: ${ROW[0]}"
echo "The second column is: ${ROW[1]}"
...
done
With -i, each column value is assigned to
a separate variable whose name is the corresponding column name (with
underscores replacing non-alphanumeric characters), and an error occurs if
two variables have the same name.
So an input file like this:
"Last Name","First Name","Registered???" "Washington","George","Y" "Lincoln","Abe","N"
can be processed like this:
cat input.csv | csvprintf -bi -p ROW_ | while read -r LINE; do
eval "${LINE}"
echo "First name: ${ROW_First_Name}"
echo "Last name: ${ROW_Last_Name}"
echo "Registered: ${ROW_Registered___}"
done
Bash Mode Security Concerns¶
There are two security issues to be aware of when using Bash Mode.
First, the -i flag opens a security hole
because Bash has several special variables like PATH,
TMPDIR, etc., which could be overwritten by malicious
input. To prevent this, csvprintf omits known Bash
variables, but for tighter security use the -c flag
to explicitly white-list the variables you need. In addition, use of the
-p flag is always recommended in Bash Mode to help
avoid namespace collisions.
Secondly, if the Bash Mode output is piped into
while read then the -r flag
must be used to prevent extraneous decoding of backslash escapes.
Input Encoding¶
In all modes, lines must be terminated by LF bytes or CR+LF byte pairs, and the separator and quote characters must be recognizable as single byte values. This parsing behavior is compatible with ASCII, ISO-8859-1, UTF-8, etc., but not multi-byte encodings such as UTF-16, which must be re-encoded (e.g., to UTF-8) first.
In normal and Bash modes, column values are copied from input to output bytewise without interpretation.
In XML and JSON modes, column values must be interpreted according
to an assumed character encoding. This encoding defaults to ISO-8859-1 but
can be changed with the -e flag.
OPTIONS¶
-b- Convert each CSV row into a bash(1) variable assignment line.
-ccolname- Specify a column to be included when using column names in XML, JSON, or
Bash output.
Without this flag, all columns are included. When this flag is used one or more times, only the specified columns are included.
If any colname doesn't exist, an error occurs.
Use of this flag requires the
-i,-n, or-Xflag. -e- Specify input character encoding for XML or JSON mode.
By default, ISO-8859-1 is assumed.
-f- Read CSV input from the specified file.
By default (or if ``-'' is specified),
csvprintfreads from standard input. -i- Use column names read from the first record in the output.
In normal mode, or when used with the
-xflag, this flag is equivalent to-n.In JSON mode, output objects instead of arrays and use column names for the object fields.
In Bash mode, output a variable for each column instead of a single ROW array variable.
It's possible for a row to have more columns than the column header row did. In that case,
csvprintfreverts to using col1, col2, etc., for any extra columns.This flag implies
-n. -j- Convert the input into a JavaScript Object Notation (JSON) text sequence document.
-n- Assume the first CSV record contains column names and omit from the
output.
In normal mode, enable symbolic and column name accessors.
-p- Specify a common prefix (UTF-8 encoding) to use with all column names in
the output.
This flag is ignored unless
-iis specified. -q- Specify an alternate CSV column quote character. The usual backslash
escape sequences are accepted.
The default quote character is double quote.
-s- Specify an alternate CSV column separator character. The usual backslash
escape sequences are accepted.
The default separator character is comma.
-h- Output usage message and exit.
-v- Output version information and exit.
-x- Convert the input into an XML document.
-X- Convert the input into an XML document using column names for value
sub-elements.
This flag implies
-n.
CSV FORMAT¶
csvprintf parses according to the format
described by ``The Comma Separated Value (CSV) File Format'' (see below). In
particular, quote characters must be escaped with an extra quote and
whitespace surrounding column values is ignored.
EXIT STATUS¶
csvprintf will exit with a status 1 if
invalid CSV input is detected. Otherwise, if an invocation of
printf(1) fails, processing stops and that exit value is
returned.
FILES¶
- /usr/share/csvprintf/csv.xsl
- XSL transform that converts XML back into CSV format.
BUGS¶
Under the hood, csvprintf invokes the
printf(1) executable on each CSV row it parses, which
makes it relatively slow.
SEE ALSO¶
csvprintf: Simple CSV file parser for the UNIX command line, https://github.com/archiecobbs/csvprintf.
The Comma Separated Value (CSV) File Format, http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm.
RFC 7464: JavaScript Object Notation (JSON) Text Sequences, https://datatracker.ietf.org/doc/html/rfc7464.
AUTHOR¶
Archie L. Cobbs ⟨archie.cobbs@gmail.com⟩
| November 30, 2010 | Linux 6.4.0-150700.53.34-default |