table of contents
- NAME
- SYNOPSIS
- DESCRIPTION
- EXAMPLES
- FILE FORMATS
- HELP OPTIONS
- VERB LIST
- FUNCTION LIST
- COMMENTS-IN-DATA FLAGS
- COMPRESSED-DATA FLAGS
- CSV/TSV-ONLY FLAGS
- FILE-FORMAT FLAGS
- FLATTEN-UNFLATTEN FLAGS
- FORMAT-CONVERSION KEYSTROKE-SAVER FLAGS
- JSON-ONLY FLAGS
- LEGACY FLAGS
- MISCELLANEOUS FLAGS
- OUTPUT-COLORIZATION FLAGS
- PPRINT-ONLY FLAGS
- PROFILING FLAGS
- SEPARATOR FLAGS
- AUXILIARY COMMANDS
- MLRRC
- REPL
- VERBS
- FUNCTIONS FOR FILTER/PUT
- KEYWORDS FOR PUT AND FILTER
- AUTHOR
- SEE ALSO
MILLER(1)
NAME
Miller -- like awk, sed, cut, join, and sort for name-indexed data such as CSV and tabular JSON.
SYNOPSIS
Usage: mlr [flags] {verb} [verb-dependent options ...] {zero or more file names}
Input file names follow the verb and its options, e.g.
mlr --csv sort -f shape example.csv
If zero file names are provided, standard input is read.
Output of one verb may be chained as input to another using "then", e.g.
mlr --csv stats1 -a min,mean,max -f quantity then sort -f color example.csv
Please see 'mlr help topics' for more information. Please also see https://miller.readthedocs.io
DESCRIPTION
Miller operates on key-value-pair data while the familiar Unix tools operate on integer-indexed fields: if the natural data structure for the latter is the array, then Miller's natural data structure is the insertion-ordered hash map. This encompasses a variety of data formats, including but not limited to the familiar CSV, TSV, and JSON. (Miller can handle positionally-indexed data as a special case.) This manpage documents mlr 6.13.0-dev.
EXAMPLES
mlr --icsv --opprint cat example.csv
mlr --icsv --opprint sort -f shape example.csv
mlr --icsv --opprint sort -f shape -nr index example.csv
mlr --icsv --opprint cut -f flag,shape example.csv
mlr --csv filter '$color == "red"' example.csv
mlr --icsv --ojson put '$ratio = $quantity / $rate' example.csv
mlr --icsv --opprint --from example.csv sort -nr index then cut -f shape,quantity
FILE FORMATS
CSV/CSV-lite: comma-separated values with separate header line
TSV: same but with tabs in place of commas
+---------------------+
| apple,bat,cog       |
| 1,2,3               | Record 1: "apple":"1", "bat":"2", "cog":"3"
| 4,5,6               | Record 2: "apple":"4", "bat":"5", "cog":"6"
+---------------------+
JSON (array of objects):
+---------------------+
| [                   |
|   {                 |
|     "apple": 1,     | Record 1: "apple":"1", "bat":"2", "cog":"3"
|     "bat": 2,       |
|     "cog": 3        |
|   },                |
|   {                 |
|     "dish": {       | Record 2: "dish.egg":"7",
|       "egg": 7,     |           "dish.flint":"8", "garlic":""
|       "flint": 8    |
|     },              |
|     "garlic": ""    |
|   }                 |
| ]                   |
+---------------------+
JSON Lines (sequence of one-line objects):
+------------------------------------------------+
| {"apple": 1, "bat": 2, "cog": 3}               |
| {"dish": {"egg": 7, "flint": 8}, "garlic": ""} |
+------------------------------------------------+
Record 1: "apple":"1", "bat":"2", "cog":"3"
Record 2: "dish.egg":"7", "dish.flint":"8", "garlic":""
PPRINT: pretty-printed tabular
+---------------------+
| apple bat cog       |
| 1     2   3         | Record 1: "apple":"1", "bat":"2", "cog":"3"
| 4     5   6         | Record 2: "apple":"4", "bat":"5", "cog":"6"
+---------------------+
Markdown tabular:
+-----------------------+
| | apple | bat | cog | |
| | ---   | --- | --- | |
| | 1     | 2   | 3   | | Record 1: "apple":"1", "bat":"2", "cog":"3"
| | 4     | 5   | 6   | | Record 2: "apple":"4", "bat":"5", "cog":"6"
+-----------------------+
XTAB: pretty-printed transposed tabular
+---------------------+
| apple 1             | Record 1: "apple":"1", "bat":"2", "cog":"3"
| bat   2             |
| cog   3             |
|                     |
| dish 7              | Record 2: "dish":"7", "egg":"8"
| egg  8              |
+---------------------+
DKVP: delimited key-value pairs (Miller default format)
+---------------------+
| apple=1,bat=2,cog=3 | Record 1: "apple":"1", "bat":"2", "cog":"3"
| dish=7,egg=8,flint  | Record 2: "dish":"7", "egg":"8", "3":"flint"
+---------------------+
NIDX: implicitly numerically indexed (Unix-toolkit style)
+---------------------+
| the quick brown     | Record 1: "1":"the", "2":"quick", "3":"brown"
| fox jumped          | Record 2: "1":"fox", "2":"jumped"
+---------------------+
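The DKVP and NIDX mappings shown in the boxes above can be modeled in a few lines of Python. This is only an illustrative sketch, not Miller's implementation: the function names `parse_dkvp` and `parse_nidx` are hypothetical, and the NIDX model assumes fields are split on runs of whitespace.

```python
def parse_dkvp(line, ifs=",", ips="="):
    """Split one DKVP line into an insertion-ordered record (a dict)."""
    record = {}
    for position, field in enumerate(line.split(ifs), start=1):
        key, sep, value = field.partition(ips)
        if sep:
            record[key] = value
        else:
            # No pair separator in this field: key it by its 1-based position.
            record[str(position)] = field
    return record


def parse_nidx(line):
    """Split on runs of whitespace; keys are 1-based field positions."""
    return {str(i): value for i, value in enumerate(line.split(), start=1)}


print(parse_dkvp("apple=1,bat=2,cog=3"))
print(parse_dkvp("dish=7,egg=8,flint"))
print(parse_nidx("the quick brown"))
```

The second call reproduces the positional key "3" for the bare field `flint`, matching Record 2 in the DKVP box.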
HELP OPTIONS
Type 'mlr help {topic}' for any of the following:
Essentials:
  mlr help topics
  mlr help basic-examples
  mlr help file-formats
Flags:
  mlr help flags
  mlr help flag
  mlr help list-separator-aliases
  mlr help list-separator-regex-aliases
  mlr help comments-in-data-flags
  mlr help compressed-data-flags
  mlr help csv/tsv-only-flags
  mlr help file-format-flags
  mlr help flatten-unflatten-flags
  mlr help format-conversion-keystroke-saver-flags
  mlr help json-only-flags
  mlr help legacy-flags
  mlr help miscellaneous-flags
  mlr help output-colorization-flags
  mlr help pprint-only-flags
  mlr help profiling-flags
  mlr help separator-flags
Verbs:
  mlr help list-verbs
  mlr help usage-verbs
  mlr help verb
Functions:
  mlr help list-functions
  mlr help list-function-classes
  mlr help list-functions-in-class
  mlr help usage-functions
  mlr help usage-functions-by-class
  mlr help function
Keywords:
  mlr help list-keywords
  mlr help usage-keywords
  mlr help keyword
Other:
  mlr help auxents
  mlr help terminals
  mlr help mlrrc
  mlr help output-colorization
  mlr help type-arithmetic-info
  mlr help type-arithmetic-info-extended
Shorthands:
  mlr -g = mlr help flags
  mlr -l = mlr help list-verbs
  mlr -L = mlr help usage-verbs
  mlr -f = mlr help list-functions
  mlr -F = mlr help usage-functions
  mlr -k = mlr help list-keywords
  mlr -K = mlr help usage-keywords
Lastly, 'mlr help ...' will search for your exact text '...' using the sources of 'mlr help flag', 'mlr help verb', 'mlr help function', and 'mlr help keyword'. Use 'mlr help find ...' for approximate (substring) matches, e.g. 'mlr help find map' for all things with "map" in their names.
VERB LIST
altkv bar bootstrap case cat check clean-whitespace count-distinct count count-similar cut decimate fill-down fill-empty filter flatten format-values fraction gap grep group-by group-like gsub having-fields head histogram json-parse json-stringify join label latin1-to-utf8 least-frequent merge-fields most-frequent nest nothing put regularize remove-empty-columns rename reorder repeat reshape sample sec2gmtdate sec2gmt seqgen shuffle skip-trivial-records sort sort-within-records sparsify split ssub stats1 stats2 step sub summary tac tail tee template top utf8-to-latin1 unflatten uniq unspace unsparsify
FUNCTION LIST
abs acos acosh antimode any append apply arrayify asin asinh asserting_absent asserting_array asserting_bool asserting_boolean asserting_empty asserting_empty_map asserting_error asserting_float asserting_int asserting_map asserting_nonempty_map asserting_not_array asserting_not_empty asserting_not_map asserting_not_null asserting_null asserting_numeric asserting_present asserting_string atan atan2 atanh bitcount boolean capitalize cbrt ceil clean_whitespace collapse_whitespace concat contains cos cosh count depth dhms2fsec dhms2sec distinct_count erf erfc every exec exp expm1 flatten float floor fmtifnum fmtnum fold format fsec2dhms fsec2hms get_keys get_values gmt2localtime gmt2nsec gmt2sec gssub gsub haskey hexfmt hms2fsec hms2sec hostname index int invqnorm is_absent is_array is_bool is_boolean is_empty is_empty_map is_error is_float is_int is_map is_nan is_nonempty_map is_not_array is_not_empty is_not_map is_not_null is_null is_numeric is_present is_string joink joinkv joinv json_parse json_stringify kurtosis latin1_to_utf8 leafcount leftpad length localtime2gmt localtime2nsec localtime2sec log log10 log1p logifit lstrip madd mapdiff mapexcept mapselect mapsum max maxlen md5 mean meaneb median mexp min minlen mmul mode msub nsec2gmt nsec2gmtdate nsec2localdate nsec2localtime null_count os percentile percentiles pow qnorm reduce regextract regextract_or_else rightpad round roundm rstrip sec2dhms sec2gmt sec2gmtdate sec2hms sec2localdate sec2localtime select sgn sha1 sha256 sha512 sin sinh skewness sort sort_collection splita splitax splitkv splitkvx splitnv splitnvx sqrt ssub stat stddev strfntime strfntime_local strftime strftime_local string strip strlen strmatch strmatchx strpntime strpntime_local strptime strptime_local sub substr substr0 substr1 sum sum2 sum3 sum4 sysntime system systime systimeint tan tanh tolower toupper truncate typeof unflatten unformat unformatx upntime uptime urand urand32 urandelement urandint urandrange utf8_to_latin1 variance 
version ! != !=~ % & && * ** + - . .* .+ .- ./ / // < << <= <=> == =~ > >= >> >>> ?: ?? ??? ^ ^^ | || ~
COMMENTS-IN-DATA FLAGS
Miller lets you put comments in your data, such as
# This is a comment for a CSV file
a,b,c
1,2,3
4,5,6
Notes:
* Comments are only honored at the start of a line.
* In the absence of any of the below four options, comments are data like any other text. (The comments-in-data feature is opt-in.)
* When `--pass-comments` is used, comment lines are written to standard output immediately upon being read; they are not part of the record stream. Results may be counterintuitive. A suggestion is to place comments at the start of data files.
--pass-comments
    Immediately print commented lines (prefixed by `#`) within the input.
--pass-comments-with {string}
    Immediately print commented lines within input, with specified prefix.
--skip-comments
    Ignore commented lines (prefixed by `#`) within the input.
--skip-comments-with {string}
    Ignore commented lines within input, with specified prefix.
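The three behaviors described above (comments as data, skipped, or passed straight through) can be sketched as a small Python filter. This is a simplified model under stated assumptions, not Miller's code: the function name `read_data_lines` and its `mode` parameter are hypothetical stand-ins for the absence of a flag (`None`), `--skip-comments` (`"skip"`), and `--pass-comments` (`"pass"`).

```python
import io
import sys


def read_data_lines(lines, mode=None, prefix="#", passthrough=sys.stdout):
    """Yield the lines that remain in the record stream.

    mode=None   : comment lines are data like any other text (the default).
    mode="skip" : comment lines are dropped.
    mode="pass" : comment lines are written to `passthrough` as soon as they
                  are read, and are not part of the record stream.
    """
    for line in lines:
        # Comments are only honored at the start of a line.
        if mode is not None and line.startswith(prefix):
            if mode == "pass":
                passthrough.write(line + "\n")
            continue
        yield line


data = ["# This is a comment for a CSV file", "a,b,c", "1,2,3", "4,5,6"]
side_channel = io.StringIO()
kept = list(read_data_lines(data, mode="pass", passthrough=side_channel))
print(kept)
print(side_channel.getvalue(), end="")
```

Because passed-through comments bypass the record stream, they can appear before records that were read earlier but emitted later, which is why placing comments at the start of data files is suggested.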
COMPRESSED-DATA FLAGS
Miller offers a few different ways to handle reading data files which have been compressed.
* Decompression done within the Miller process itself: `--bz2in` `--gzin` `--zin` `--zstdin`
* Decompression done outside the Miller process: `--prepipe` `--prepipex`
Using `--prepipe` and `--prepipex` you can specify an action to be taken on each input file. The prepipe command must be able to read from standard input; it will be invoked with `{command} < {filename}`. The prepipex command must take a filename as argument; it will be invoked with `{command} {filename}`.
Examples (a command containing spaces must be quoted so that it reaches Miller as a single argument):
  mlr --prepipe gunzip
  mlr --prepipe 'zcat -cf'
  mlr --prepipe 'xz -cd'
  mlr --prepipe cat
Note that this feature is quite general and is not limited to decompression utilities. You can use it to apply per-file filters of your choice.
For output compression (or other) utilities, simply pipe the output: `mlr ... | {your compression command} > outputfilenamegoeshere`
Lastly, note that if `--prepipe` or `--prepipex` is specified, it replaces any decisions that might have been made based on the file suffix. Likewise, `--gzin`/`--bz2in`/`--zin`/`--zstdin` are ignored if `--prepipe` is also specified.
--bz2in
    Uncompress bzip2 within the Miller process. Done by default if file ends in `.bz2`.
--gzin
    Uncompress gzip within the Miller process. Done by default if file ends in `.gz`.
--prepipe {decompression command}
    You can, of course, already do without this for single input files, e.g. `gunzip < myfile.csv.gz | mlr ...`. Allowed at the command line, but not in `.mlrrc` to avoid unexpected code execution.
--prepipe-bz2
    Same as `--prepipe bz2`, except this is allowed in `.mlrrc`.
--prepipe-gunzip
    Same as `--prepipe gunzip`, except this is allowed in `.mlrrc`.
--prepipe-zcat
    Same as `--prepipe zcat`, except this is allowed in `.mlrrc`.
--prepipe-zstdcat
    Same as `--prepipe zstdcat`, except this is allowed in `.mlrrc`.
--prepipex {decompression command}
    Like `--prepipe` with one exception: doesn't insert `<` between command and filename at runtime. Useful for some commands like `unzip -qc` which don't read standard input. Allowed at the command line, but not in `.mlrrc` to avoid unexpected code execution.
--zin
    Uncompress zlib within the Miller process. Done by default if file ends in `.z`.
--zstdin
    Uncompress zstd within the Miller process. Done by default if file ends in `.zstd`.
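The difference between the `--prepipe` and `--prepipex` invocation forms described above can be sketched as follows. This is a hedged illustration only: the helper name `prepipe_command` is hypothetical, and quoting the file name with `shlex.quote` is an assumption made for this sketch, not a statement about how Miller escapes file names.

```python
import shlex


def prepipe_command(command, filename, use_stdin=True):
    """Build the shell command line run for one input file.

    use_stdin=True  models --prepipe : {command} < {filename}
    use_stdin=False models --prepipex: {command} {filename}
    """
    quoted = shlex.quote(filename)  # assumption: quote for safety in this sketch
    if use_stdin:
        return f"{command} < {quoted}"
    return f"{command} {quoted}"


print(prepipe_command("gunzip", "myfile.csv.gz"))
print(prepipe_command("unzip -qc", "myfile.zip", use_stdin=False))
```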
CSV/TSV-ONLY FLAGS
These are flags which are applicable to CSV format.
--allow-ragged-csv-input or --ragged or --allow-ragged-tsv-input
    If a data line has fewer fields than the header line, fill remaining keys with empty string. If a data line has more fields than the header line, use integer field labels as in the implicit-header case.
--csv-trim-leading-space
    Trims leading spaces in CSV data. Use this for data like '"foo", "bar"' which is non-RFC-4180 compliant, but common.
--headerless-csv-output or --ho or --headerless-tsv-output
    Print only CSV/TSV data lines; do not print CSV/TSV header lines.
--implicit-csv-header or --headerless-csv-input or --hi or --implicit-tsv-header
    Use 1,2,3,... as field labels, rather than from line 1 of input files. Tip: combine with `label` to recreate missing headers.
--lazy-quotes
    Accepts quotes appearing in unquoted fields, and non-doubled quotes appearing in quoted fields.
--no-auto-unsparsify
    For CSV/TSV output: if the record keys change from one row to another, emit a blank line and a new header line. This is non-compliant with RFC 4180 but is helpful for heterogeneous data.
--no-implicit-csv-header or --no-implicit-tsv-header
    Opposite of `--implicit-csv-header`. This is the default anyway -- the main use is for the flags to `mlr join` if you have main file(s) which are headerless but you want to join in on a file which does have a CSV/TSV header. Then you could use `mlr --csv --implicit-csv-header join --no-implicit-csv-header -l your-join-in-with-header.csv ... your-headerless.csv`.
--quote-all
    Force double-quoting of CSV fields.
-N
    Keystroke-saver for `--implicit-csv-header --headerless-csv-output`.
FILE-FORMAT FLAGS
See the File formats doc page, and/or `mlr help file-formats`, for more about file formats Miller supports.
Examples: `--csv` for CSV-formatted input and output; `--icsv --opprint` for CSV-formatted input and pretty-printed output.
Please use `--iformat1 --oformat2` rather than `--format1 --oformat2`. The latter sets up input and output flags for `format1`, not all of which are overridden in all cases by setting output format to `format2`.
--asv or --asvlite
    Use ASV format for input and output data.
--csv or -c
    Use CSV format for input and output data.
--csvlite
    Use CSV-lite format for input and output data.
--dkvp
    Use DKVP format for input and output data.
--gen-field-name
    Specify field name for --igen. Defaults to "i".
--gen-start
    Specify start value for --igen. Defaults to 1.
--gen-step
    Specify step value for --igen. Defaults to 1.
--gen-stop
    Specify stop value for --igen. Defaults to 100.
--iasv or --iasvlite
    Use ASV format for input data.
--icsv
    Use CSV format for input data.
--icsvlite
    Use CSV-lite format for input data.
--idkvp
    Use DKVP format for input data.
--igen
    Ignore input files and instead generate sequential numeric input using --gen-field-name, --gen-start, --gen-step, and --gen-stop values. See also the seqgen verb, which is more useful/intuitive.
--ijson
    Use JSON format for input data.
--ijsonl
    Use JSON Lines format for input data.
--imd or --imarkdown
    Use markdown-tabular format for input data.
--inidx
    Use NIDX format for input data.
--io {format name}
    Use format name for input and output data. For example: `--io csv` is the same as `--csv`.
--ipprint
    Use PPRINT format for input data.
--itsv
    Use TSV format for input data.
--itsvlite
    Use TSV-lite format for input data.
--iusv or --iusvlite
    Use USV format for input data.
--ixtab
    Use XTAB format for input data.
--json or -j
    Use JSON format for input and output data.
--jsonl
    Use JSON Lines format for input and output data.
--nidx
    Use NIDX format for input and output data.
--oasv or --oasvlite
    Use ASV format for output data.
--ocsv
    Use CSV format for output data.
--ocsvlite
    Use CSV-lite format for output data.
--odkvp
    Use DKVP format for output data.
--ojson
    Use JSON format for output data.
--ojsonl
    Use JSON Lines format for output data.
--omd or --omarkdown
    Use markdown-tabular format for output data.
--onidx
    Use NIDX format for output data.
--opprint
    Use PPRINT format for output data.
--otsv
    Use TSV format for output data.
--otsvlite
    Use TSV-lite format for output data.
--ousv or --ousvlite
    Use USV format for output data.
--oxtab
    Use XTAB format for output data.
--pprint
    Use PPRINT format for input and output data.
--tsv or -t
    Use TSV format for input and output data.
--tsvlite
    Use TSV-lite format for input and output data.
--usv or --usvlite
    Use USV format for input and output data.
--xtab
    Use XTAB format for input and output data.
--xvright
    Right-justify values for XTAB format.
-i {format name}
    Use format name for input data. For example: `-i csv` is the same as `--icsv`.
-o {format name}
    Use format name for output data. For example: `-o csv` is the same as `--ocsv`.
FLATTEN-UNFLATTEN FLAGS
These flags control how Miller converts record values which are maps or arrays, when input is JSON and output is non-JSON (flattening) or input is non-JSON and output is JSON (unflattening).
See the Flatten/unflatten doc page for more information.
--flatsep or --jflatsep {string}
    Separator for flattening multi-level JSON keys, e.g. `{"a":{"b":3}}` becomes `a.b => 3` for non-JSON formats. Defaults to `.`.
--no-auto-flatten
    When output is non-JSON, suppress the default auto-flatten behavior. Default: if `$y = [7,8,9]` then this flattens to `y.1=7,y.2=8,y.3=9`, and similarly for maps. With `--no-auto-flatten`, instead we get `$y=[7, 8, 9]`.
--no-auto-unflatten
    When input is non-JSON and output is JSON, suppress the default auto-unflatten behavior. Default: if the input has `y.1=7,y.2=8,y.3=9` then this unflattens to `$y=[7,8,9]`. With `--no-auto-unflatten`, instead we get `${y.1}=7,${y.2}=8,${y.3}=9`.
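The flattening rule described above (nested map keys joined by the separator, array elements keyed by 1-based position) can be sketched in Python. This is a minimal model, not Miller's implementation: the function name `flatten` and its parameters are hypothetical, and empty maps and arrays are not handled.

```python
def flatten(value, prefix="", sep="."):
    """Flatten nested dicts/lists into a single-level dict of joined keys."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        # Array elements are keyed by their 1-based position.
        items = ((str(i), v) for i, v in enumerate(value, start=1))
    else:
        return {prefix: value}
    flat = {}
    for key, child in items:
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        flat.update(flatten(child, name, sep))
    return flat


print(flatten({"a": {"b": 3}}))
print(flatten({"y": [7, 8, 9]}))
print(flatten({"a": {"b": 3}}, sep=":"))
```

The last call models a non-default separator, as one might choose with `--flatsep`.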
FORMAT-CONVERSION KEYSTROKE-SAVER FLAGS
As keystroke-savers for format-conversion you may use the following. The letters c, t, j, l, d, n, x, p, and m refer to formats CSV, TSV, JSON, JSON Lines, DKVP, NIDX, XTAB, PPRINT, and markdown, respectively.
| In\out   | CSV   | TSV   | JSON   | JSONL  | DKVP   | NIDX   | XTAB   | PPRINT | Markdown |
+----------+-------+-------+--------+--------+--------+--------+--------+--------+----------|
| CSV      |       | --c2t | --c2j  | --c2l  | --c2d  | --c2n  | --c2x  | --c2p  | --c2m    |
| TSV      | --t2c |       | --t2j  | --t2l  | --t2d  | --t2n  | --t2x  | --t2p  | --t2m    |
| JSON     | --j2c | --j2t |        | --j2l  | --j2d  | --j2n  | --j2x  | --j2p  | --j2m    |
| JSONL    | --l2c | --l2t |        |        | --l2d  | --l2n  | --l2x  | --l2p  | --l2m    |
| DKVP     | --d2c | --d2t | --d2j  | --d2l  |        | --d2n  | --d2x  | --d2p  | --d2m    |
| NIDX     | --n2c | --n2t | --n2j  | --n2l  | --n2d  |        | --n2x  | --n2p  | --n2m    |
| XTAB     | --x2c | --x2t | --x2j  | --x2l  | --x2d  | --x2n  |        | --x2p  | --x2m    |
| PPRINT   | --p2c | --p2t | --p2j  | --p2l  | --p2d  | --p2n  | --p2x  |        | --p2m    |
| Markdown | --m2c | --m2t | --m2j  | --m2l  | --m2d  | --m2n  | --m2x  | --m2p  |          |
-p
    Keystroke-saver for `--nidx --fs space --repifs`.
-T
    Keystroke-saver for `--nidx --fs tab`.
JSON-ONLY FLAGS
These are flags which are applicable to JSON output format.
--jlistwrap or --jl
    Wrap JSON output in outermost `[ ]`. This is the default for JSON output format.
--jvquoteall
    Force all JSON values -- recursively into lists and objects -- to string.
--jvstack
    Put one key-value pair per line for JSON output (multi-line output). This is the default for JSON output format.
--no-jlistwrap
    Do not wrap JSON output in outermost `[ ]`. This is the default for JSON Lines output format.
--no-jvstack
    Put objects/arrays all on one line for JSON output. This is the default for JSON Lines output format.
LEGACY FLAGS
These are flags which don't do anything in the current Miller version. They are accepted as no-op flags in order to keep old scripts from breaking.
--jknquoteint
    Type information from JSON input files is now preserved throughout the processing stream.
--jquoteall
    Type information from JSON input files is now preserved throughout the processing stream.
--json-fatal-arrays-on-input
    Miller now supports arrays as of version 6.
--json-map-arrays-on-input
    Miller now supports arrays as of version 6.
--json-skip-arrays-on-input
    Miller now supports arrays as of version 6.
--jsonx
    The `--jvstack` flag is now default true in Miller 6.
--mmap
    Miller no longer uses memory-mapping to access data files.
--no-mmap
    Miller no longer uses memory-mapping to access data files.
--ojsonx
    The `--jvstack` flag is now default true in Miller 6.
--quote-minimal
    Ignored as of version 6. Types are inferred/retained through the processing flow now.
--quote-none
    Ignored as of version 6. Types are inferred/retained through the processing flow now.
--quote-numeric
    Ignored as of version 6. Types are inferred/retained through the processing flow now.
--quote-original
    Ignored as of version 6. Types are inferred/retained through the processing flow now.
--vflatsep
    Ignored as of version 6. This functionality is subsumed into JSON formatting.
MISCELLANEOUS FLAGS
These are flags which don't fit into any other category.
--fflush
    Force buffered output to be written after every output record. The default is to flush output after every record if the output is to the terminal, or less often if the output is to a file or a pipe. The default is a significant performance optimization for large files. Use this flag to force frequent updates even when output is to a pipe or file, at a performance cost.
--files {filename}
    Use this to specify a file which itself contains, one per line, names of input files. May be used more than once.
--from {filename}
    Use this to specify an input file before the verb(s), rather than after. May be used more than once. Example: `mlr --from a.dat --from b.dat cat` is the same as `mlr cat a.dat b.dat`.
--hash-records
    This is an internal parameter which normally does not need to be modified. It controls the mechanism by which Miller accesses fields within records. In general --no-hash-records is faster, and is the default. For specific use-cases involving data having many fields, and many of them being processed during a given processing run, --hash-records might offer a slight performance benefit.
--infer-int-as-float or -A
    Cast all integers in data files to floats.
--infer-none or -S
    Don't treat values like 123 or 456.7 in data files as int/float; leave them as strings.
--infer-octal or -O
    Treat numbers like 0123 in data files as numeric; default is string. Note that 00-07 etc scan as int; 08-09 scan as float.
--load {filename}
    Load DSL script file for all put/filter operations on the command line. If the name following `--load` is a directory, load all `*.mlr` files in that directory. This is just like `put -f` and `filter -f` except it's up-front on the command line, so you can do something like `alias mlr='mlr --load ~/myscripts'` if you like.
--mfrom {filenames}
    Use this to specify one or more input files before the verb(s), rather than after. May be used more than once. The list of filenames must end with `--`. This is useful for example since `--from *.csv` doesn't do what you might hope but `--mfrom *.csv --` does.
--mload {filenames}
    Like `--load` but works with more than one filename, e.g. `--mload *.mlr --`.
--no-dedupe-field-names
    By default, if an input record has a field named `x` and another also named `x`, the second will be renamed `x_2`, and so on. With this flag provided, the second `x`'s value will replace the first `x`'s value when the record is read. This flag has no effect on JSON input records, where duplicate keys always result in the last one's value being retained.
--no-fflush
    Let buffered output not be written after every output record. The default is to flush output after every record if the output is to the terminal, or less often if the output is to a file or a pipe. The default is a significant performance optimization for large files. Use this flag to allow less-frequent updates when output is to the terminal. This is unlikely to be a noticeable performance improvement, since direct-to-screen output for large files has its own overhead.
--no-hash-records
    See --hash-records.
--norc
    Do not load a .mlrrc file.
--nr-progress-mod {m}
    With m a positive integer: print filename and record count to stderr every m input records.
--ofmt {format}
    E.g. `%.18f`, `%.0f`, `%9.6e`. Please use sprintf-style codes (https://pkg.go.dev/fmt) for floating-point numbers. If not specified, default formatting is used. See also the `fmtnum` function and the `format-values` verb.
--ofmte {n}
    Use --ofmte 6 as shorthand for --ofmt %.6e, etc.
--ofmtf {n}
    Use --ofmtf 6 as shorthand for --ofmt %.6f, etc.
--ofmtg {n}
    Use --ofmtg 6 as shorthand for --ofmt %.6g, etc.
--records-per-batch {n}
    This is an internal parameter for the maximum number of records in a batch. Normally this does not need to be modified, except when input is from `tail -f`. See also https://miller.readthedocs.io/en/latest/reference-main-flag-list/.
--s-no-comment-strip {file name}
    Take command-line flags from file name, like -s, but with no comment-stripping. For more information please see https://miller.readthedocs.io/en/latest/scripting/.
--seed {n}
    Seed for the `put`/`filter` functions `urand`, `urandint`, and `urand32`, with `n` of the form `12345678` or `0xcafefeed`.
--tz {timezone}
    Specify timezone, overriding `$TZ` environment variable (if any).
-I
    Process files in-place. For each file name on the command line, output is written to a temp file in the same directory, which is then renamed over the original. Each file is processed in isolation: if the output format is CSV, CSV headers will be present in each output file; statistics are only over each file's own records; and so on.
-n
    Process no input files, nor standard input either. Useful for `mlr put` with `begin`/`end` statements only. (Same as `--from /dev/null`.) Also useful in `mlr -n put -v '...'` for analyzing abstract syntax trees (if that's your thing).
-s {file name}
    Take command-line flags from file name. For more information please see https://miller.readthedocs.io/en/latest/scripting/.
-x
    If any record has an error value in it, report it and stop the process. The default is to print the field value as `(error)` and continue.
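The field-name deduplication described under `--no-dedupe-field-names` above can be modeled with a short Python sketch. This is illustrative only: the function name `dedupe_field_names` is hypothetical, and reading "and so on" as `x_3`, `x_4`, ... for further repeats is an assumption.

```python
def dedupe_field_names(pairs, dedupe=True):
    """Build an insertion-ordered record from (key, value) pairs.

    dedupe=True  : a repeated key `x` is stored as `x_2`, then `x_3`, ...
    dedupe=False : a repeated key's value replaces the earlier value.
    """
    record = {}
    for key, value in pairs:
        if dedupe and key in record:
            n = 2
            while f"{key}_{n}" in record:
                n += 1
            key = f"{key}_{n}"
        record[key] = value
    return record


pairs = [("x", 1), ("y", 2), ("x", 3)]
print(dedupe_field_names(pairs))                # default behavior
print(dedupe_field_names(pairs, dedupe=False))  # models --no-dedupe-field-names
```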
OUTPUT-COLORIZATION FLAGS
Miller uses colors to highlight outputs. You can specify color preferences. Note: output colorization does not work on Windows.
Things having colors:
* Keys in CSV header lines, JSON keys, etc
* Values in CSV data lines, JSON scalar values, etc
* "PASS" and "FAIL" in regression-test output
* Some online-help strings
Rules for coloring:
* By default, colorize output only if writing to stdout and stdout is a TTY.
  * Example: color: `mlr --csv cat foo.csv`
  * Example: no color: `mlr --csv cat foo.csv > bar.csv`
  * Example: no color: `mlr --csv cat foo.csv | less`
* The default colors were chosen since they look OK with white or black terminal background, and are differentiable with common varieties of human color vision.
Mechanisms for coloring:
* Miller uses ANSI escape sequences only. This does not work on Windows except within Cygwin.
* Requires `TERM` environment variable to be set to non-empty string.
* Doesn't try to check to see whether the terminal is capable of 256-color ANSI vs 16-color ANSI. Note that if colors are in the range 0..15 then 16-color ANSI escapes are used, so this is in the user's control.
How you can control colorization:
* Suppression/unsuppression:
  * Environment variable `export MLR_NO_COLOR=true` or `export NO_COLOR=true` means don't color even if stdout+TTY.
  * Environment variable `export MLR_ALWAYS_COLOR=true` means do color even if not stdout+TTY. For example, you might want to use this when piping mlr output to `less -r`.
  * Command-line flags `--no-color` or `-M`, `--always-color` or `-C`.
* Color choices can be specified by using environment variables, or command-line flags, with values 0..255:
  * `export MLR_KEY_COLOR=208`, `MLR_VALUE_COLOR=33`, etc.: `MLR_KEY_COLOR` `MLR_VALUE_COLOR` `MLR_PASS_COLOR` `MLR_FAIL_COLOR` `MLR_REPL_PS1_COLOR` `MLR_REPL_PS2_COLOR` `MLR_HELP_COLOR`
  * Command-line flags `--key-color 208`, `--value-color 33`, etc.: `--key-color` `--value-color` `--pass-color` `--fail-color` `--repl-ps1-color` `--repl-ps2-color` `--help-color`
  * This is particularly useful if your terminal's background color clashes with current settings.
If environment-variable settings and command-line flags are both provided, the latter take precedence.
Colors can be specified using names such as "red" or "orchid": please see `mlr --list-color-names` to see available names. They can also be specified using numbers in the range 0..255, like 170: please see `mlr --list-color-codes`. You can also use "bold", "underline", and/or "reverse". Additionally, combinations of those can be joined with a "-", like "red-bold", "bold-170", "bold-underline", etc.
--always-color or -C
    Instructs Miller to colorize output even when it normally would not. Useful for piping output to `less -r`.
--fail-color
    Specify the color (see `--list-color-codes` and `--list-color-names`) for failing cases in `mlr regtest`.
--help-color
    Specify the color (see `--list-color-codes` and `--list-color-names`) for highlights in `mlr help` output.
--key-color
    Specify the color (see `--list-color-codes` and `--list-color-names`) for record keys.
--list-color-codes
    Show the available color codes in the range 0..255, such as 170 for example.
--list-color-names
    Show the names for the available color codes, such as `orchid` for example.
--no-color or -M
    Instructs Miller to not colorize any output.
--pass-color
    Specify the color (see `--list-color-codes` and `--list-color-names`) for passing cases in `mlr regtest`.
--value-color
    Specify the color (see `--list-color-codes` and `--list-color-names`) for record values.
PPRINT-ONLY FLAGS
These are flags which are applicable to PPRINT format.
--barred or --barred-output
    Prints a border around PPRINT output.
--barred-input
    When used in conjunction with --pprint, accepts barred input.
--right
    Right-justifies all fields for PPRINT output.
PROFILING FLAGS
These are flags for profiling Miller performance.
--cpuprofile {CPU-profile file name}
    Create a CPU-profile file for performance analysis. Instructions will be printed to stderr. This flag must be the very first thing after 'mlr' on the command line.
--time
    Print elapsed execution time in seconds to stderr at the end of the execution of the program.
--traceprofile
    Create a trace-profile file for performance analysis. Instructions will be printed to stderr. This flag must be the very first thing after 'mlr' on the command line.
SEPARATOR FLAGS
See the Separators doc page for more about record separators, field separators, and pair separators. Also see the File formats doc page, or `mlr help file-formats`, for more about the file formats Miller supports.
In brief:
* For DKVP records like `x=1,y=2,z=3`, the fields (key-value pairs) are separated by a comma, the key and value within each pair are separated by an equals sign, and each record is separated from the next by a newline.
* Each file format has its own default separators.
* Most formats, such as CSV, don't support pair-separators: keys are on the CSV header line and values are on each CSV data line; keys and values are not placed next to one another.
* Some separators are not programmable: for example JSON uses a colon as a pair separator but this is non-modifiable in the JSON spec.
* You can set separators differently between Miller's input and output -- hence `--ifs` and `--ofs`, etc.
Notes about line endings:
* Default line endings (`--irs` and `--ors`) are newline which is interpreted to accept carriage-return/newline files (e.g. on Windows) for input, and to produce platform-appropriate line endings on output.
Notes about all other separators:
* IPS/OPS are only used for DKVP and XTAB formats, since only in these formats do key-value pairs appear juxtaposed.
* IRS/ORS are ignored for XTAB format. Nominally IFS and OFS are newlines; XTAB records are separated by two or more consecutive IFS/OFS -- i.e. a blank line. Everything above about `--irs/--ors/--rs auto` becomes `--ifs/--ofs/--fs auto` for XTAB format. (XTAB's default IFS/OFS are "auto".)
* OFS must be single-character for PPRINT format. This is because it is used with repetition for alignment; multi-character separators would make alignment impossible.
* OPS may be multi-character for XTAB format, in which case alignment is disabled.
* FS/PS are ignored for markdown format; RS is used.
* All FS and PS options are ignored for JSON format, since they are not relevant to the JSON format.
* You can specify separators in any of the following ways, shown by example:
- Type them out, quoting as necessary for shell escapes, e.g.
`--fs '|' --ips :`
- C-style escape sequences, e.g. `--rs '\r\n' --fs '\t'`.
- To avoid backslashing, you can use any of the following names:
ascii_esc = "\x1b"
ascii_etx = "\x03"
ascii_fs = "\x1c"
ascii_gs = "\x1d"
ascii_null = "\x00"
ascii_rs = "\x1e"
ascii_soh = "\x01"
ascii_stx = "\x02"
ascii_us = "\x1f"
asv_fs = "\x1f"
asv_rs = "\x1e"
colon = ":"
comma = ","
cr = "\r"
crcr = "\r\r"
crlf = "\r\n"
crlfcrlf = "\r\n\r\n"
equals = "="
lf = "\n"
lflf = "\n\n"
newline = "\n"
pipe = "|"
semicolon = ";"
slash = "/"
space = " "
tab = "\t"
usv_fs = "\xe2\x90\x9f"
usv_rs = "\xe2\x90\x9e"
- Similarly, you can use the following for `--ifs-regex` and `--ips-regex`:
spaces = "( )+"
tabs = "(\t)+"
whitespace = "([ \t])+"
* Default separators by format:
Format   FS     PS     RS
csv      ","    N/A    "\n"
csvlite  ","    N/A    "\n"
dkvp     ","    "="    "\n"
gen      ","    N/A    "\n"
json     N/A    N/A    N/A
markdown " "    N/A    "\n"
nidx     " "    N/A    "\n"
pprint   " "    N/A    "\n"
tsv      "\t"   N/A    "\n"
xtab     "\n"   " "    "\n\n"
--fs {string}         Specify FS for input and output.
--ifs {string}        Specify FS for input.
--ifs-regex {string}  Specify FS for input as a regular expression.
--ips {string}        Specify PS for input.
--ips-regex {string}  Specify PS for input as a regular expression.
--irs {string}        Specify RS for input.
--ofs {string}        Specify FS for output.
--ops {string}        Specify PS for output.
--ors {string}        Specify RS for output.
--ps {string}         Specify PS for input and output.
--repifs              Let IFS be repeated: e.g. for splitting on multiple
                      spaces.
--rs {string}         Specify RS for input and output.
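As a rough illustration of how the record, field, and pair separators partition DKVP text, here is a minimal Python sketch. It is not Miller's implementation: the function name `parse_dkvp` and its parameters are hypothetical, and quoting, regex separators, `--repifs`, and fields that lack a pair separator are not handled the way Miller handles them.

```python
def parse_dkvp(text, ifs=",", ips="=", irs="\n"):
    """Split DKVP text into a list of insertion-ordered dicts (illustrative only)."""
    records = []
    for line in text.split(irs):
        if line == "":
            continue  # skip the empty piece after a trailing record separator
        record = {}
        for field in line.split(ifs):
            # Split on the first pair separator only, so values may contain it.
            key, _, value = field.partition(ips)
            record[key] = value
        records.append(record)
    return records


# Default DKVP separators: FS ",", PS "=", RS newline.
print(parse_dkvp("x=1,y=2,z=3\n"))
# Custom separators, analogous to --ifs '|' --ips : --irs ';'.
print(parse_dkvp("x:1|y:2;x:3|y:4;", ifs="|", ips=":", irs=";"))
```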
AUXILIARY COMMANDS¶
Available entries:
mlr aux-list
mlr hex
mlr lecat
mlr termcvt
mlr unhex
For more information, please invoke mlr {subcommand} --help.
MLRRC¶
You can set up personal defaults via a $HOME/.mlrrc and/or ./.mlrrc. For example, if you usually process CSV, then you can put "--csv" in your .mlrrc file and that will be the default input/output format unless otherwise specified on the command line.
The .mlrrc file format is one "--flag" or "--option value" per line, with the leading "--" optional. Hash-style comments and blank lines are ignored.
Sample .mlrrc:
# Input and output formats are CSV by default (unless otherwise specified
# on the mlr command line):
csv
# These are no-ops for CSV, but when I do use JSON output, I want these
# pretty-printing options to be used:
jvstack
jlistwrap
How to specify location of .mlrrc:
* If $MLRRC is set:
  o If its value is "__none__" then no .mlrrc files are processed.
  o Otherwise, its value (as a filename) is loaded and processed. If there are syntax
    errors, they abort mlr with a usage message (as if you had mistyped something on the
    command line). If the file can't be loaded at all, though, it is silently skipped.
  o Any .mlrrc in your home directory or current directory is ignored whenever $MLRRC is
    set in the environment.
* Otherwise:
  o If $HOME/.mlrrc exists, it's then processed as above.
  o If ./.mlrrc exists, it's then also processed as above.
  (I.e. current-directory .mlrrc defaults are stacked over home-directory .mlrrc defaults.)
* The command-line flag "--norc" can be used to suppress loading the .mlrrc file even when other
  conditions are met.
See also: https://miller.readthedocs.io/en/latest/customization.html
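To make the file-format rules above concrete, the following Python sketch turns .mlrrc text into command-line-style arguments. It is only an approximation under stated assumptions: `parse_mlrrc` is a hypothetical name, comments are recognized only when the whole line starts with "#", and Miller's actual parser may accept forms this sketch does not.

```python
def parse_mlrrc(text):
    """Turn .mlrrc text into a flat list of command-line-style arguments."""
    args = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and hash-style comments are ignored
        if not line.startswith("--"):
            line = "--" + line  # the leading "--" is optional
        # "--option value" becomes two arguments; "--flag" stays as one.
        args.extend(line.split(None, 1))
    return args


sample = """\
# Input and output formats are CSV by default
csv
# Pretty-printing options for JSON output
jvstack
jlistwrap
"""
print(parse_mlrrc(sample))
```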
REPL¶
Usage: mlr repl [options] {zero or more data-file names}
-v Prints the expression's AST (abstract syntax tree), which gives
   full transparency on the precedence and associativity rules of
   Miller's grammar, to stdout.
-d Like -v but uses a parenthesized-expression format for the AST.
-D Like -d but with output all on one line.
-w Show warnings about uninitialized variables
-q Don't show startup banner
-s Don't show prompts
--load {DSL script file} Load script file before presenting the prompt.
   If the name following --load is a directory, load all "*.mlr" files
   in that directory.
--mload {DSL script files} -- Like --load but works with more than one filename,
   e.g. '--mload *.mlr --'.
-h|--help Show this message.
Or any --icsv, --ojson, etc. reader/writer options as for the main Miller command line.
Any data-file names are opened just as if you had waited and typed :open {filenames}
at the Miller REPL prompt.
VERBS¶
altkv¶
Usage: mlr altkv [options]
Given fields with values of the form a,b,c,d,e,f emits a=b,c=d,e=f pairs.
Options:
-h|--help Show this message.
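The pairing that altkv performs can be pictured with a short Python sketch. The function name `altkv` is reused here only for illustration, and the sketch deliberately rejects an odd number of values, since the text above does not say how Miller treats that case.

```python
def altkv(values):
    """Pair up alternating values: [a, b, c, d] -> {a: b, c: d}."""
    if len(values) % 2 != 0:
        # Assumption: odd-length input is out of scope for this sketch.
        raise ValueError("this sketch only handles an even number of values")
    return {values[i]: values[i + 1] for i in range(0, len(values), 2)}


print(altkv(["a", "b", "c", "d", "e", "f"]))
```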
bar¶
Usage: mlr bar [options] Replaces a numeric field with a number of asterisks, allowing for cheesy bar plots. These align best with --opprint or --oxtab output format. Options: -f {a,b,c} Field names to convert to bars. --lo {lo} Lower-limit value for min-width bar: default '0.000000'. --hi {hi} Upper-limit value for max-width bar: default '100.000000'. -w {n} Bar-field width: default '40'. --auto Automatically computes limits, ignoring --lo and --hi.
Holds all records in memory before producing any output. -c {character} Fill character: default '*'. -x {character} Out-of-bounds character: default '#'. -b {character} Blank character: default '.'. Nominally the fill, out-of-bounds, and blank characters will be strings of length 1. However you can make them all longer if you so desire. -h|--help Show this message.
bootstrap¶
Usage: mlr bootstrap [options]
Emits an n-sample, with replacement, of the input records.
See also mlr sample and mlr shuffle.
Options:
-n Number of samples to output. Defaults to number of input records.
   Must be non-negative.
-h|--help Show this message.
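Sampling with replacement means the same input record may appear more than once in the output. A minimal Python sketch of that idea follows; the name `bootstrap` and the `rng` parameter are illustrative choices, not part of Miller.

```python
import random


def bootstrap(records, n=None, rng=random):
    """Draw n records uniformly at random, with replacement."""
    if n is None:
        n = len(records)  # default: as many samples as input records
    if n < 0:
        raise ValueError("n must be non-negative")
    return [rng.choice(records) for _ in range(n)]


records = [{"x": 1}, {"x": 2}, {"x": 3}]
print(bootstrap(records, rng=random.Random(0)))
```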
case¶
Usage: mlr case [options]
Changes the case of strings in record keys and/or values.
Options:
-k Case only keys, not keys and values.
-v Case only values, not keys and values.
-f {a,b,c} Specify which field names to case (default: all)
-u Convert to uppercase
-l Convert to lowercase
-s Convert to sentence case (capitalize first letter)
-t Convert to title case (capitalize words)
-h|--help Show this message.
cat¶
Usage: mlr cat [options] Passes input records directly to output. Most useful for format conversion. Options: -n Prepend field "n" to each record with record-counter starting at 1. -N {name} Prepend field {name} to each record with record-counter starting at 1. -g {a,b,c} Optional group-by-field names for counters, e.g. a,b,c --filename Prepend current filename to each record. --filenum Prepend current filenum (1-up) to each record. -h|--help Show this message.
check¶
Usage: mlr check [options]
Consumes records without printing any output, with the exception that
warnings are printed to stderr. Useful for doing a well-formedness check
on input data.
Current checks are:
* Data are parseable
* If any key is the empty string
Options:
-h|--help Show this message.
clean-whitespace¶
Usage: mlr clean-whitespace [options] For each record, for each field in the record, whitespace-cleans the keys and/or values. Whitespace-cleaning entails stripping leading and trailing whitespace, and replacing multiple whitespace with singles. For finer-grained control, please see the DSL functions lstrip, rstrip, strip, collapse_whitespace, and clean_whitespace. Options: -k|--keys-only Do not touch values. -v|--values-only Do not touch keys. It is an error to specify -k as well as -v -- to clean keys and values, leave off -k as well as -v. -h|--help Show this message.
count-distinct¶
Usage: mlr count-distinct [options]
Prints number of records having distinct values for specified field names.
Same as uniq -c.
Options:
-f {a,b,c} Field names for distinct count.
-x {a,b,c} Field names to exclude for distinct count: use each record's others instead.
-n Show only the number of distinct values. Not compatible with -u.
-o {name} Field name for output count. Default "count".
   Ignored with -u.
-u Do unlashed counts for multiple field names. With -f a,b and
   without -u, computes counts for distinct combinations of a
   and b field values. With -f a,b and with -u, computes counts
   for distinct a field values and counts for distinct b field
   values separately.
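The difference between lashed (default) and unlashed (-u) counting may be easier to see in code. The Python sketch below is illustrative only: `count_distinct` and its parameters are hypothetical names, and skipping records that lack a requested field is an assumption of the sketch.

```python
from collections import Counter


def count_distinct(records, fields, unlashed=False):
    """Count distinct field values, jointly (lashed) or per field (unlashed)."""
    if unlashed:
        # One independent counter per field name.
        return {f: Counter(r[f] for r in records if f in r) for f in fields}
    # Lashed: count distinct combinations of all the fields together.
    return Counter(
        tuple(r[f] for f in fields)
        for r in records
        if all(f in r for f in fields)
    )


records = [
    {"a": "1", "b": "x"},
    {"a": "1", "b": "y"},
    {"a": "2", "b": "x"},
    {"a": "1", "b": "x"},
]
print(count_distinct(records, ["a", "b"]))                 # combinations of a and b
print(count_distinct(records, ["a", "b"], unlashed=True))  # a and b separately
```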
count¶
Usage: mlr count [options] Prints number of records, optionally grouped by distinct values for specified field names. Options: -g {a,b,c} Optional group-by-field names for counts, e.g. a,b,c -n {n} Show only the number of distinct values. Not interesting without -g. -o {name} Field name for output-count. Default "count". -h|--help Show this message.
count-similar¶
Usage: mlr count-similar [options] Ingests all records, then emits each record augmented by a count of the number of other records having the same group-by field values. Options: -g {a,b,c} Group-by-field names for counts, e.g. a,b,c -o {name} Field name for output-counts. Defaults to "count". -h|--help Show this message.
cut¶
Usage: mlr cut [options] Passes through input records with specified fields included/excluded. Options:
-f {a,b,c} Comma-separated field names for cut, e.g. a,b,c.
-o Retain fields in the order specified here in the argument list.
Default is to retain them in the order found in the input data.
-x|--complement Exclude, rather than include, field names specified by -f.
-r Treat field names as regular expressions. "ab", "a.*b" will
match any field name containing the substring "ab" or matching
"a.*b", respectively; anchors of the form "^ab$", "^a.*b$" may
be used. The -o flag is ignored when -r is present.
-h|--help Show this message.
Examples:
mlr cut -f hostname,status
mlr cut -x -f hostname,status
mlr cut -r -f '^status$,sda[0-9]'
mlr cut -r -f '^status$,"sda[0-9]"'
mlr cut -r -f '^status$,"sda[0-9]"i' (this is case-insensitive)
decimate¶
Usage: mlr decimate [options] Passes through one of every n records, optionally by category. Options:
-b Decimate by printing first of every n.
-e Decimate by printing last of every n (default).
-g {a,b,c} Optional group-by-field names for decimate counts, e.g. a,b,c.
-n {n} Decimation factor (default 10).
-h|--help Show this message.
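One way to picture decimation is a per-group counter that emits a record each time it reaches n. The Python sketch below follows that reading of the options; `decimate` and its parameters are hypothetical, and two behaviors are assumptions of the sketch rather than documented facts: an incomplete final batch emits nothing, and records lacking a group-by field are skipped.

```python
from collections import defaultdict


def decimate(records, n=10, first=False, group_by=()):
    """Emit one record out of every n, optionally per group."""
    counts = defaultdict(int)
    remembered = {}
    out = []
    for rec in records:
        if any(f not in rec for f in group_by):
            continue  # assumption: skip records lacking a group-by field
        key = tuple(rec[f] for f in group_by)
        counts[key] += 1
        if first:
            if counts[key] == 1:
                remembered[key] = rec  # like -b: keep the first of the batch
        else:
            remembered[key] = rec      # like -e: keep the most recent record
        if counts[key] == n:
            out.append(remembered[key])
            counts[key] = 0
    return out


records = [{"i": k} for k in range(1, 26)]
print(decimate(records, n=10))
print(decimate(records, n=10, first=True))
```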
fill-down¶
Usage: mlr fill-down [options]
If a given record has a missing value for a given field, fill that from
the corresponding value from a previous record, if any.
By default, a 'missing' field either is absent, or has the empty-string value.
With -a, a field is 'missing' only if it is absent.
Options:
--all Operate on all fields in the input.
-a|--only-if-absent If a given record has a missing value for a given field,
fill that from the corresponding value from a previous record, if any.
By default, a 'missing' field either is absent, or has the empty-string value.
With -a, a field is 'missing' only if it is absent.
-f Field names for fill-down.
-h|--help Show this message.
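The two meanings of 'missing' described above can be sketched in Python as follows. This is an illustration, not Miller's code: `fill_down` and `only_if_absent` are hypothetical names standing in for the verb and its -a flag.

```python
def fill_down(records, fields, only_if_absent=False):
    """Fill missing field values from the most recent record that had one."""
    last = {}
    out = []
    for rec in records:
        rec = dict(rec)  # copy so the caller's records are left untouched
        for f in fields:
            if only_if_absent:
                missing = f not in rec                   # like -a
            else:
                missing = f not in rec or rec[f] == ""   # default
            if missing:
                if f in last:
                    rec[f] = last[f]
            else:
                last[f] = rec[f]
        out.append(rec)
    return out


records = [{"a": "1", "b": "x"}, {"a": "", "b": "y"}, {"b": "z"}]
print(fill_down(records, ["a"]))
print(fill_down(records, ["a"], only_if_absent=True))
```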
fill-empty¶
Usage: mlr fill-empty [options] Fills empty-string fields with specified fill-value. Options: -v {string} Fill-value: defaults to "N/A" -S Don't infer type -- so '-v 0' would fill string 0 not int 0.
filter¶
Usage: mlr filter [options] {DSL expression}
Lets you use a domain-specific language to programmatically filter which
stream records will be output.
See also: https://miller.readthedocs.io/en/latest/reference-verbs
Options:
-f {file name} File containing a DSL expression (see examples below). If the filename
   is a directory, all *.mlr files in that directory are loaded.
-e {expression} You can use this after -f to add an expression. Example use
   case: define functions/subroutines in a file you specify with -f, then call
   them with an expression you specify with -e.
(If you mix -e and -f then the expressions are evaluated in the order encountered.
Since the expression pieces are simply concatenated, please be sure to use intervening
semicolons to separate expressions.)
-s name=value: Predefines out-of-stream variable @name to have the given value.
   Thus mlr put -s foo=97 '$column += @foo' is like
   mlr put 'begin {@foo = 97} $column += @foo'.
   The value part is subject to type-inferencing.
   May be specified more than once, e.g. -s name1=value1 -s name2=value2.
   Note: the value may be an environment variable, e.g. -s sequence=$SEQUENCE
-x (default false) Prints records for which {expression} evaluates to false, not true,
   i.e. invert the sense of the filter expression.
-q Does not include the modified record in the output stream.
   Useful for when all desired output is in begin and/or end blocks.
-S and -F: These are no-ops in Miller 6 and above, since now type-inferencing is done
   by the record-readers before filter/put is executed. Supported as no-op pass-through
   flags for backward compatibility.
-h|--help Show this message.
Parser-info options:
-w Print warnings about things like uninitialized variables.
-W Same as -w, but exit the process if there are any warnings.
-p Prints the expression's AST (abstract syntax tree), which gives full
   transparency on the precedence and associativity rules of Miller's grammar,
   to stdout.
-d Like -p but uses a parenthesized-expression format for the AST.
-D Like -d but with output all on one line.
-E Echo DSL expression before printing parse-tree
-v Same as -E -p.
-X Exit after parsing but before stream-processing. Useful with -v/-d/-D, if you
   only want to look at parser information.
Records will pass the filter depending on the last bare-boolean statement in
the DSL expression. That can be the result of <, ==, >, etc., the return value of a function call
which returns boolean, etc.
Examples:
mlr --csv --from example.csv filter '$color == "red"'
mlr --csv --from example.csv filter '$color == "red" && $flag == true'
More example filter expressions:
First record in each file:
'FNR == 1'
Subsampling:
'urand() < 0.001'
Compound booleans:
'$color != "blue" && $value > 4.2'
'($x < 0.5 && $y < 0.5) || ($x > 0.5 && $y > 0.5)'
Regexes with case-insensitive flag
'($name =~ "^sys.*east$") || ($name =~ "^dev.[0-9]+"i)'
Assignments, then bare-boolean filter statement:
'$ab = $a+$b; $cd = $c+$d; $ab != $cd'
Bare-boolean filter statement within a conditional:
'if (NR < 100) {
$x > 0.3;
} else {
$x > 0.002;
}
'
Using 'any' higher-order function to see if $index is 10, 20, or 30:
'any([10,20,30], func(e) {return $index == e})'
See also https://miller.readthedocs.io/reference-dsl for more context.
flatten¶
Usage: mlr flatten [options] Flattens multi-level maps to single-level ones. Example: field with name 'a' and value '{"b": { "c": 4 }}' becomes name 'a.b.c' and value 4. Options: -f Comma-separated list of field names to flatten (default all). -s Separator, defaulting to mlr --flatsep value. -h|--help Show this message.
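The key-joining rule in the example above (name 'a' with value '{"b": { "c": 4 }}' becoming 'a.b.c') can be sketched recursively in Python. The helper name `flatten` and its `sep` parameter are illustrative; the sketch covers nested maps only, and how arrays or empty maps are treated is an assumption left outside its scope.

```python
def flatten(record, sep="."):
    """Flatten nested dicts into a single-level dict with joined key names."""
    out = {}

    def walk(prefix, value):
        if isinstance(value, dict) and value:
            for key, child in value.items():
                walk(f"{prefix}{sep}{key}", child)
        else:
            # Leaves (and, in this sketch, empty maps) are stored as-is.
            out[prefix] = value

    for key, value in record.items():
        walk(key, value)
    return out


print(flatten({"a": {"b": {"c": 4}}}))
print(flatten({"x": 1, "a": {"b": 2, "c": 3}}, sep=":"))
```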
format-values¶
Usage: mlr format-values [options] Applies format strings to all field values, depending on autodetected type. * If a field value is detected to be integer, applies integer format. * Else, if a field value is detected to be float, applies float format. * Else, applies string format. Note: this is a low-keystroke way to apply formatting to many fields. To get finer control, please see the fmtnum function within the mlr put DSL. Note: this verb lets you apply arbitrary format strings, which can produce undefined behavior and/or program crashes. See your system's "man printf". Options: -i {integer format} Defaults to "%d".
Examples: "%06lld", "%08llx".
Note that Miller integers are long long so you must use
formats which apply to long long, e.g. with ll in them.
Undefined behavior results otherwise. -f {float format} Defaults to "%f".
Examples: "%8.3lf", "%.6le".
Note that Miller floats are double-precision so you must
use formats which apply to double, e.g. with l[efg] in them.
Undefined behavior results otherwise. -s {string format} Defaults to "%s".
Examples: "_%s", "%08s".
Note that you must use formats which apply to string, e.g.
with s in them. Undefined behavior results otherwise. -n Coerce field values autodetected as int to float, and then
apply the float format.
fraction¶
Usage: mlr fraction [options]
For each record's value in specified fields, computes the ratio of that
value to the sum of values in that field over all input records.
E.g. with input records x=1 x=2 x=3 and x=4, emits output records
x=1,x_fraction=0.1 x=2,x_fraction=0.2 x=3,x_fraction=0.3 and x=4,x_fraction=0.4
Note: this is internally a two-pass algorithm: on the first pass it retains
input records and accumulates sums; on the second pass it computes quotients
and emits output records. This means it produces no output until all input is read.
Options:
-f {a,b,c} Field name(s) for fraction calculation
-g {d,e,f} Optional group-by-field name(s) for fraction counts
-p Produce percents [0..100], not fractions [0..1]. Output field names
   end with "_percent" rather than "_fraction"
-c Produce cumulative distributions, i.e. running sums: each output
   value folds in the sum of the previous for the specified group
   E.g. with input records x=1 x=2 x=3 and x=4, emits output records
   x=1,x_cumulative_fraction=0.1 x=2,x_cumulative_fraction=0.3
   x=3,x_cumulative_fraction=0.6 and x=4,x_cumulative_fraction=1.0
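The two-pass computation described in the note can be sketched in Python as follows. This is an illustration under simplifying assumptions (a single numeric field, no grouping, no percent mode); the name `fraction` and the `cumulative` parameter are hypothetical stand-ins for the verb and its -c flag.

```python
def fraction(records, field, cumulative=False):
    """Two-pass fraction: sum first, then divide each value (or running sum)."""
    total = sum(r[field] for r in records)  # first pass: accumulate the sum
    suffix = "_cumulative_fraction" if cumulative else "_fraction"
    out = []
    running = 0
    for r in records:  # second pass: compute quotients
        running += r[field]
        numerator = running if cumulative else r[field]
        out.append({**r, field + suffix: numerator / total})
    return out


records = [{"x": 1}, {"x": 2}, {"x": 3}, {"x": 4}]
print(fraction(records, "x"))
print(fraction(records, "x", cumulative=True))
```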
gap¶
Usage: mlr gap [options]
Emits an empty record every n records, or when certain values change.
Options:
-g {a,b,c} Print a gap whenever values of these fields (e.g. a,b,c) change.
-n {n} Print a gap every n records.
One of -n or -g is required.
-n is ignored if -g is present.
-h|--help Show this message.
grep¶
Usage: mlr grep [options] {regular expression}
Passes through records which match the regular expression.
Options:
-i Use case-insensitive search.
-v Invert: pass through records which do not match the regex.
-a Only grep for values, not keys and values.
-h|--help Show this message.
Note that "mlr filter" is more powerful, but requires you to know field names.
By contrast, "mlr grep" allows you to regex-match the entire record. It does this
by formatting each record in memory as DKVP (or NIDX, if -a is supplied), using
OFS "," and OPS "=", and matching the resulting line against the regex specified
here. In particular, the regex is not applied to the input stream: if you have
CSV with header line "x,y,z" and data line "1,2,3" then the regex will be
matched, not against either of these lines, but against the DKVP line
"x=1,y=2,z=3". Furthermore, not all the options to system grep are supported,
and this command is intended to be merely a keystroke-saver. To get all the
features of system grep, you can do
"mlr --odkvp ... | grep ... | mlr --idkvp ..."
group-by¶
Usage: mlr group-by [options] {comma-separated field names}
Outputs records in batches having identical values at specified field names.
Options:
-h|--help Show this message.
group-like¶
Usage: mlr group-like [options] Outputs records in batches having identical field names. Options: -h|--help Show this message.
gsub¶
Usage: mlr gsub [options] Replaces old string with new string in specified field(s), with regex support for the old string and handling multiple matches, like the `gsub` DSL function. See also the `sub` and `ssub` verbs. Options: -f {a,b,c} Field names to convert. -h|--help Show this message.
having-fields¶
Usage: mlr having-fields [options] Conditionally passes through records depending on each record's field names. Options:
--at-least {comma-separated names}
--which-are {comma-separated names}
--at-most {comma-separated names}
--all-matching {regular expression}
--any-matching {regular expression}
--none-matching {regular expression}
Examples:
mlr having-fields --which-are amount,status,owner
mlr having-fields --any-matching 'sda[0-9]'
mlr having-fields --any-matching '"sda[0-9]"'
mlr having-fields --any-matching '"sda[0-9]"i' (this is case-insensitive)
head¶
Usage: mlr head [options] Passes through the first n records, optionally by category. Without -g, ceases consuming more input (i.e. is fast) when n records have been read. Options: -g {a,b,c} Optional group-by-field names for head counts, e.g. a,b,c. -n {n} Head-count to print. Default 10. -h|--help Show this message.
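The early-exit behavior mentioned above (stop reading once n records have been seen when there is no -g) can be sketched with a Python generator. The name `head` and its parameters are illustrative, and skipping records that lack a group-by field is an assumption of the sketch.

```python
from collections import defaultdict


def head(records, n=10, group_by=()):
    """Yield the first n records, or the first n per group."""
    seen = defaultdict(int)
    for rec in records:
        if any(f not in rec for f in group_by):
            continue  # assumption: skip records lacking a group-by field
        key = tuple(rec[f] for f in group_by)
        if seen[key] < n:
            seen[key] += 1
            yield rec
        if not group_by and seen[key] >= n:
            return  # ungrouped: nothing more can pass, so stop consuming input


stream = iter({"i": k} for k in range(1000))
print(list(head(stream, n=3)))
print(next(stream))  # the rest of the stream was left unread
```

With group-by fields the sketch must keep reading to the end, since a later record may belong to a group that has not yet reached n.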
histogram¶
Just a histogram. Input values < lo or > hi are not counted. Usage: mlr histogram [options] -f {a,b,c} Value-field names for histogram counts --lo {lo} Histogram low value --hi {hi} Histogram high value --nbins {n} Number of histogram bins. Defaults to 20. --auto Automatically computes limits, ignoring --lo and --hi.
Holds all values in memory before producing any output. -o {prefix} Prefix for output field name. Default: no prefix. -h|--help Show this message.
json-parse¶
Usage: mlr json-parse [options] Tries to convert string field values to parsed JSON, e.g. "[1,2,3]" -> [1,2,3]. Options: -f {...} Comma-separated list of field names to json-parse (default all). -k If supplied, then on parse fail for any cell, keep the (unparsable)
input value for the cell. -h|--help Show this message.
json-stringify¶
Usage: mlr json-stringify [options]
Produces string field values from field-value data, e.g. [1,2,3] -> "[1,2,3]".
Options:
-f {...} Comma-separated list of field names to json-stringify (default all).
--jvstack Produce multi-line JSON output.
--no-jvstack Produce single-line JSON output per record (default).
-h|--help Show this message.
join¶
Usage: mlr join [options] Joins records from specified left file name with records from all file names at the end of the Miller argument list. Functionality is essentially the same as the system "join" command, but for record streams. Options:
-f {left file name}
-j {a,b,c} Comma-separated join-field names for output
-l {a,b,c} Comma-separated join-field names for left input file;
defaults to -j values if omitted.
-r {a,b,c} Comma-separated join-field names for right input file(s);
defaults to -j values if omitted.
--lk|--left-keep-field-names {a,b,c} If supplied, this means keep only the specified field
names from the left file. Automatically includes the join-field name(s). Helpful
for when you only want a limited subset of information from the left file.
Tip: you can use --lk "": this means the left file becomes solely a row-selector
for the input files.
--lp {text} Additional prefix for non-join output field names from
the left file
--rp {text} Additional prefix for non-join output field names from
the right file(s)
--np Do not emit paired records
--ul Emit unpaired records from the left file
--ur Emit unpaired records from the right file(s)
-s|--sorted-input Require sorted input: records must be sorted
lexically by their join-field names, else not all records will
be paired. The only likely use case for this is with a left
file which is too big to fit into system memory otherwise.
-u Enable unsorted input. (This is the default even without -u.)
In this case, the entire left file will be loaded into memory.
--prepipe {command} As in main input options; see mlr --help for details.
If you wish to use a prepipe command for the main input as well
as here, it must be specified there as well as here.
--prepipex {command} Likewise. File-format options default to those for the right file names on the Miller argument list, but may be overridden for the left file as follows. Please see the main "mlr --help" for more information on syntax for these arguments:
-i {one of csv,dkvp,nidx,pprint,xtab}
--irs {record-separator character}
--ifs {field-separator character}
--ips {pair-separator character}
--repifs
--implicit-csv-header
--implicit-tsv-header
--no-implicit-csv-header
--no-implicit-tsv-header
For example, if you have 'mlr --csv ... join -l foo ... ' then the left-file format
will be CSV as well unless you override with 'mlr --csv ... join --ijson -l foo' etc.
Likewise, if you have 'mlr --csv --implicit-csv-header ...' then the join-in file will be
expected to be headerless as well unless you put '--no-implicit-csv-header' after 'join'.
Please use "mlr --usage-separator-options" for information on specifying separators.
Please see https://miller.readthedocs.io/en/latest/reference-verbs.html#join for more information
including examples.
label¶
Usage: mlr label [options] {new1,new2,new3,...} Given n comma-separated names, renames the first n fields of each record to have the respective name. (Fields past the nth are left with their original names.) Particularly useful with --inidx or --implicit-csv-header, to give useful names to otherwise integer-indexed fields. Options: -h|--help Show this message.
latin1-to-utf8¶
Usage: mlr latin1-to-utf8, with no options. Recursively converts record strings from Latin-1 to UTF-8. For field-level control, please see the latin1_to_utf8 DSL function. Options: -h|--help Show this message.
least-frequent¶
Usage: mlr least-frequent [options] Shows the least frequently occurring distinct values for specified field names. The first entry is the statistical anti-mode; the remaining are runners-up. Options: -f {one or more comma-separated field names}. Required flag. -n {count}. Optional flag defaulting to 10. -b Suppress counts; show only field values. -o {name} Field name for output count. Default "count". See also "mlr most-frequent".
merge-fields¶
Usage: mlr merge-fields [options] Computes univariate statistics for each input record, accumulated across specified fields. Options: -a {sum,count,...} Names of accumulators. One or more of:
count Count instances of fields
null_count Count number of empty-string/JSON-null instances per field
distinct_count Count number of distinct values per field
mode Find most-frequently-occurring values for fields; first-found wins tie
antimode Find least-frequently-occurring values for fields; first-found wins tie
sum Compute sums of specified fields
mean Compute averages (sample means) of specified fields
mad Compute mean absolute deviation
var Compute sample variance of specified fields
stddev Compute sample standard deviation of specified fields
meaneb Estimate error bars for averages (assuming no sample autocorrelation)
skewness Compute sample skewness of specified fields
kurtosis Compute sample kurtosis of specified fields
min Compute minimum values of specified fields
max Compute maximum values of specified fields
minlen Compute minimum string-lengths of specified fields
maxlen Compute maximum string-lengths of specified fields
-f {a,b,c} Value-field names on which to compute statistics. Requires -o.
-r {a,b,c} Regular expressions for value-field names on which to compute
   statistics. Requires -o.
-c {a,b,c} Substrings for collapse mode. All fields which have the same names
   after removing substrings will be accumulated together. Please see
   examples below.
-i Use interpolated percentiles, like R's type=7; default like type=1.
   Not sensical for string-valued fields.
-o {name} Output field basename for -f/-r.
-k Keep the input fields which contributed to the output statistics;
   the default is to omit them.
String-valued data make sense unless arithmetic on them is required,
e.g. for sum, mean, interpolated percentiles, etc. In case of mixed data,
numbers are less than strings.
Example input data: "a_in_x=1,a_out_x=2,b_in_y=4,b_out_x=8".
Example: mlr merge-fields -a sum,count -f a_in_x,a_out_x -o foo
produces "b_in_y=4,b_out_x=8,foo_sum=3,foo_count=2" since "a_in_x,a_out_x" are
summed over.
Example: mlr merge-fields -a sum,count -r in_,out_ -o bar
produces "bar_sum=15,bar_count=4" since all four fields are summed over.
Example: mlr merge-fields -a sum,count -c in_,out_
produces "a_x_sum=3,a_x_count=2,b_y_sum=4,b_y_count=1,b_x_sum=8,b_x_count=1"
since "a_in_x" and "a_out_x" both collapse to "a_x", "b_in_y" collapses to
"b_y", and "b_out_x" collapses to "b_x".
most-frequent¶
Usage: mlr most-frequent [options] Shows the most frequently occurring distinct values for specified field names. The first entry is the statistical mode; the remaining are runners-up. Options: -f {one or more comma-separated field names}. Required flag. -n {count}. Optional flag defaulting to 10. -b Suppress counts; show only field values. -o {name} Field name for output count. Default "count". See also "mlr least-frequent".
nest¶
Usage: mlr nest [options] Explodes specified field values into separate fields/records, or reverses this. Options:
--explode,--implode One is required.
--values,--pairs One is required.
--across-records,--across-fields One is required.
-f {field name} Required.
--nested-fs {string} Defaults to ";". Field separator for nested values.
--nested-ps {string} Defaults to ":". Pair separator for nested key-value pairs.
--evar {string} Shorthand for --explode --values --across-records --nested-fs {string}
--ivar {string} Shorthand for --implode --values --across-records --nested-fs {string} Please use "mlr --usage-separator-options" for information on specifying separators. Examples:
mlr nest --explode --values --across-records -f x
with input record "x=a;b;c,y=d" produces output records
"x=a,y=d"
"x=b,y=d"
"x=c,y=d"
Use --implode to do the reverse.
mlr nest --explode --values --across-fields -f x
with input record "x=a;b;c,y=d" produces output records
"x_1=a,x_2=b,x_3=c,y=d"
Use --implode to do the reverse.
mlr nest --explode --pairs --across-records -f x
with input record "x=a:1;b:2;c:3,y=d" produces output records
"a=1,y=d"
"b=2,y=d"
"c=3,y=d"
mlr nest --explode --pairs --across-fields -f x
with input record "x=a:1;b:2;c:3,y=d" produces output records
"a=1,b=2,c=3,y=d"
Notes:
* With --pairs, --implode doesn't make sense since the original field name has
  been lost.
* The combination "--implode --values --across-records" is non-streaming:
  no output records are produced until all input records have been read. In
  particular, this means it won't work in `tail -f` contexts. But all other flag
  combinations result in streaming (`tail -f` friendly) data processing.
  If input is coming from `tail -f`, be sure to use `--records-per-batch 1`.
* It's up to you to ensure that the nested-fs is distinct from your data's IFS:
  e.g. by default the former is semicolon and the latter is comma.
See also mlr reshape.
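The two --explode --values variants shown in the examples can be sketched in Python as below. The function names are hypothetical and the sketch handles string values only; it mirrors the documented examples rather than Miller's implementation.

```python
def explode_values_across_records(record, field, nested_fs=";"):
    """One output record per nested value; other fields are repeated."""
    if field not in record:
        return [dict(record)]
    return [{**record, field: piece} for piece in record[field].split(nested_fs)]


def explode_values_across_fields(record, field, nested_fs=";"):
    """One output record; nested values become field_1, field_2, ..."""
    out = {}
    for key, value in record.items():
        if key == field:
            for i, piece in enumerate(value.split(nested_fs), start=1):
                out[f"{field}_{i}"] = piece
        else:
            out[key] = value
    return out


record = {"x": "a;b;c", "y": "d"}
print(explode_values_across_records(record, "x"))
print(explode_values_across_fields(record, "x"))
```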
nothing¶
Usage: mlr nothing [options] Drops all input records. Useful for testing, or after tee/print/etc. have produced other output. Options: -h|--help Show this message.
put¶
Usage: mlr put [options] {DSL expression}
Lets you use a domain-specific language to programmatically alter stream records.
See also: https://miller.readthedocs.io/en/latest/reference-verbs
Options:
-f {file name} File containing a DSL expression (see examples below). If the filename
   is a directory, all *.mlr files in that directory are loaded.
-e {expression} You can use this after -f to add an expression. Example use
   case: define functions/subroutines in a file you specify with -f, then call
   them with an expression you specify with -e.
(If you mix -e and -f then the expressions are evaluated in the order encountered.
Since the expression pieces are simply concatenated, please be sure to use intervening
semicolons to separate expressions.)
-s name=value: Predefines out-of-stream variable @name to have the given value.
   Thus mlr put -s foo=97 '$column += @foo' is like
   mlr put 'begin {@foo = 97} $column += @foo'.
   The value part is subject to type-inferencing.
   May be specified more than once, e.g. -s name1=value1 -s name2=value2.
   Note: the value may be an environment variable, e.g. -s sequence=$SEQUENCE
-x (default false) Prints records for which {expression} evaluates to false, not true,
   i.e. invert the sense of the filter expression.
-q Does not include the modified record in the output stream.
   Useful for when all desired output is in begin and/or end blocks.
-S and -F: These are no-ops in Miller 6 and above, since now type-inferencing is done
   by the record-readers before filter/put is executed. Supported as no-op pass-through
   flags for backward compatibility.
-h|--help Show this message.
Parser-info options:
-w Print warnings about things like uninitialized variables.
-W Same as -w, but exit the process if there are any warnings.
-p Prints the expression's AST (abstract syntax tree), which gives full
   transparency on the precedence and associativity rules of Miller's grammar,
   to stdout.
-d Like -p but uses a parenthesized-expression format for the AST.
-D Like -d but with output all on one line.
-E Echo DSL expression before printing parse-tree
-v Same as -E -p.
-X Exit after parsing but before stream-processing. Useful with -v/-d/-D, if you
   only want to look at parser information.
Examples:
mlr --from example.csv put '$qr = $quantity * $rate'
More example put expressions:
If-statements:
'if ($flag == true) { $quantity *= 10}'
'if ($x > 0.0) { $y=log10($x); $z=sqrt($y) } else {$y = 0.0; $z = 0.0}'
Newly created fields can be read after being written:
'$new_field = $index**2; $qn = $quantity * $new_field'
Regex-replacement:
'$name = sub($name, "http.*com"i, "")'
Regex-capture: 'if ($a =~ "([a-z]+)_([0-9]+)") { $b = "left_\1"; $c = "right_\2" }'
Built-in variables:
'$filename = FILENAME'
Aggregations (use mlr put -q):
'@sum += $x; end {emit @sum}'
'@sum[$shape] += $quantity; end {emit @sum, "shape"}'
'@sum[$shape][$color] += $x; end {emit @sum, "shape", "color"}'
'
@min = min(@min,$x);
@max=max(@max,$x);
end{emitf @min, @max}
' See also https://miller.readthedocs.io/reference-dsl for more context.
regularize¶
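Usage: mlr regularize [options] For records seen earlier in the data stream with the same field names in a different order, outputs them with field names in the previously encountered order. Options: -h|--help Show this message.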
remove-empty-columns¶
Usage: mlr remove-empty-columns [options] Omits fields which are empty on every input row. Non-streaming. Options: -h|--help Show this message.
rename¶
Usage: mlr rename [options] {old1,new1,old2,new2,...} Renames specified fields. Options: -r Treat old field names as regular expressions. "ab", "a.*b"
will match any field name containing the substring "ab" or
matching "a.*b", respectively; anchors of the form "^ab$",
"^a.*b$" may be used. New field names may be plain strings,
or may contain capture groups of the form "\1" through
"\9". Wrapping the regex in double quotes is optional, but
is required if you wish to follow it with 'i' to indicate
case-insensitivity. -g Do global replacement within each field name rather than
first-match replacement. -h|--help Show this message. Examples: mlr rename old_name,new_name mlr rename old_name_1,new_name_1,old_name_2,new_name_2 mlr rename -r 'Date_[0-9]+,Date' Rename all such fields to be "Date" mlr rename -r '"Date_[0-9]+",Date' Same mlr rename -r 'Date_([0-9]+).*,\1' Rename all such fields to be of the form 20151015 mlr rename -r '"name"i,Name' Rename "name", "Name", "NAME", etc. to "Name"
reorder¶
Usage: mlr reorder [options] Moves specified names to start of record, or end of record. Options: -e Put specified field names at record end: default is to put them at record start. -f {a,b,c} Field names to reorder. -b {x} Put field names specified with -f before field name specified by {x},
if any. If {x} isn't present in a given record, the specified fields
will not be moved. -a {x} Put field names specified with -f after field name specified by {x},
if any. If {x} isn't present in a given record, the specified fields
will not be moved. -h|--help Show this message. Examples: mlr reorder -f a,b sends input record "d=4,b=2,a=1,c=3" to "a=1,b=2,d=4,c=3". mlr reorder -e -f a,b sends input record "d=4,b=2,a=1,c=3" to "d=4,c=3,a=1,b=2".
repeat¶
Usage: mlr repeat [options] Copies input records to output records multiple times. Options must be exactly one of the following: -n {repeat count} Repeat each input record this many times. -f {field name} Same, but take the repeat count from the specified
field name of each input record. -h|--help Show this message. Example:
echo x=0 | mlr repeat -n 4 then put '$x=urand()' produces:
x=0.488189
x=0.484973
x=0.704983
x=0.147311 Example:
echo a=1,b=2,c=3 | mlr repeat -f b produces:
a=1,b=2,c=3
a=1,b=2,c=3 Example:
echo a=1,b=2,c=3 | mlr repeat -f c produces:
a=1,b=2,c=3
a=1,b=2,c=3
a=1,b=2,c=3
reshape¶
Usage: mlr reshape [options] Wide-to-long options:
-i {input field names} -o {key-field name,value-field name}
-r {input field regex} -o {key-field name,value-field name}
These pivot/reshape the input data such that the input fields are removed
and separate records are emitted for each key/value pair.
Note: if you have multiple regexes, please specify them using multiple -r,
since regexes can contain commas within them.
Note: this works with tail -f and produces output records for each input
record seen. If input is coming from `tail -f`, be sure to use
`--records-per-batch 1`. Long-to-wide options:
-s {key-field name,value-field name}
These pivot/reshape the input data to undo the wide-to-long operation.
Note: this does not work with tail -f; it produces output records only after
all input records have been read. Examples:
Input file "wide.txt":
time X Y
2009-01-01 0.65473572 2.4520609
2009-01-02 -0.89248112 0.2154713
2009-01-03 0.98012375 1.3179287
mlr --pprint reshape -i X,Y -o item,value wide.txt
time item value
2009-01-01 X 0.65473572
2009-01-01 Y 2.4520609
2009-01-02 X -0.89248112
2009-01-02 Y 0.2154713
2009-01-03 X 0.98012375
2009-01-03 Y 1.3179287
mlr --pprint reshape -r '[A-Z]' -o item,value wide.txt
time item value
2009-01-01 X 0.65473572
2009-01-01 Y 2.4520609
2009-01-02 X -0.89248112
2009-01-02 Y 0.2154713
2009-01-03 X 0.98012375
2009-01-03 Y 1.3179287
Input file "long.txt":
time item value
2009-01-01 X 0.65473572
2009-01-01 Y 2.4520609
2009-01-02 X -0.89248112
2009-01-02 Y 0.2154713
2009-01-03 X 0.98012375
2009-01-03 Y 1.3179287
mlr --pprint reshape -s item,value long.txt
time X Y
2009-01-01 0.65473572 2.4520609
2009-01-02 -0.89248112 0.2154713
2009-01-03 0.98012375 1.3179287 See also mlr nest.
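As a rough illustration of the wide-to-long pivot described above, here is a Python sketch. It is not Miller's implementation: the function name `wide_to_long` and the dict-per-record representation are assumptions made for this example, and records lacking every input field are simply skipped here, which may differ from what mlr does.

```python
def wide_to_long(records, input_fields, key_name, value_name):
    """Emit one output record per (input field, value) pair found in each record."""
    out = []
    for rec in records:
        # Fields that are not being pivoted are carried along on every output record.
        others = {k: v for k, v in rec.items() if k not in input_fields}
        for field in input_fields:
            if field in rec:
                out.append({**others, key_name: field, value_name: rec[field]})
    return out


wide = [
    {"time": "2009-01-01", "X": "0.65473572", "Y": "2.4520609"},
    {"time": "2009-01-02", "X": "-0.89248112", "Y": "0.2154713"},
]
long_records = wide_to_long(wide, ["X", "Y"], "item", "value")
for rec in long_records:
    print(rec)
```

Each input record with both X and Y yields two output records, mirroring the shape of the `reshape -i X,Y -o item,value` example output above.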
sample¶
Usage: mlr sample [options] Reservoir sampling (subsampling without replacement), optionally by category. See also mlr bootstrap and mlr shuffle. Options: -g {a,b,c} Optional: group-by-field names for samples, e.g. a,b,c. -k {k} Required: number of records to output in total, or by group if using -g. -h|--help Show this message.
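For readers unfamiliar with reservoir sampling, the following Python sketch shows the classic single-pass technique (often called Algorithm R). It is an illustration of the general idea only, not a copy of Miller's code; the function name `reservoir_sample` and the optional `rng` parameter are assumptions of this example, and grouping with -g is not modeled.

```python
import random


def reservoir_sample(records, k, rng=None):
    """Return up to k records chosen uniformly without replacement in one pass."""
    rng = rng or random.Random()
    reservoir = []
    for i, rec in enumerate(records):
        if i < k:
            # The first k records fill the reservoir directly.
            reservoir.append(rec)
        else:
            # Record i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = rec
    return reservoir


records = [{"i": n} for n in range(1, 101)]
picked = reservoir_sample(records, 5, random.Random(0))
print(picked)
```

Because only k records are held at any time, memory use does not grow with the length of the input stream.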
sec2gmtdate¶
Usage: mlr sec2gmtdate {comma-separated list of field names} Replaces a numeric field representing seconds since the epoch with the corresponding GMT year-month-day timestamp; leaves non-numbers as-is. This is nothing more than a keystroke-saver for the sec2gmtdate function:
mlr sec2gmtdate time1,time2 is the same as
mlr put '$time1=sec2gmtdate($time1);$time2=sec2gmtdate($time2)'
sec2gmt¶
Usage: mlr sec2gmt [options] {comma-separated list of field names} Replaces a numeric field representing seconds since the epoch with the corresponding GMT timestamp; leaves non-numbers as-is. This is nothing more than a keystroke-saver for the sec2gmt function:
mlr sec2gmt time1,time2 is the same as
mlr put '$time1 = sec2gmt($time1); $time2 = sec2gmt($time2)' Options: -1 through -9: format the seconds using 1..9 decimal places, respectively. --millis Input numbers are treated as milliseconds since the epoch. --micros Input numbers are treated as microseconds since the epoch. --nanos Input numbers are treated as nanoseconds since the epoch. -h|--help Show this message.
seqgen¶
Usage: mlr seqgen [options] Produces a sequence of counters. Discards the input record stream. Produces output as specified by the options. Options: -f {name} (default "i") Field name for counters. --start {value} (default 1) Inclusive start value. --step {value} (default 1) Step value. --stop {value} (default 100) Inclusive stop value. -h|--help Show this message. Start, stop, and/or step may be floating-point. Output is integer if start, stop, and step are all integers. Step may be negative. It may not be zero unless start == stop.
shuffle¶
Usage: mlr shuffle [options] Outputs records randomly permuted. No output records are produced until all input records are read. See also mlr bootstrap and mlr sample. Options: -h|--help Show this message.
skip-trivial-records¶
Usage: mlr skip-trivial-records [options] Passes through all records except those with zero fields, or those for which all fields have empty value. Options: -h|--help Show this message.
sort¶
Usage: mlr sort {flags} Sorts records primarily by the first specified field, secondarily by the second field, and so on. (Any records not having all specified sort keys will appear at the end of the output, in the order they were encountered, regardless of the specified sort order.) The sort is stable: records that compare equal will sort in the order they were encountered in the input record stream. Options: -f {comma-separated field names} Lexical ascending -r {comma-separated field names} Lexical descending -c {comma-separated field names} Case-folded lexical ascending -cr {comma-separated field names} Case-folded lexical descending -n {comma-separated field names} Numerical ascending; nulls sort last -nf {comma-separated field names} Same as -n -nr {comma-separated field names} Numerical descending; nulls sort first -t {comma-separated field names} Natural ascending -tr|-rt {comma-separated field names} Natural descending -h|--help Show this message. Example:
mlr sort -f a,b -nr x,y,z which is the same as:
mlr sort -f a -f b -nr x -nr y -nr z
sort-within-records¶
Usage: mlr sort-within-records [options] Outputs records sorted lexically ascending by keys. Options: -r Recursively sort subobjects/submaps, e.g. for JSON input. -h|--help Show this message.
sparsify¶
Usage: mlr sparsify [options] Unsets fields for which the value is the empty string (or, optionally, another specified value). Only makes sense with output format not being CSV or TSV. Options: -s {filler string} What values to remove. Defaults to the empty string. -f {a,b,c} Specify field names to be operated on; any other fields won't be
modified. The default is to modify all fields. -h|--help Show this message. Example: if input is a=1,b=,c=3 then output is a=1,c=3.
split¶
Usage: mlr split [options] {filename} Options: -n {n}: Cap file sizes at N records. -m {m}: Produce M files, round-robining records among them. -g {a,b,c}: Write separate files with records having distinct values for fields named a,b,c. Exactly one of -m, -n, or -g must be supplied. --prefix {p} Specify filename prefix; default "split". --suffix {s} Specify filename suffix; default is from mlr output format, e.g. "csv". -a Append to existing file(s), if any, rather than overwriting. -v Send records along to downstream verbs as well as splitting to files. -e Do NOT URL-escape names of output files. -j {J} Use string J to join filename parts; default "_". -h|--help Show this message. Any of the output-format command-line flags (see mlr -h). For example, using
mlr --icsv --from myfile.csv split --ojson -n 1000 the input is CSV, but the output files are JSON. Examples: Suppose myfile.csv has 1,000,000 records. 100 output files, 10,000 records each. First 10,000 records in split_1.csv, next in split_2.csv, etc.
mlr --csv --from myfile.csv split -n 10000 10 output files, 100,000 records each. Records 1,11,21,etc in split_1.csv, records 2,12,22, etc in split_2.csv, etc.
mlr --csv --from myfile.csv split -m 10 Same, but with JSON output.
mlr --csv --from myfile.csv split -m 10 -o json Same but instead of split_1.csv, split_2.csv, etc. there are test_1.dat, test_2.dat, etc.
mlr --csv --from myfile.csv split -m 10 --prefix test --suffix dat Same, but written to the /tmp/ directory.
mlr --csv --from myfile.csv split -m 10 --prefix /tmp/test --suffix dat If the shape field has values triangle and square, then there will be split_triangle.csv and split_square.csv.
mlr --csv --from myfile.csv split -g shape If the color field has values yellow and green, and the shape field has values triangle and square, then there will be split_yellow_triangle.csv, split_yellow_square.csv, etc.
mlr --csv --from myfile.csv split -g color,shape See also the "tee" DSL function which lets you do more ad-hoc customization.
ssub¶
Usage: mlr ssub [options] Replaces old string with new string in specified field(s), without regex support for the old string, like the `ssub` DSL function. See also the `gsub` and `sub` verbs. Options: -f {a,b,c} Field names to convert. -h|--help Show this message.
stats1¶
Usage: mlr stats1 [options] Computes univariate statistics for one or more given fields, accumulated across the input record stream. Options: -a {sum,count,...} Names of accumulators: one or more of:
median This is the same as p50
p10 p25.2 p50 p98 p100 etc.
count Count instances of fields
null_count Count number of empty-string/JSON-null instances per field
distinct_count Count number of distinct values per field
mode Find most-frequently-occurring values for fields; first-found wins tie
antimode Find least-frequently-occurring values for fields; first-found wins tie
sum Compute sums of specified fields
mean Compute averages (sample means) of specified fields
mad Compute mean absolute deviation
var Compute sample variance of specified fields
stddev Compute sample standard deviation of specified fields
meaneb Estimate error bars for averages (assuming no sample autocorrelation)
skewness Compute sample skewness of specified fields
kurtosis Compute sample kurtosis of specified fields
min Compute minimum values of specified fields
max Compute maximum values of specified fields
minlen Compute minimum string-lengths of specified fields
maxlen Compute maximum string-lengths of specified fields -f {a,b,c} Value-field names on which to compute statistics --fr {regex} Regex for value-field names on which to compute statistics
(compute statistics on values in all field names matching regex) --fx {regex} Inverted regex for value-field names on which to compute statistics
(compute statistics on values in all field names not matching regex) -g {d,e,f} Optional group-by-field names --gr {regex} Regex for optional group-by-field names
(group by values in field names matching regex) --gx {regex} Inverted regex for optional group-by-field names
(group by values in field names not matching regex) --grfx {regex} Shorthand for --gr {regex} --fx {that same regex} -i Use interpolated percentiles, like R's type=7; default like type=1.
Not meaningful for string-valued fields. -s Print iterative stats. Useful in tail -f contexts, in which
case please avoid pprint-format output since end of input
stream will never be seen. Likewise, if input is coming from `tail -f`
be sure to use `--records-per-batch 1`. -h|--help Show this message. Example: mlr stats1 -a min,p10,p50,p90,max -f value -g size,shape Example: mlr stats1 -a count,mode -f size Example: mlr stats1 -a count,mode -f size -g shape Example: mlr stats1 -a count,mode --fr '^[a-h].*$' --gr '^k.*$'
This computes count and mode statistics on all field names beginning
with a through h, grouped by all field names starting with k. Notes: * p50 and median are synonymous. * min and max output the same results as p0 and p100, respectively, but use
less memory. * String-valued data make sense unless arithmetic on them is required,
e.g. for sum, mean, interpolated percentiles, etc. In case of mixed data,
numbers are less than strings. * count and mode allow text input; the rest require numeric input.
In particular, 1 and 1.0 are distinct text for count and mode. * When there are mode ties, the first-encountered datum wins.
stats2¶
Usage: mlr stats2 [options] Computes bivariate statistics for one or more given field-name pairs, accumulated across the input record stream. -a {linreg-ols,corr,...} Names of accumulators: one or more of:
linreg-ols Linear regression using ordinary least squares
linreg-pca Linear regression using principal component analysis
r2 Quality metric for linreg-ols (linreg-pca emits its own)
logireg Logistic regression
corr Sample correlation
cov Sample covariance
covx Sample-covariance matrix -f {a,b,c,d} Value-field name-pairs on which to compute statistics.
There must be an even number of names. -g {e,f,g} Optional group-by-field names. -v Print additional output for linreg-pca. -s Print iterative stats. Useful in tail -f contexts, in which
case please avoid pprint-format output since end of input
stream will never be seen. Likewise, if input is coming from
`tail -f`, be sure to use `--records-per-batch 1`. --fit Rather than printing regression parameters, applies them to
the input data to compute new fit fields. All input records are
held in memory until end of input stream. Has effect only for
linreg-ols, linreg-pca, and logireg. Only one of -s or --fit may be used. Example: mlr stats2 -a linreg-pca -f x,y Example: mlr stats2 -a linreg-ols,r2 -f x,y -g size,shape Example: mlr stats2 -a corr -f x,y
step¶
Usage: mlr step [options] Computes values dependent on earlier/later records, optionally grouped by category. Options: -a {delta,rsum,...} Names of steppers: comma-separated, one or more of:
counter Count instances of field(s) between successive records
delta Compute differences in field(s) between successive records
ewma Exponentially weighted moving average over successive records
from-first Compute differences in field(s) from first record
ratio Compute ratios in field(s) between successive records
rprod Compute running products of field(s) between successive records
rsum Compute running sums of field(s) between successive records
shift Alias for shift_lag
shift_lag Include value(s) in field(s) from the previous record, if any
shift_lead Include value(s) in field(s) from the next record, if any
slwin Sliding-window averages over m records back and n forward. E.g. slwin_7_2 for 7 back and 2 forward. -f {a,b,c} Value-field names on which to compute statistics -g {d,e,f} Optional group-by-field names -F Computes integerable things (e.g. counter) in floating point.
As of Miller 6 this happens automatically, but the flag is accepted
as a no-op for backward compatibility with Miller 5 and below. -d {x,y,z} Weights for EWMA. 1 means current sample gets all weight (no
smoothing), near under 1 is light smoothing, near over 0 is
heavy smoothing. Multiple weights may be specified, e.g.
"mlr step -a ewma -f sys_load -d 0.01,0.1,0.9". Default if omitted
is "-d 0.5". -o {a,b,c} Custom suffixes for EWMA output fields. If omitted, these default to
the -d values. If supplied, the number of -o values must be the same
as the number of -d values. -h|--help Show this message. Examples:
mlr step -a rsum -f request_size
mlr step -a delta -f request_size -g hostname
mlr step -a ewma -d 0.1,0.9 -f x,y
mlr step -a ewma -d 0.1,0.9 -o smooth,rough -f x,y
mlr step -a ewma -d 0.1,0.9 -o smooth,rough -f x,y -g group_name
mlr step -a slwin_9_0,slwin_0_9 -f x Please see https://miller.readthedocs.io/en/latest/reference-verbs.html#step or https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average for more information on EWMA.
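The -d weights can be easier to reason about with the recurrence written out. The Python sketch below uses the standard exponential-moving-average update, with alpha as the weight on the current sample, which matches the description of -d above (1 means no smoothing). Seeding the average with the first sample is an assumption of this sketch, and the function name `ewma` is hypothetical.

```python
def ewma(values, alpha):
    """Exponentially weighted moving average; alpha is the weight on the current sample."""
    smoothed = []
    prev = None
    for x in values:
        # Assumption: the first sample seeds the average unchanged.
        prev = x if prev is None else alpha * x + (1.0 - alpha) * prev
        smoothed.append(prev)
    return smoothed


no_smoothing = ewma([1.0, 2.0, 3.0], 1.0)   # alpha = 1: output equals input
half_weight = ewma([0.0, 10.0, 10.0], 0.5)  # alpha = 0.5: moves halfway each step
print(no_smoothing)
print(half_weight)
```

A smaller alpha makes each new sample count for less, so the output lags the input more heavily.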
sub¶
Usage: mlr sub [options] Replaces old string with new string in specified field(s), with regex support for the old string and not handling multiple matches, like the `sub` DSL function. See also the `gsub` and `ssub` verbs. Options: -f {a,b,c} Field names to convert. -h|--help Show this message.
summary¶
Usage: mlr summary [options] Show summary statistics about the input data. All summarizers:
field_type string, int, etc. -- if a column has mixed types, all encountered types are printed
count +1 for every instance of the field across all records in the input record stream
null_count count of field values either empty string or JSON null
distinct_count count of distinct values for the field
mode most-frequently-occurring value for the field
sum sum of field values
mean mean of the field values
stddev standard deviation of the field values
var variance of the field values
skewness skewness of the field values
minlen length of shortest string representation for the field
maxlen length of longest string representation for the field
min minimum field value
p25 first-quartile field value
median median field value
p75 third-quartile field value
max maximum field value
iqr interquartile range: p75 - p25
lof lower outer fence: p25 - 3.0 * iqr
lif lower inner fence: p25 - 1.5 * iqr
uif upper inner fence: p75 + 1.5 * iqr
uof upper outer fence: p75 + 3.0 * iqr Default summarizers:
field_type count mean min max null_count distinct_count Notes: * min, p25, median, p75, and max work for strings as well as numbers * Distinct-counts are computed on string representations -- so 4.1 and 4.10 are counted as distinct here. * If the mode is not unique in the input data, the first-encountered value is reported as the mode. Options: -a {mean,sum,etc.} Use only the specified summarizers. -x {mean,sum,etc.} Use all summarizers, except the specified ones. --all Use all available summarizers. --transpose Show output with field names as column names. -h|--help Show this message.
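The iqr and fence summarizers are plain arithmetic on the two quartiles, using the formulas listed above. A small Python sketch, with a hypothetical helper name `fences` and made-up quartile values:

```python
def fences(p25, p75):
    """Compute iqr and the inner/outer fences from the first and third quartiles."""
    iqr = p75 - p25
    return {
        "iqr": iqr,
        "lof": p25 - 3.0 * iqr,  # lower outer fence
        "lif": p25 - 1.5 * iqr,  # lower inner fence
        "uif": p75 + 1.5 * iqr,  # upper inner fence
        "uof": p75 + 3.0 * iqr,  # upper outer fence
    }


result = fences(10.0, 20.0)
print(result)
```

Values outside the inner fences are conventionally treated as mild outliers, and values outside the outer fences as extreme ones.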
tac¶
Usage: mlr tac [options] Prints records in reverse order from the order in which they were encountered. Options: -h|--help Show this message.
tail¶
Usage: mlr tail [options] Passes through the last n records, optionally by category. Options: -g {a,b,c} Optional group-by-field names for tail counts, e.g. a,b,c. -n {n} Tail-count to print. Default 10. -h|--help Show this message.
tee¶
Usage: mlr tee [options] {filename} Options: -a Append to existing file, if any, rather than overwriting. -p Treat filename as a pipe-to command. Any of the output-format command-line flags (see mlr -h). Example: using
mlr --icsv --opprint put '...' then tee --ojson ./mytap.dat then stats1 ... the input is CSV, the output is pretty-print tabular, but the tee-file output is written in JSON format. -h|--help Show this message.
template¶
Usage: mlr template [options] Places input-record fields in the order specified by a list of column names. If the input record is missing a specified field, it will be filled with the --fill-with value. If the input record possesses an unspecified field, it will be discarded. Options:
-f {a,b,c} Comma-separated field names for template, e.g. a,b,c.
-t {filename} CSV file whose header line will be used for template. --fill-with {filler string} What to fill absent fields with. Defaults to the empty string. -h|--help Show this message. Example: * Specified fields are a,b,c. * Input record is c=3,a=1,f=6. * Output record is a=1,b=,c=3.
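The template behavior can be pictured as a per-record projection onto the template's field list. The Python sketch below is an illustration under that reading; the function name `apply_template` and its `fill_with` parameter are hypothetical stand-ins for the verb and its --fill-with option.

```python
def apply_template(record, template_fields, fill_with=""):
    """Keep only template fields, in template order, filling absent ones."""
    return {name: record.get(name, fill_with) for name in template_fields}


out = apply_template({"c": "3", "a": "1", "f": "6"}, ["a", "b", "c"])
print(out)
```

This reproduces the example above: field f is dropped, field b is filled with the empty string, and the output order follows the template.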
top¶
Usage: mlr top [options] -f {a,b,c} Value-field names for top counts. -g {d,e,f} Optional group-by-field names for top counts. -n {count} How many records to print per category; default 1. -a Print all fields for top-value records; default is
to print only value and group-by fields. Requires a single
value-field name only. --min Print top smallest values; default is top largest values. -F Keep top values as floats even if they look like integers. -o {name} Field name for output indices. Default "top_idx".
This is ignored if -a is used. Prints the n records with smallest/largest values at specified fields, optionally by category. If -a is given, then the top records are emitted with the same fields as they appeared in the input. Without -a, only fields from -f, fields from -g, and the top-index field are emitted. For more information please see https://miller.readthedocs.io/en/latest/reference-verbs#top
utf8-to-latin1¶
Usage: mlr utf8-to-latin1, with no options. Recursively converts record strings from UTF-8 to Latin-1. For field-level control, please see the utf8_to_latin1 DSL function. Options: -h|--help Show this message.
unflatten¶
Usage: mlr unflatten [options] Reverses flatten. Example: field with name 'a.b.c' and value 4 becomes name 'a' and value '{"b": { "c": 4 }}'. Options: -f {a,b,c} Comma-separated list of field names to unflatten (default all). -s {string} Separator, defaulting to mlr --flatsep value. -h|--help Show this message.
uniq¶
Usage: mlr uniq [options] Prints distinct values for specified field names. With -c, same as count-distinct. For uniq, -f is a synonym for -g. Options: -g {d,e,f} Group-by-field names for uniq counts. -x {a,b,c} Field names to exclude for uniq: use each record's others instead. -c Show repeat counts in addition to unique values. -n Show only the number of distinct values. -o {name} Field name for output count. Default "count". -a Output each unique record only once. Incompatible with -g.
With -c, produces unique records, with repeat counts for each.
With -n, produces only one record which is the unique-record count.
With neither -c nor -n, produces unique records.
unspace¶
Usage: mlr unspace [options] Replaces spaces in record keys and/or values with _. This is helpful for PPRINT output. Options: -f {x} Replace spaces with specified filler character. -k Unspace only keys, not keys and values. -v Unspace only values, not keys and values. -h|--help Show this message.
unsparsify¶
Usage: mlr unsparsify [options] Prints records with the union of field names over all input records. For field names absent in a given record but present in others, fills in a value. This verb retains all input before producing any output. Options: --fill-with {filler string} What to fill absent fields with. Defaults to
the empty string. -f {a,b,c} Specify field names to be operated on. Any other fields won't be
modified, and operation will be streaming. -h|--help Show this message. Example: if the input is two records, one being 'a=1,b=2' and the other being 'b=3,c=4', then the output is the two records 'a=1,b=2,c=' and 'a=,b=3,c=4'.
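The non-streaming case of unsparsify amounts to two passes: one to collect the union of field names, one to fill each record. The Python sketch below illustrates that idea; the function name is borrowed from the verb for readability, and using first-seen order for the union of names is an assumption chosen to match the example above.

```python
def unsparsify(records, fill_with=""):
    """Give every record the union of field names, filling absent fields."""
    # Pass 1: collect the union of field names in first-seen order (assumed).
    names = []
    seen = set()
    for rec in records:
        for key in rec:
            if key not in seen:
                seen.add(key)
                names.append(key)
    # Pass 2: rebuild each record over the full name list.
    return [{key: rec.get(key, fill_with) for key in names} for rec in records]


filled = unsparsify([{"a": "1", "b": "2"}, {"b": "3", "c": "4"}])
for rec in filled:
    print(rec)
```

The first pass is why all input must be retained before any output is produced when no -f list is given.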
FUNCTIONS FOR FILTER/PUT¶
abs¶
(class=math #args=1) Absolute value.
acos¶
(class=math #args=1) Inverse trigonometric cosine.
acosh¶
(class=math #args=1) Inverse hyperbolic cosine.
antimode¶
(class=stats #args=1) Returns the least frequently occurring value in an array or map. Returns error for non-array/non-map types. Values are stringified for comparison, so for example string "1" and integer 1 are not distinct. In cases of ties, first-found wins. Examples: antimode([3,3,4,4,4]) is 3 antimode([3,3,4,4]) is 3
any¶
(class=higher-order-functions #args=2) Given a map or array as first argument and a function as second argument, yields a boolean true if the argument function returns true for any array/map element, false otherwise. For arrays, the function should take one argument, for array element; for maps, it should take two, for map-element key and value. In either case it should return a boolean. Examples: Array example: any([10,20,30], func(e) {return $index == e}) Map example: any({"a": "foo", "b": "bar"}, func(k,v) {return $[k] == v})
append¶
(class=collections #args=2) Appends second argument to end of first argument, which must be an array.
apply¶
(class=higher-order-functions #args=2) Given a map or array as first argument and a function as second argument, applies the function to each element of the array/map. For arrays, the function should take one argument, for array element; it should return a new element. For maps, it should take two arguments, for map-element key and value; it should return a new key-value pair (i.e. a single-entry map). Examples: Array example: apply([1,2,3,4,5], func(e) {return e ** 3}) returns [1, 8, 27, 64, 125]. Map example: apply({"a":1, "b":3, "c":5}, func(k,v) {return {toupper(k): v ** 2}}) returns {"A": 1, "B":9, "C": 25}.
arrayify¶
(class=collections #args=1) Walks through a nested map/array, converting any map with consecutive keys "1", "2", ... into an array. Useful to wrap the output of unflatten.
asin¶
(class=math #args=1) Inverse trigonometric sine.
asinh¶
(class=math #args=1) Inverse hyperbolic sine.
asserting_absent¶
(class=typing #args=1) Aborts with an error if is_absent on the argument returns false, else returns its argument.
asserting_array¶
(class=typing #args=1) Aborts with an error if is_array on the argument returns false, else returns its argument.
asserting_bool¶
(class=typing #args=1) Aborts with an error if is_bool on the argument returns false, else returns its argument.
asserting_boolean¶
(class=typing #args=1) Aborts with an error if is_boolean on the argument returns false, else returns its argument.
asserting_empty¶
(class=typing #args=1) Aborts with an error if is_empty on the argument returns false, else returns its argument.
asserting_empty_map¶
(class=typing #args=1) Aborts with an error if is_empty_map on the argument returns false, else returns its argument.
asserting_error¶
(class=typing #args=1) Aborts with an error if is_error on the argument returns false, else returns its argument.
asserting_float¶
(class=typing #args=1) Aborts with an error if is_float on the argument returns false, else returns its argument.
asserting_int¶
(class=typing #args=1) Aborts with an error if is_int on the argument returns false, else returns its argument.
asserting_map¶
(class=typing #args=1) Aborts with an error if is_map on the argument returns false, else returns its argument.
asserting_nonempty_map¶
(class=typing #args=1) Aborts with an error if is_nonempty_map on the argument returns false, else returns its argument.
asserting_not_array¶
(class=typing #args=1) Aborts with an error if is_not_array on the argument returns false, else returns its argument.
asserting_not_empty¶
(class=typing #args=1) Aborts with an error if is_not_empty on the argument returns false, else returns its argument.
asserting_not_map¶
(class=typing #args=1) Aborts with an error if is_not_map on the argument returns false, else returns its argument.
asserting_not_null¶
(class=typing #args=1) Aborts with an error if is_not_null on the argument returns false, else returns its argument.
asserting_null¶
(class=typing #args=1) Aborts with an error if is_null on the argument returns false, else returns its argument.
asserting_numeric¶
(class=typing #args=1) Aborts with an error if is_numeric on the argument returns false, else returns its argument.
asserting_present¶
(class=typing #args=1) Aborts with an error if is_present on the argument returns false, else returns its argument.
asserting_string¶
(class=typing #args=1) Aborts with an error if is_string on the argument returns false, else returns its argument.
atan¶
(class=math #args=1) One-argument arctangent.
atan2¶
(class=math #args=2) Two-argument arctangent.
atanh¶
(class=math #args=1) Inverse hyperbolic tangent.
bitcount¶
(class=arithmetic #args=1) Count of 1-bits.
boolean¶
(class=conversion #args=1) Convert int/float/bool/string to boolean.
capitalize¶
(class=string #args=1) Convert string's first character to uppercase.
cbrt¶
(class=math #args=1) Cube root.
ceil¶
(class=math #args=1) Ceiling: nearest integer at or above.
clean_whitespace¶
(class=string #args=1) Same as collapse_whitespace and strip, followed by type inference.
collapse_whitespace¶
(class=string #args=1) Strip repeated whitespace from string.
concat¶
(class=collections #args=variadic) Returns the array concatenation of the arguments. Non-array arguments are treated as single-element arrays. Examples: concat(1,2,3) is [1,2,3] concat([1,2],3) is [1,2,3] concat([1,2],[3]) is [1,2,3]
contains¶
(class=string #args=2) Returns true if the first argument contains the second as a substring. This is like saying `index(arg1, arg2) >= 0` but with less keystroking. Examples: contains("abcde", "e") gives true contains("abcde", "x") gives false contains(12345, 34) gives true contains("forêt", "ê") gives true
cos¶
(class=math #args=1) Trigonometric cosine.
cosh¶
(class=math #args=1) Hyperbolic cosine.
count¶
(class=stats #args=1) Returns the length of an array or map. Returns error for non-array/non-map types. Examples: count([7,8,9]) is 3 count({"a":7,"b":8,"c":9}) is 3
depth¶
(class=collections #args=1) Prints maximum depth of map/array. Scalars have depth 0.
dhms2fsec¶
(class=time #args=1) Recovers floating-point seconds as in dhms2fsec("5d18h53m20.250000s") = 500000.250000
dhms2sec¶
(class=time #args=1) Recovers integer seconds as in dhms2sec("5d18h53m20s") = 500000
distinct_count¶
(class=stats #args=1) Returns the number of distinct values in an array or map. Returns error for non-array/non-map types. Values are stringified for comparison, so for example string "1" and integer 1 are not distinct. Examples: distinct_count([7,8,9,7]) is 3 distinct_count([1,"1"]) is 1 distinct_count([1,1.0]) is 2
erf¶
(class=math #args=1) Error function.
erfc¶
(class=math #args=1) Complementary error function.
every¶
(class=higher-order-functions #args=2) Given a map or array as first argument and a function as second argument, yields a boolean true if the argument function returns true for every array/map element, false otherwise. For arrays, the function should take one argument, for array element; for maps, it should take two, for map-element key and value. In either case it should return a boolean. Examples: Array example: every(["a", "b", "c"], func(e) {return $[e] >= 0}) Map example: every({"a": "foo", "b": "bar"}, func(k,v) {return $[k] == v})
exec¶
(class=system #args=variadic) '$output = exec( "command", ["arg1", "arg2"], {"env": ["ENV_VAR=ENV_VALUE", "ENV_VAR2=ENV_VALUE2"], "dir": "/tmp/run_command_here", "stdin_string": "this is input fed to program", "combined_output": true} )' Run a command via executable, path, args and environment, yielding its stdout minus final carriage return. Example: exec("echo", ["I don't do", "$SHELL things"], {"env": "SHELL=sh"}) outputs "I don't do $SHELL things"
exp¶
(class=math #args=1) Exponential function e**x.
expm1¶
(class=math #args=1) e**x - 1.
flatten¶
(class=collections #args=2,3) Flattens multi-level maps to single-level ones. Useful for nested JSON-like structures for non-JSON file formats like CSV. With two arguments, the first argument is a map (maybe $*) and the second argument is the flatten separator. With three arguments, the first argument is prefix, the second is the flatten separator, and the third argument is a map; flatten($*, ".") is the same as flatten("", ".", $*). See "Flatten/unflatten: converting between JSON and tabular formats" at https://miller.readthedocs.io for more information. Examples: flatten({"a":[1,2],"b":3}, ".") is {"a.1": 1, "a.2": 2, "b": 3}. flatten("a", ".", {"b": { "c": 4 }}) is {"a.b.c" : 4}. flatten("", ".", {"a": { "b": 3 }}) is {"a.b" : 3}.
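The flattening rule described above can be modeled in a few lines of Python. This is only a sketch of the documented behavior, not Miller's implementation: the helper name `flatten_map` and its argument order are hypothetical, array positions are assumed to be rendered 1-up as in the "a.1", "a.2" example, and empty maps or arrays are not handled.

```python
def flatten_map(value, sep=".", prefix=""):
    """Flatten nested dicts/lists into a single-level dict (illustrative model)."""
    out = {}

    def walk(node, path):
        if isinstance(node, dict):
            items = node.items()
        elif isinstance(node, list):
            # Assumption: array positions are 1-up, matching "a.1", "a.2".
            items = ((str(i), v) for i, v in enumerate(node, start=1))
        else:
            out[path] = node  # terminal value: record it under the joined path
            return
        for key, child in items:
            walk(child, f"{path}{sep}{key}" if path else str(key))

    walk(value, prefix)
    return out


flat = flatten_map({"a": [1, 2], "b": 3}, ".")
prefixed = flatten_map({"b": {"c": 4}}, ".", "a")
```

Each terminal value ends up under a key built by joining the path components with the separator, which is why the result fits into a tabular format such as CSV.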
float¶
(class=conversion #args=1) Convert int/float/bool/string to float.
floor¶
(class=math #args=1) Floor: nearest integer at or below.
fmtifnum¶
(class=conversion #args=2) Identical to fmtnum, except returns the first argument as-is if the output would be an error. Examples: fmtifnum(3.4, "%.6f") gives 3.400000 fmtifnum("abc", "%.6f") gives abc $* = fmtifnum($*, "%.6f") formats numeric fields in the current record, leaving non-numeric ones alone
fmtnum¶
(class=conversion #args=2) Convert int/float/bool to string using printf-style format string (https://pkg.go.dev/fmt), e.g. '$s = fmtnum($n, "%08d")' or '$t = fmtnum($n, "%.6e")'. Miller-specific extension: "%_d" and "%_f" for comma-separated thousands. This function recurses on array and map values. Examples: $y = fmtnum($x, "%.6f") $o = fmtnum($n, "%d") $o = fmtnum($n, "%12d") $y = fmtnum($x, "%.6_f") $o = fmtnum($n, "%_d") $o = fmtnum($n, "%12_d")
fold¶
(class=higher-order-functions #args=3) Given a map or array as first argument and a function as second argument, accumulates entries into a final output -- for example, sum or product. For arrays, the function should take two arguments, for accumulated value and array element. For maps, it should take four arguments, for accumulated key and value, and map-element key and value; it should return the updated accumulator as a new key-value pair (i.e. a single-entry map). The start value for the accumulator is taken from the third argument. Examples: Array example: fold([1,2,3,4,5], func(acc,e) {return acc + e**3}, 10000) returns 10225. Map example: fold({"a":1, "b":3, "c": 5}, func(acck,accv,ek,ev) {return {"sum": accv+ev**2}}, {"sum":10000}) returns {"sum": 10035}.
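For readers who know Python, fold behaves much like `functools.reduce` called with an explicit initial value. The sketch below models the two documented examples. The helper `fold_map` is hypothetical and exists only to show how a single-entry map can serve as the accumulator; it is not how Miller implements fold.

```python
from functools import reduce

# Array case: the callback receives (accumulator, element); 10000 is the start value.
total = reduce(lambda acc, e: acc + e**3, [1, 2, 3, 4, 5], 10000)


def fold_map(m, f, start):
    """Fold over a dict, carrying a single-entry dict as the accumulator."""
    (acck, accv), = start.items()  # unpack the one key-value pair
    for ek, ev in m.items():
        (acck, accv), = f(acck, accv, ek, ev).items()
    return {acck: accv}


sums = fold_map(
    {"a": 1, "b": 3, "c": 5},
    lambda acck, accv, ek, ev: {"sum": accv + ev**2},
    {"sum": 10000},
)
```

In this model the map case yields a single-entry dict, because the callback returns one on every step.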
format¶
(class=string #args=variadic) Using first argument as format string, interpolate remaining arguments in place of each "{}" in the format string. Too-few arguments are treated as the empty string; too-many arguments are discarded. Examples: format("{}:{}:{}", 1,2) gives "1:2:". format("{}:{}:{}", 1,2,3) gives "1:2:3". format("{}:{}:{}", 1,2,3,4) gives "1:2:3".
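The too-few and too-many rules can be sketched as follows. The function name `format_values` is hypothetical, and the sketch covers only the plain "{}" placeholder described above.

```python
def format_values(fmt, *args):
    """Fill each "{}" in fmt from args; missing args become "", extras are dropped."""
    pieces = fmt.split("{}")
    n_slots = len(pieces) - 1
    fills = [str(a) for a in args[:n_slots]]   # discard surplus arguments
    fills += [""] * (n_slots - len(fills))     # pad missing arguments with ""
    out = [pieces[0]]
    for fill, piece in zip(fills, pieces[1:]):
        out.append(fill)
        out.append(piece)
    return "".join(out)


short = format_values("{}:{}:{}", 1, 2)
exact = format_values("{}:{}:{}", 1, 2, 3)
extra = format_values("{}:{}:{}", 1, 2, 3, 4)
```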
fsec2dhms¶
(class=time #args=1) Formats floating-point seconds as in fsec2dhms(500000.25) = "5d18h53m20.250000s"
fsec2hms¶
(class=time #args=1) Formats floating-point seconds as in fsec2hms(5000.25) = "01:23:20.250000"
get_keys¶
(class=collections #args=1) Returns array of keys of map or array
get_values¶
(class=collections #args=1) Returns array of values of map or array -- in the latter case, returns a copy of the array
gmt2localtime¶
(class=time #args=1,2) Convert from a GMT-time string to a local-time string. Consults $TZ unless second argument is supplied. Examples: gmt2localtime("1999-12-31T22:00:00Z") = "2000-01-01 00:00:00" with TZ="Asia/Istanbul" gmt2localtime("1999-12-31T22:00:00Z", "Asia/Istanbul") = "2000-01-01 00:00:00"
gmt2nsec¶
(class=time #args=1) Parses GMT timestamp as integer nanoseconds since the epoch. Example: gmt2nsec("2001-02-03T04:05:06Z") = 981173106000000000
gmt2sec¶
(class=time #args=1) Parses GMT timestamp as integer seconds since the epoch. Example: gmt2sec("2001-02-03T04:05:06Z") = 981173106
gssub¶
(class=string #args=3) Like gsub but does no regexing. No characters are special. Example: gssub("ab.d.fg", ".", "X") gives "abXdXfg"
gsub¶
(class=string #args=3) '$name = gsub($name, "old", "new")': replace all, with support for regular expressions. Capture groups \1 through \9 in the new part are matched from (...) in the old part, and must be used within the same call to gsub -- they don't persist for subsequent DSL statements. See also =~ and regextract. See also "Regular expressions" at https://miller.readthedocs.io. Examples: gsub("ababab", "ab", "XY") gives "XYXYXY" gsub("abc.def", ".", "X") gives "XXXXXXX" gsub("abc.def", "\.", "X") gives "abcXdef" gsub("abcdefg", "[ce]", "X") gives "abXdXfg" gsub("prefix4529:suffix8567", "(....ix)([0-9]+)", "[\1 : \2]") gives "[prefix : 4529]:[suffix : 8567]"
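The difference between regex replacement (gsub) and literal replacement (gssub) can be illustrated with Python's `re.sub` and `str.replace`. This is an analogy only: Python's regex dialect is not the one Miller uses, although the simple patterns in the examples above behave the same way in both. The wrapper names mirror the Miller functions for readability.

```python
import re


def gsub(s, pattern, replacement):
    """Replace every regex match; \\1..\\9 in replacement refer to capture groups."""
    return re.sub(pattern, replacement, s)


def gssub(s, old, new):
    """Replace every literal occurrence; no character is special."""
    return s.replace(old, new)


unescaped = gsub("abc.def", ".", "X")    # "." matches any character
escaped = gsub("abc.def", r"\.", "X")    # "\." matches only a literal dot
literal = gssub("ab.d.fg", ".", "X")     # no regex interpretation at all
grouped = gsub("prefix4529:suffix8567", r"(....ix)([0-9]+)", r"[\1 : \2]")
```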
haskey¶
(class=collections #args=2) True/false if map has/hasn't key, e.g. 'haskey($*, "a")' or 'haskey(mymap, mykey)', or true/false if array index is in bounds / out of bounds. Error if 1st argument is not a map or array. Note -n..-1 alias to 1..n in Miller arrays.
hexfmt¶
(class=conversion #args=1) Convert int to hex string, e.g. 255 to "0xff".
hms2fsec¶
(class=time #args=1) Recovers floating-point seconds as in hms2fsec("01:23:20.250000") = 5000.250000
hms2sec¶
(class=time #args=1) Recovers integer seconds as in hms2sec("01:23:20") = 5000
hostname¶
(class=system #args=0) Returns the hostname as a string.
index¶
(class=string #args=2) Returns the index (1-based) of the second argument within the first. Returns -1 if the second argument isn't a substring of the first. Stringifies non-string inputs. Uses UTF-8 encoding to count characters, not bytes. Examples: index("abcde", "e") gives 5 index("abcde", "x") gives -1 index(12345, 34) gives 3 index("forêt", "t") gives 5
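A small Python model of the 1-based, character-counting convention is shown below. It assumes Python's code-point indexing is an adequate stand-in for Miller's UTF-8 character counting, which holds for the examples above but is not guaranteed for every input.

```python
def index(haystack, needle):
    """1-based character position of needle in haystack, or -1 if absent."""
    pos = str(haystack).find(str(needle))  # str.find is 0-based and counts code points
    return pos + 1 if pos >= 0 else -1


def contains(haystack, needle):
    """Substring test expressed in terms of index, as the contains entry describes."""
    return index(haystack, needle) >= 0


last_char = index("forêt", "t")
numeric = index(12345, 34)
```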
int¶
(class=conversion #args=1,2) Convert int/float/bool/string to int. If the second argument is omitted and the first argument is a string, base is inferred from the first argument's prefix. If the second argument is provided and the first argument is a string, the second argument is used as the base. If the second argument is provided and the first argument is not a string, the second argument is ignored. Examples: int("345") gives decimal 345 (base-10/decimal input is inferred) int("0xff") gives decimal 255 (base-16/hexadecimal input is inferred) int("0377") gives decimal 255 (base-8/octal input is inferred) int("0b11010011") gives decimal 211 which is hexadecimal 0xd3 (base-2/binary input is inferred) int("0377", 10) gives decimal 377 int(345, 16) gives decimal 345 int(string(345), 16) gives decimal 837
invqnorm¶
(class=math #args=1) Inverse of normal cumulative distribution function. Note that invqnorm(urand()) is normally distributed.
is_absent¶
(class=typing #args=1) False if field is present in input, true otherwise
is_array¶
(class=typing #args=1) True if argument is an array.
is_bool¶
(class=typing #args=1) True if field is present with boolean value. Synonymous with is_boolean.
is_boolean¶
(class=typing #args=1) True if field is present with boolean value. Synonymous with is_bool.
is_empty¶
(class=typing #args=1) True if field is present in input with empty string value, false otherwise.
is_empty_map¶
(class=typing #args=1) True if argument is a map which is empty.
is_error¶
(class=typing #args=1) True if argument is an error, such as taking string length of an integer.
is_float¶
(class=typing #args=1) True if field is present with value inferred to be float
is_int¶
(class=typing #args=1) True if field is present with value inferred to be int
is_map¶
(class=typing #args=1) True if argument is a map.
is_nan¶
(class=typing #args=1) True if the argument is the NaN (not-a-number) floating-point value. Note that NaN has the property that NaN != NaN, so you need 'is_nan(x)' rather than 'x == NaN'.
is_nonempty_map¶
(class=typing #args=1) True if argument is a map which is non-empty.
is_not_array¶
(class=typing #args=1) True if argument is not an array.
is_not_empty¶
(class=typing #args=1) True if field is present in input with non-empty value, false otherwise
is_not_map¶
(class=typing #args=1) True if argument is not a map.
is_not_null¶
(class=typing #args=1) False if argument is null (empty, absent, or JSON null), true otherwise.
is_null¶
(class=typing #args=1) True if argument is null (empty, absent, or JSON null), false otherwise.
is_numeric¶
(class=typing #args=1) True if field is present with value inferred to be int or float
is_present¶
(class=typing #args=1) True if field is present in input, false otherwise.
is_string¶
(class=typing #args=1) True if field is present with string (including empty-string) value
joink¶
(class=conversion #args=2) Makes string from map/array keys. First argument is map/array; second is separator string. Examples: joink({"a":3,"b":4,"c":5}, ",") = "a,b,c". joink([1,2,3], ",") = "1,2,3".
joinkv¶
(class=conversion #args=3) Makes string from map/array key-value pairs. First argument is map/array; second is pair-separator string; third is field-separator string. Mnemonic: the "=" comes before the "," in the output and in the arguments to joinkv. Examples: joinkv([3,4,5], "=", ",") = "1=3,2=4,3=5" joinkv({"a":3,"b":4,"c":5}, ":", ";") = "a:3;b:4;c:5"
joinv¶
(class=conversion #args=2) Makes string from map/array values. First argument is map/array; second is separator string. Examples: joinv([3,4,5], ",") = "3,4,5" joinv({"a":3,"b":4,"c":5}, ",") = "3,4,5"
json_parse¶
(class=collections #args=1) Converts value from JSON-formatted string.
json_stringify¶
(class=collections #args=1,2) Converts value to JSON-formatted string. Default output is single-line. With optional second boolean argument set to true, produces multiline output.
kurtosis¶
(class=stats #args=1) Returns the sample kurtosis of values in an array or map. Returns empty string AKA void for array/map of length less than two; returns error for non-array/non-map types. Example: kurtosis([4,5,9,10,11]) is -1.6703688
latin1_to_utf8¶
(class=string #args=1) Tries to convert Latin-1-encoded string to UTF-8-encoded string. If argument is array or map, recurses into it. Examples: $y = latin1_to_utf8($x) $* = latin1_to_utf8($*)
leafcount¶
(class=collections #args=1) Counts total number of terminal values in map/array. For single-level map/array, same as length.
leftpad¶
(class=string #args=3) Left-pads first argument to at most the specified length (second, integer argument) using specified pad value (third, string argument). If the first argument is not a string, it will be stringified first. Examples: leftpad("abcdefg", 10 , "*") gives "***abcdefg". leftpad("abcdefg", 10 , "XY") gives "XYabcdefg". leftpad("1234567", 10 , "0") gives "0001234567".
length¶
(class=collections #args=1) Counts number of top-level entries in array/map. Scalars have length 1.
localtime2gmt¶
(class=time #args=1,2) Convert from a local-time string to a GMT-time string. Consults $TZ unless second argument is supplied. Examples: localtime2gmt("2000-01-01 00:00:00") = "1999-12-31T22:00:00Z" with TZ="Asia/Istanbul" localtime2gmt("2000-01-01 00:00:00", "Asia/Istanbul") = "1999-12-31T22:00:00Z"
localtime2nsec¶
(class=time #args=1,2) Parses local timestamp as integer nanoseconds since the epoch. Consults $TZ environment variable, unless second argument is supplied. Examples: localtime2nsec("2001-02-03 04:05:06") = 981165906000000000 with TZ="Asia/Istanbul" localtime2nsec("2001-02-03 04:05:06", "Asia/Istanbul") = 981165906000000000
localtime2sec¶
(class=time #args=1,2) Parses local timestamp as integer seconds since the epoch. Consults $TZ environment variable, unless second argument is supplied. Examples: localtime2sec("2001-02-03 04:05:06") = 981165906 with TZ="Asia/Istanbul" localtime2sec("2001-02-03 04:05:06", "Asia/Istanbul") = 981165906
log¶
(class=math #args=1) Natural (base-e) logarithm.
log10¶
(class=math #args=1) Base-10 logarithm.
log1p¶
(class=math #args=1) log(1+x).
logifit¶
(class=math #args=3) Given m and b from logistic regression, compute fit: $yhat=logifit($x,$m,$b).
lstrip¶
(class=string #args=1) Strip leading whitespace from string.
madd¶
(class=arithmetic #args=3) a + b mod m (integers)
mapdiff¶
(class=collections #args=variadic) With 0 args, returns empty map. With 1 arg, returns copy of arg. With 2 or more, returns copy of arg 1 with all keys from any of remaining argument maps removed.
mapexcept¶
(class=collections #args=variadic) Returns a map with keys from remaining arguments, if any, unset. Remaining arguments can be strings or arrays of string. E.g. 'mapexcept({1:2,3:4,5:6}, 1, 5, 7)' is '{3:4}' and 'mapexcept({1:2,3:4,5:6}, [1, 5, 7])' is '{3:4}'.
mapselect¶
(class=collections #args=variadic) Returns a map with only keys from remaining arguments set. Remaining arguments can be strings or arrays of string. E.g. 'mapselect({1:2,3:4,5:6}, 1, 5, 7)' is '{1:2,5:6}' and 'mapselect({1:2,3:4,5:6}, [1, 5, 7])' is '{1:2,5:6}'.
mapsum¶
(class=collections #args=variadic) With 0 args, returns empty map. With >= 1 arg, returns a map with key-value pairs from all arguments. Rightmost collisions win, e.g. 'mapsum({1:2,3:4},{1:5})' is '{1:5,3:4}'.
max¶
(class=math #args=variadic) Max of n numbers; null loses. The min and max functions also recurse into arrays and maps, so they can be used to get min/max stats on array/map values.
maxlen¶
(class=stats #args=1) Returns the maximum string length of values in an array or map. Returns empty string AKA void for array/map of length less than two; returns error for non-array/non-map types. Example: maxlen(["año", "alto"]) is 4
md5¶
(class=hashing #args=1) MD5 hash.
mean¶
(class=stats #args=1) Returns the arithmetic mean of values in an array or map. Returns empty string AKA void for empty array/map; returns error for non-array/non-map types. Example: mean([4,5,7,10]) is 6.5
meaneb¶
(class=stats #args=1) Returns the error bar for arithmetic mean of values in an array or map, assuming the values are independent and identically distributed. Returns empty string AKA void for array/map of length less than two; returns error for non-array/non-map types. Example: meaneb([4,5,7,10]) is 1.3228756
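The documented example value is consistent with the usual standard-error formula, the square root of the sample variance divided by the number of values. The Python sketch below assumes that formula, which the text above does not state explicitly.

```python
import math
import statistics

values = [4, 5, 7, 10]
n = len(values)

sample_var = statistics.variance(values)      # sample variance, denominator n - 1
mean_error_bar = math.sqrt(sample_var / n)    # assumed formula: sqrt(var / n)
```

With these four values the sample variance is 7, so the error bar is sqrt(7/4), which agrees with the 1.3228756 shown in the example.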
median¶
(class=stats #args=1,2) Returns the median of values in an array or map. Returns empty string AKA void for empty array/map; returns error for non-array/non-map types. Please see the percentiles function for information on optional flags, and on performance for large inputs. Examples: median([3,4,5,6,9,10]) is 6 median([3,4,5,6,9,10],{"interpolate_linearly":true}) is 5.5 median(["abc", "def", "ghi", "ghi"]) is "ghi"
mexp¶
(class=arithmetic #args=3) a ** b mod m (integers)
min¶
(class=math #args=variadic) Min of n numbers; null loses. The min and max functions also recurse into arrays and maps, so they can be used to get min/max stats on array/map values.
minlen¶
(class=stats #args=1) Returns the minimum string length of values in an array or map. Returns empty string AKA void for array/map of length less than two; returns error for non-array/non-map types. Example: minlen(["año", "alto"]) is 3
mmul¶
(class=arithmetic #args=3) a * b mod m (integers)
mode¶
(class=stats #args=1) Returns the most frequently occurring value in an array or map. Returns error for non-array/non-map types. Values are stringified for comparison, so for example string "1" and integer 1 are not distinct. In cases of ties, first-found wins. Examples: mode([3,3,4,4,4]) is 4 mode([3,3,4,4]) is 3
msub¶
(class=arithmetic #args=3) a - b mod m (integers)
nsec2gmt¶
(class=time #args=1,2) Formats integer nanoseconds since epoch as GMT timestamp. Leaves non-numbers as-is. With second integer argument n, includes n decimal places for the seconds part. Examples: nsec2gmt(1234567890000000000) = "2009-02-13T23:31:30Z" nsec2gmt(1234567890123456789) = "2009-02-13T23:31:30Z" nsec2gmt(1234567890123456789, 6) = "2009-02-13T23:31:30.123456Z"
nsec2gmtdate¶
(class=time #args=1) Formats integer nanoseconds since epoch as GMT timestamp with year-month-date. Leaves non-numbers as-is. Example: nsec2gmtdate(1440768801700000000) = "2015-08-28".
nsec2localdate¶
(class=time #args=1,2) Formats integer nanoseconds since epoch as local timestamp with year-month-date. Leaves non-numbers as-is. Consults $TZ environment variable unless second argument is supplied. Examples: nsec2localdate(1440768801700000000) = "2015-08-28" with TZ="Asia/Istanbul" nsec2localdate(1440768801700000000, "Asia/Istanbul") = "2015-08-28"
nsec2localtime¶
(class=time #args=1,2,3) Formats integer nanoseconds since epoch as local timestamp. Consults $TZ environment variable unless third argument is supplied. Leaves non-numbers as-is. With second integer argument n, includes n decimal places for the seconds part. Examples: nsec2localtime(1234567890000000000) = "2009-02-14 01:31:30" with TZ="Asia/Istanbul" nsec2localtime(1234567890123456789) = "2009-02-14 01:31:30" with TZ="Asia/Istanbul" nsec2localtime(1234567890123456789, 6) = "2009-02-14 01:31:30.123456" with TZ="Asia/Istanbul" nsec2localtime(1234567890123456789, 6, "Asia/Istanbul") = "2009-02-14 01:31:30.123456"
null_count¶
(class=stats #args=1) Returns the number of values in an array or map which are empty-string (AKA void) or JSON null. Returns error for non-array/non-map types. Values are stringified for comparison, so for example string "1" and integer 1 are not distinct. Example: null_count(["a", "", "c"]) is 1
os¶
(class=system #args=0) Returns the operating-system name as a string.
percentile¶
(class=stats #args=2,3) Returns the given percentile of values in an array or map. Returns empty string AKA void for empty array/map; returns error for non-array/non-map types. Please see the percentiles function for information on optional flags, and on performance for large inputs. Examples: percentile([3,4,5,6,9,10], 90) is 10 percentile([3,4,5,6,9,10], 90, {"interpolate_linearly":true}) is 9.5 percentile(["abc", "def", "ghi", "ghi"], 90) is "ghi"
percentiles¶
(class=stats #args=2,3) Returns the given percentiles of values in an array or map. Returns empty string AKA void for empty array/map; returns error for non-array/non-map types. See examples for information on the three option flags. Examples: Defaults are to not interpolate linearly, to produce a map keyed by percentile name, and to sort the input before computing percentiles:
percentiles([3,4,5,6,9,10], [25,75]) is { "25": 4, "75": 9 }
percentiles(["abc", "def", "ghi", "ghi"], [25,75]) is { "25": "def", "75": "ghi" } Use "output_array_not_map" (or shorthand "oa") to get the outputs as an array:
percentiles([3,4,5,6,9,10], [25,75], {"output_array_not_map":true}) is [4, 9] Use "interpolate_linearly" (or shorthand "il") to do linear interpolation -- note this produces error values on string inputs:
percentiles([3,4,5,6,9,10], [25,75], {"interpolate_linearly":true}) is { "25": 4.25, "75": 8.25 } The percentiles function always sorts its inputs before computing percentiles. If you know your input is already sorted -- see also the sort_collection function -- then computation will be faster on large input if you pass in "array_is_sorted" (shorthand: "ais"):
x = [6,5,9,10,4,3]
percentiles(x, [25,75], {"ais":true}) gives { "25": 5, "75": 4 } which is incorrect
x = sort_collection(x)
percentiles(x, [25,75], {"ais":true}) gives { "25": 4, "75": 9 } which is correct You can also leverage this feature to compute percentiles on a sort of your choosing. For example:
Non-sorted input:
x = splitax("the quick brown fox jumped loquaciously over the lazy dogs", " ")
x is: ["the", "quick", "brown", "fox", "jumped", "loquaciously", "over", "the", "lazy", "dogs"]
Percentiles are taken over the original positions of the words in the array -- "dogs" is last and hence appears as p99:
percentiles(x, [50, 99], {"oa":true, "ais":true}) gives ["loquaciously", "dogs"]
With sorting done inside percentiles, "the" is alphabetically last and is therefore the p99:
percentiles(x, [50, 99], {"oa":true}) gives ["loquaciously", "the"]
With default sorting done outside percentiles, the same:
x = sort(x) # or x = sort_collection(x)
x is: ["brown", "dogs", "fox", "jumped", "lazy", "loquaciously", "over", "quick", "the", "the"]
percentiles(x, [50, 99], {"oa":true, "ais":true}) gives ["loquaciously", "the"]
percentiles(x, [50, 99], {"oa":true}) gives ["loquaciously", "the"]
Now sorting by word length, "loquaciously" is longest and hence is the p99:
x = sort(x, func(a,b) { return strlen(a) <=> strlen(b) } )
x is: ["fox", "the", "the", "dogs", "lazy", "over", "brown", "quick", "jumped", "loquaciously"]
percentiles(x, [50, 99], {"oa":true, "ais":true}) gives ["over", "loquaciously"]
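The following Python sketch reproduces the numeric and string examples above. The index arithmetic is an assumption chosen because it matches those examples; it is not taken from Miller's source. The keyword names mirror the documented option flags, and empty input is not handled.

```python
def percentile(sorted_values, p, interpolate_linearly=False):
    """Pick the p-th percentile from an already-sorted list (illustrative model)."""
    n = len(sorted_values)
    if not interpolate_linearly:
        i = int(p / 100 * n)                        # assumed index rule
        return sorted_values[min(max(i, 0), n - 1)]
    findex = max(p / 100 * (n - 1), 0.0)            # fractional position
    i = int(findex)
    if i >= n - 1:
        return sorted_values[n - 1]
    frac = findex - i
    return sorted_values[i] + frac * (sorted_values[i + 1] - sorted_values[i])


def percentiles(values, ps, interpolate_linearly=False, array_is_sorted=False):
    """Return a dict keyed by percentile name; sort first unless told not to."""
    data = list(values) if array_is_sorted else sorted(values)
    return {str(p): percentile(data, p, interpolate_linearly) for p in ps}


plain = percentiles([3, 4, 5, 6, 9, 10], [25, 75])
interpolated = percentiles([3, 4, 5, 6, 9, 10], [25, 75], interpolate_linearly=True)
unsorted_trusted = percentiles([6, 5, 9, 10, 4, 3], [25, 75], array_is_sorted=True)
```

The last call shows why "array_is_sorted" should only be set when the input really is sorted: the function indexes into whatever order it is given.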
pow¶
(class=arithmetic #args=2) Exponentiation. Same as **, but as a function.
qnorm¶
(class=math #args=1) Normal cumulative distribution function.
reduce¶
(class=higher-order-functions #args=2) Given a map or array as first argument and a function as second argument, accumulates entries into a final output -- for example, sum or product. For arrays, the function should take two arguments, for accumulated value and array element, and return the accumulated element. For maps, it should take four arguments, for accumulated key and value, and map-element key and value; it should return the updated accumulator as a new key-value pair (i.e. a single-entry map). The start value for the accumulator is the first element for arrays, or the first element's key-value pair for maps. Examples: Array example: reduce([1,2,3,4,5], func(acc,e) {return acc + e**3}) returns 225. Map example: reduce({"a":1, "b":3, "c": 5}, func(acck,accv,ek,ev) {return {"sum_of_squares": accv + ev**2}}) returns {"sum_of_squares": 35}.
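In Python terms, reduce corresponds to `functools.reduce` called without an initial value, so the first element seeds the accumulator and is never passed through the callback. The sketch below models the two documented examples; `reduce_map` is a hypothetical helper, not Miller's implementation.

```python
from functools import reduce

# Array case: acc starts as the first element (1), which is not cubed itself.
sum_of_cubes = reduce(lambda acc, e: acc + e**3, [1, 2, 3, 4, 5])


def reduce_map(m, f):
    """Reduce over a dict, seeding the accumulator with the first key-value pair."""
    pairs = iter(m.items())
    acck, accv = next(pairs)
    for ek, ev in pairs:
        (acck, accv), = f(acck, accv, ek, ev).items()
    return {acck: accv}


sum_of_squares = reduce_map(
    {"a": 1, "b": 3, "c": 5},
    lambda acck, accv, ek, ev: {"sum_of_squares": accv + ev**2},
)
```

Because the seed bypasses the callback, these examples give the expected totals only because 1 cubed and 1 squared both equal 1. When the first element needs transforming too, a start value is required, which is what fold provides.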
regextract¶
(class=string #args=2) Extracts a substring (the first, if there are multiple matches), matching a regular expression, from the input. Does not use capture groups; see also the =~ operator which does. Examples: regextract("index ab09 file", "[a-z][a-z][0-9][0-9]") gives "ab09" regextract("index a999 file", "[a-z][a-z][0-9][0-9]") gives (absent), which will result in an assignment not happening.
regextract_or_else¶
(class=string #args=3) Like regextract but the third argument is the return value in case the input string (first argument) doesn't match the pattern (second argument). Examples: regextract_or_else("index ab09 file", "[a-z][a-z][0-9][0-9]", "nonesuch") gives "ab09" regextract_or_else("index a999 file", "[a-z][a-z][0-9][0-9]", "nonesuch") gives "nonesuch"
rightpad¶
(class=string #args=3) Right-pads first argument to at most the specified length (second, integer argument) using specified pad value (third, string argument). If the first argument is not a string, it will be stringified first. Examples: rightpad("abcdefg", 10 , "*") gives "abcdefg***". rightpad("abcdefg", 10 , "XY") gives "abcdefgXY". rightpad("1234567", 10 , "0") gives "1234567000".
round¶
(class=math #args=1) Round to nearest integer.
roundm¶
(class=math #args=2) Round to nearest multiple of m: roundm($x,$m) is the same as round($x/$m)*$m.
rstrip¶
(class=string #args=1) Strip trailing whitespace from string.
sec2dhms¶
(class=time #args=1) Formats integer seconds as in sec2dhms(500000) = "5d18h53m20s"
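The day/hour/minute/second decomposition can be sketched with repeated `divmod`. This model is checked only against the documented 500000-second example. The zero-padding of the smaller units, and the handling of values under one day or below zero, are assumptions and may differ from Miller's output.

```python
import re


def sec2dhms(total_seconds):
    """Split integer seconds into days, hours, minutes and seconds."""
    minutes, s = divmod(int(total_seconds), 60)
    hours, m = divmod(minutes, 60)
    d, h = divmod(hours, 24)
    return f"{d}d{h:02d}h{m:02d}m{s:02d}s"  # padding is an assumption


def dhms2sec(text):
    """Inverse direction: sum each number times the size of its unit."""
    units = {"d": 86400, "h": 3600, "m": 60, "s": 1}
    return sum(int(n) * units[u] for n, u in re.findall(r"(\d+)([dhms])", text))


formatted = sec2dhms(500000)
recovered = dhms2sec(formatted)
```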
sec2gmt¶
(class=time #args=1,2) Formats seconds since epoch as GMT timestamp. Leaves non-numbers as-is. With second integer argument n, includes n decimal places for the seconds part. Examples: sec2gmt(1234567890) = "2009-02-13T23:31:30Z" sec2gmt(1234567890.123456) = "2009-02-13T23:31:30Z" sec2gmt(1234567890.123456, 6) = "2009-02-13T23:31:30.123456Z"
sec2gmtdate¶
(class=time #args=1) Formats seconds since epoch (integer part) as GMT timestamp with year-month-date. Leaves non-numbers as-is. Example: sec2gmtdate(1440768801.7) = "2015-08-28".
sec2hms¶
(class=time #args=1) Formats integer seconds as in sec2hms(5000) = "01:23:20"
sec2localdate¶
(class=time #args=1,2) Formats seconds since epoch (integer part) as local timestamp with year-month-date. Leaves non-numbers as-is. Consults $TZ environment variable unless second argument is supplied. Examples: sec2localdate(1440768801.7) = "2015-08-28" with TZ="Asia/Istanbul" sec2localdate(1440768801.7, "Asia/Istanbul") = "2015-08-28"
sec2localtime¶
(class=time #args=1,2,3) Formats seconds since epoch (integer part) as local timestamp. Consults $TZ environment variable unless third argument is supplied. Leaves non-numbers as-is. With second integer argument n, includes n decimal places for the seconds part. Examples: sec2localtime(1234567890) = "2009-02-14 01:31:30" with TZ="Asia/Istanbul" sec2localtime(1234567890.123456) = "2009-02-14 01:31:30" with TZ="Asia/Istanbul" sec2localtime(1234567890.123456, 6) = "2009-02-14 01:31:30.123456" with TZ="Asia/Istanbul" sec2localtime(1234567890.123456, 6, "Asia/Istanbul") = "2009-02-14 01:31:30.123456"
select¶
(class=higher-order-functions #args=2) Given a map or array as first argument and a function as second argument, includes each input element in the output if the function returns true. For arrays, the function should take one argument, for array element; for maps, it should take two, for map-element key and value. In either case it should return a boolean. Examples: Array example: select([1,2,3,4,5], func(e) {return e >= 3}) returns [3, 4, 5]. Map example: select({"a":1, "b":3, "c":5}, func(k,v) {return v >= 3}) returns {"b":3, "c": 5}.
sgn¶
(class=math #args=1) +1, 0, -1 for positive, zero, negative input respectively.
sha1¶
(class=hashing #args=1) SHA1 hash.
sha256¶
(class=hashing #args=1) SHA256 hash.
sha512¶
(class=hashing #args=1) SHA512 hash.
sin¶
(class=math #args=1) Trigonometric sine.
sinh¶
(class=math #args=1) Hyperbolic sine.
skewness¶
(class=stats #args=1) Returns the sample skewness of values in an array or map. Returns empty string AKA void for array/map of length less than two; returns error for non-array/non-map types. Example: skewness([4,5,9,10,11]) is -0.2097285
sort¶
(class=higher-order-functions #args=1-2) Given a map or array as first argument and string flags or function as optional second argument, returns a sorted copy of the input. With one argument, sorts array elements with numbers first numerically and then strings lexically, and map elements likewise by map keys. If the second argument is a string, it can contain any of "f" for lexical ("n" is for the above default), "c" for case-folded lexical, or "t" for natural sort order. An additional "r" in that string is for reverse. An additional "v" in that string means sort maps by value, rather than by key. If the second argument is a function, then for arrays it should take two arguments a and b, returning < 0, 0, or > 0 as a < b, a == b, or a > b respectively; for maps the function should take four arguments ak, av, bk, and bv, again returning < 0, 0, or > 0, using a and b's keys and values. Examples: Default sorting: sort([3,"A",1,"B",22]) returns [1, 3, 22, "A", "B"].
Note that this is numbers before strings. Default sorting: sort(["E","a","c","B","d"]) returns ["B", "E", "a", "c", "d"].
Note that this is uppercase before lowercase. Case-folded ascending: sort(["E","a","c","B","d"], "c") returns ["a", "B", "c", "d", "E"]. Case-folded descending: sort(["E","a","c","B","d"], "cr") returns ["E", "d", "c", "B", "a"]. Natural sorting: sort(["a1","a10","a100","a2","a20","a200"], "t") returns ["a1", "a2", "a10", "a20", "a100", "a200"]. Array with function: sort([5,2,3,1,4], func(a,b) {return b <=> a}) returns [5,4,3,2,1]. Map with function: sort({"c":2,"a":3,"b":1}, func(ak,av,bk,bv) {return bv <=> av}) returns {"a":3,"c":2,"b":1}. Map without function: sort({"c":2,"a":3,"b":1}) returns {"a":3,"b":1,"c":2}. Map without function: sort({"c":2,"a":3,"b":1}, "v") returns {"b":1,"c":2,"a":3}. Map without function: sort({"c":2,"a":3,"b":1}, "vnr") returns {"a":3,"c":2,"b":1}.
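The sort orders described above can be approximated in Python as follows. The helper names are hypothetical, and the sketch covers arrays and the by-value map case only. It models the documented ordering rules rather than Miller's implementation.

```python
import re
from functools import cmp_to_key


def default_sort(xs):
    """Numbers first (numerically), then strings (lexically)."""
    return sorted(xs, key=lambda v: (isinstance(v, str), v))


def natural_key(s):
    """Split into digit and non-digit runs so that "a10" sorts after "a2"."""
    return [int(t) if t.isdigit() else t for t in re.split(r"(\d+)", s)]


mixed = default_sort([3, "A", 1, "B", 22])
case_folded = sorted(["E", "a", "c", "B", "d"], key=str.casefold)
natural = sorted(["a1", "a10", "a100", "a2", "a20", "a200"], key=natural_key)

# Comparator form: a negative, zero or positive result orders a before, with or after b.
descending = sorted([5, 2, 3, 1, 4], key=cmp_to_key(lambda a, b: (b > a) - (b < a)))

# Map sorted by value, descending; dicts keep insertion order in Python 3.7+.
by_value_desc = dict(
    sorted({"c": 2, "a": 3, "b": 1}.items(), key=lambda kv: kv[1], reverse=True)
)
```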
sort_collection¶
(class=stats #args=1) This is a helper function for the percentiles function; please see its online help for details.
splita¶
(class=conversion #args=2) Splits string into array with type inference. First argument is string to split; second is the separator to split on. Example: splita("3,4,5", ",") = [3,4,5]
splitax¶
(class=conversion #args=2) Splits string into array without type inference. First argument is string to split; second is the separator to split on. Example: splitax("3,4,5", ",") = ["3","4","5"]
splitkv¶
(class=conversion #args=3) Splits string by separators into map with type inference. First argument is string to split; second argument is pair separator; third argument is field separator. Example: splitkv("a=3,b=4,c=5", "=", ",") = {"a":3,"b":4,"c":5}
splitkvx¶
(class=conversion #args=3) Splits string by separators into map without type inference (keys and values are strings). First argument is string to split; second argument is pair separator; third argument is field separator. Example: splitkvx("a=3,b=4,c=5", "=", ",") = {"a":"3","b":"4","c":"5"}
splitnv¶
(class=conversion #args=2) Splits string by separator into integer-indexed map with type inference. First argument is string to split; second argument is separator to split on. Example: splitnv("a,b,c", ",") = {"1":"a","2":"b","3":"c"}
splitnvx¶
(class=conversion #args=2) Splits string by separator into integer-indexed map without type inference (values are strings). First argument is string to split; second argument is separator to split on. Example: splitnvx("3,4,5", ",") = {"1":"3","2":"4","3":"5"}
sqrt¶
(class=math #args=1) Square root.
ssub¶
(class=string #args=3) Like sub but does no regexing. No characters are special. Example: ssub("abc.def", ".", "X") gives "abcXdef"
stat¶
(class=system #args=1) Returns a map containing information about the provided path: "name" with string value, "size" as decimal int value, "mode" as octal int value, "modtime" as int-valued epoch seconds, and "isdir" as boolean value. Examples: stat("./mlr") gives {
"name": "mlr",
"size": 38391584,
"mode": 0755,
"modtime": 1715207874,
"isdir": false } stat("./mlr")["size"] gives 38391584
stddev¶
(class=stats #args=1) Returns the sample standard deviation of values in an array or map. Returns empty string AKA void for array/map of length less than two; returns error for non-array/non-map types. Example: stddev([4,5,9,10,11]) is 3.1144823
strfntime¶
(class=time #args=2) Formats integer nanoseconds since the epoch as timestamp. Format strings are as at https://pkg.go.dev/github.com/lestrrat-go/strftime, with the Miller-specific addition of "%1S" through "%9S" which format the seconds with 1 through 9 decimal places, respectively. ("%S" uses no decimal places.) See also https://miller.readthedocs.io/en/latest/reference-dsl-time/ for more information on the differences from the C library ("man strftime" on your system). See also strfntime_local. Examples: strfntime(1440768801123456789,"%Y-%m-%dT%H:%M:%SZ") = "2015-08-28T13:33:21Z" strfntime(1440768801123456789,"%Y-%m-%dT%H:%M:%3SZ") = "2015-08-28T13:33:21.123Z" strfntime(1440768801123456789,"%Y-%m-%dT%H:%M:%6SZ") = "2015-08-28T13:33:21.123456Z"
strfntime_local¶
(class=time #args=2,3) Like strfntime but consults the $TZ environment variable to get local time zone. Examples: strfntime_local(1440768801123456789, "%Y-%m-%d %H:%M:%S %z") = "2015-08-28 16:33:21 +0300" with TZ="Asia/Istanbul" strfntime_local(1440768801123456789, "%Y-%m-%d %H:%M:%3S %z") = "2015-08-28 16:33:21.123 +0300" with TZ="Asia/Istanbul" strfntime_local(1440768801123456789, "%Y-%m-%d %H:%M:%3S %z", "Asia/Istanbul") = "2015-08-28 16:33:21.123 +0300" strfntime_local(1440768801123456789, "%Y-%m-%d %H:%M:%9S %z", "Asia/Istanbul") = "2015-08-28 16:33:21.123456789 +0300"
strftime¶
(class=time #args=2) Formats seconds since the epoch as timestamp. Format strings are as at https://pkg.go.dev/github.com/lestrrat-go/strftime, with the Miller-specific addition of "%1S" through "%9S" which format the seconds with 1 through 9 decimal places, respectively. ("%S" uses no decimal places.) See also https://miller.readthedocs.io/en/latest/reference-dsl-time/ for more information on the differences from the C library ("man strftime" on your system). See also strftime_local. Examples: strftime(1440768801.7,"%Y-%m-%dT%H:%M:%SZ") = "2015-08-28T13:33:21Z" strftime(1440768801.7,"%Y-%m-%dT%H:%M:%3SZ") = "2015-08-28T13:33:21.700Z"
strftime_local¶
(class=time #args=2,3) Like strftime but consults the $TZ environment variable to get local time zone. Examples: strftime_local(1440768801.7, "%Y-%m-%d %H:%M:%S %z") = "2015-08-28 16:33:21 +0300" with TZ="Asia/Istanbul" strftime_local(1440768801.7, "%Y-%m-%d %H:%M:%3S %z") = "2015-08-28 16:33:21.700 +0300" with TZ="Asia/Istanbul" strftime_local(1440768801.7, "%Y-%m-%d %H:%M:%3S %z", "Asia/Istanbul") = "2015-08-28 16:33:21.700 +0300"
string¶
(class=conversion #args=1) Convert int/float/bool/string/array/map to string.
strip¶
(class=string #args=1) Strip leading and trailing whitespace from string.
strlen¶
(class=string #args=1) String length.
strmatch¶
(class=string #args=2) Boolean yes/no for whether the stringable first argument matches the regular-expression second argument. No regex captures are provided; please see `strmatchx`. Examples: strmatch("a", "abc") is false strmatch("abc", "a") is true strmatch("abc", "a[a-z]c") is true strmatch("abc", "(a).(c)") is true strmatch(12345, "34") is true
strmatchx¶
(class=string #args=2) Extended information for whether the stringable first argument matches the regular-expression second argument. Regex captures are provided in the return-value map; \1, \2, etc. are not set, in contrast to the `=~` operator. As well, while the `=~` operator limits matches to \1 through \9, an arbitrary number are supported here. Examples: strmatchx("a", "abc") returns:
{
"matched": false
} strmatchx("abc", "a") returns:
{
"matched": true,
"full_capture": "a",
"full_start": 1,
"full_end": 1
} strmatchx("[zy:3458]", "([a-z]+):([0-9]+)") returns:
{
"matched": true,
"full_capture": "zy:3458",
"full_start": 2,
"full_end": 8,
"captures": ["zy", "3458"],
"starts": [2, 5],
"ends": [3, 8]
}
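The positions in the strmatchx return map are 1-up and inclusive. The following Python sketch reproduces the example maps above using Python's re module. It is a model only: strmatchx_sketch is a hypothetical name, Miller uses Go regular expressions whose syntax differs from Python's in places, and the handling of optional groups that do not participate in a match is not modeled.

```python
import re


def strmatchx_sketch(text, pattern):
    """Model of the strmatchx return map (hypothetical helper, not Miller code)."""
    match = re.search(pattern, str(text))  # non-string input is stringified
    if match is None:
        return {"matched": False}
    result = {
        "matched": True,
        "full_capture": match.group(0),
        # Python spans are 0-up and half-open; adding 1 to the start gives the
        # 1-up start, and the half-open end equals the 1-up inclusive end.
        "full_start": match.start(0) + 1,
        "full_end": match.end(0),
    }
    group_count = match.re.groups
    if group_count > 0:
        indices = range(1, group_count + 1)
        result["captures"] = [match.group(i) for i in indices]
        result["starts"] = [match.start(i) + 1 for i in indices]
        result["ends"] = [match.end(i) for i in indices]
    return result


print(strmatchx_sketch("[zy:3458]", "([a-z]+):([0-9]+)"))
```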
strpntime¶
(class=time #args=2) strpntime: Parses timestamp as integer nanoseconds since the epoch. See also strpntime_local. Examples: strpntime("2015-08-28T13:33:21Z", "%Y-%m-%dT%H:%M:%SZ") = 1440768801000000000 strpntime("2015-08-28T13:33:21.345Z", "%Y-%m-%dT%H:%M:%SZ") = 1440768801345000000 strpntime("1970-01-01 00:00:00 -0400", "%Y-%m-%d %H:%M:%S %z") = 14400000000000 strpntime("1970-01-01 00:00:00 +0200", "%Y-%m-%d %H:%M:%S %z") = -7200000000000
strpntime_local¶
(class=time #args=2,3) Like strpntime but consults the $TZ environment variable to get local time zone. Examples: strpntime_local("2015-08-28T13:33:21Z", "%Y-%m-%dT%H:%M:%SZ") = 1440758001000000000 with TZ="Asia/Istanbul" strpntime_local("2015-08-28T13:33:21.345Z","%Y-%m-%dT%H:%M:%SZ") = 1440758001345000000 with TZ="Asia/Istanbul" strpntime_local("2015-08-28 13:33:21", "%Y-%m-%d %H:%M:%S") = 1440758001000000000 with TZ="Asia/Istanbul" strpntime_local("2015-08-28 13:33:21", "%Y-%m-%d %H:%M:%S", "Asia/Istanbul") = 1440758001000000000
strptime¶
(class=time #args=2) strptime: Parses timestamp as floating-point seconds since the epoch. See also strptime_local. Examples: strptime("2015-08-28T13:33:21Z", "%Y-%m-%dT%H:%M:%SZ") = 1440768801.000000 strptime("2015-08-28T13:33:21.345Z", "%Y-%m-%dT%H:%M:%SZ") = 1440768801.345000 strptime("1970-01-01 00:00:00 -0400", "%Y-%m-%d %H:%M:%S %z") = 14400 strptime("1970-01-01 00:00:00 +0200", "%Y-%m-%d %H:%M:%S %z") = -7200
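The sign convention in the "%z" examples (a -0400 offset gives a positive epoch value for 1970-01-01 00:00:00) can be checked with Python's datetime module. This is a sketch under stated assumptions, not Miller's parser: strptime_sketch is a hypothetical name, timestamps without "%z" are assumed to be UTC as in the "Z" examples, and Python's "%S" does not accept the fractional seconds shown in the ".345Z" example.

```python
from datetime import datetime, timezone


def strptime_sketch(text: str, fmt: str) -> float:
    """Parse a timestamp to seconds since the epoch (hypothetical helper)."""
    parsed = datetime.strptime(text, fmt)
    if parsed.tzinfo is None:
        # No %z in the format: assume the fields are UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    # For an aware datetime, timestamp() subtracts the UTC offset, so local
    # midnight at -0400 is 04:00 UTC, i.e. 14400 seconds after the epoch.
    return parsed.timestamp()


print(strptime_sketch("1970-01-01 00:00:00 -0400", "%Y-%m-%d %H:%M:%S %z"))
print(strptime_sketch("1970-01-01 00:00:00 +0200", "%Y-%m-%d %H:%M:%S %z"))
```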
strptime_local¶
(class=time #args=2,3) Like strptime but consults the $TZ environment variable to get local time zone. Examples: strptime_local("2015-08-28T13:33:21Z", "%Y-%m-%dT%H:%M:%SZ") = 1440758001 with TZ="Asia/Istanbul" strptime_local("2015-08-28T13:33:21.345Z","%Y-%m-%dT%H:%M:%SZ") = 1440758001.345 with TZ="Asia/Istanbul" strptime_local("2015-08-28 13:33:21", "%Y-%m-%d %H:%M:%S") = 1440758001 with TZ="Asia/Istanbul" strptime_local("2015-08-28 13:33:21", "%Y-%m-%d %H:%M:%S", "Asia/Istanbul") = 1440758001
sub¶
(class=string #args=3) '$name = sub($name, "old", "new")': replace once (first match, if there are multiple matches), with support for regular expressions. Capture groups \1 through \9 in the new part are matched from (...) in the old part, and must be used within the same call to sub -- they don't persist for subsequent DSL statements. See also =~ and regextract. See also "Regular expressions" at https://miller.readthedocs.io. Examples: sub("ababab", "ab", "XY") gives "XYabab" sub("abc.def", ".", "X") gives "Xbc.def" sub("abc.def", "\.", "X") gives "abcXdef" sub("abcdefg", "[ce]", "X") gives "abXdefg" sub("prefix4529:suffix8567", "suffix([0-9]+)", "name\1") gives "prefix4529:name8567"
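The sub examples turn on two points: only the first match is replaced, and an unescaped "." is a regex wildcard. The following Python sketch reproduces the examples with re.sub and count=1. It is an analogy rather than Miller's implementation: sub_sketch is a hypothetical name, and Python's regex dialect differs from the Go dialect Miller uses, although the two agree for these patterns.

```python
import re


def sub_sketch(text: str, pattern: str, replacement: str) -> str:
    """Replace only the first regex match (hypothetical helper, not Miller code)."""
    return re.sub(pattern, replacement, text, count=1)


# An unescaped "." matches any character, so the first character is replaced.
print(sub_sketch("abc.def", ".", "X"))
# An escaped "\." matches only the literal dot.
print(sub_sketch("abc.def", r"\.", "X"))
# "\1" in the replacement refers to the first (...) group in the pattern.
print(sub_sketch("prefix4529:suffix8567", r"suffix([0-9]+)", r"name\1"))
```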
substr¶
(class=string #args=3) substr is an alias for substr0. See also substr1. Miller is generally 1-up with all array and string indices, but substr remains 0-up for backward compatibility with Miller 5 and below. Arrays are new in Miller 6; the substr function is older.
substr0¶
(class=string #args=3) substr0(s,m,n) gives substring of s from 0-up position m to n inclusive. Negative indices -len .. -1 alias to 0 .. len-1. See also substr and substr1.
substr1¶
(class=string #args=3) substr1(s,m,n) gives substring of s from 1-up position m to n inclusive. Negative indices -len .. -1 alias to 1 .. len. See also substr and substr0.
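The difference between the 0-up and 1-up variants, and the negative-index aliasing, can be modeled with Python slices. This sketch covers in-range indices only; the function names are hypothetical, and Miller's results for out-of-range indices are not described here and are not modeled.

```python
def substr0_sketch(s: str, m: int, n: int) -> str:
    """0-up, inclusive substring; -len..-1 alias to 0..len-1 (in-range only)."""
    if m < 0:
        m += len(s)
    if n < 0:
        n += len(s)
    # Python slices are half-open, so the inclusive end n becomes n + 1.
    return s[m : n + 1]


def substr1_sketch(s: str, m: int, n: int) -> str:
    """1-up, inclusive substring; -len..-1 alias to 1..len (in-range only)."""
    if m < 0:
        m += len(s) + 1
    if n < 0:
        n += len(s) + 1
    # Shift the 1-up start down by one; the 1-up inclusive end is already
    # the correct half-open 0-up end.
    return s[m - 1 : n]


print(substr0_sketch("hello", 1, 3))
print(substr1_sketch("hello", 1, 3))
```

With the same arguments (1, 3), the 0-up variant selects the second through fourth characters and the 1-up variant selects the first through third.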
sum¶
(class=stats #args=1) Returns the sum of values in an array or map. Returns error for non-array/non-map types. Example: sum([1,2,3,4,5]) is 15
sum2¶
(class=stats #args=1) Returns the sum of squares of values in an array or map. Returns error for non-array/non-map types. Example: sum2([1,2,3,4,5]) is 55
sum3¶
(class=stats #args=1) Returns the sum of cubes of values in an array or map. Returns error for non-array/non-map types. Example: sum3([1,2,3,4,5]) is 225
sum4¶
(class=stats #args=1) Returns the sum of fourth powers of values in an array or map. Returns error for non-array/non-map types. Example: sum4([1,2,3,4,5]) is 979
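The sum, sum2, sum3, and sum4 examples can be checked with a single power-sum helper. The following Python sketch uses a hypothetical name, power_sum, and takes the values of a map by assumption, matching the "array or map" wording; it does not model Miller's error return for other types.

```python
def power_sum(values, power: int):
    """Sum of values raised to the given power (hypothetical helper)."""
    if isinstance(values, dict):
        # For a map, only the values participate, not the keys.
        values = values.values()
    return sum(v**power for v in values)


data = [1, 2, 3, 4, 5]
for power in (1, 2, 3, 4):
    print(power, power_sum(data, power))
```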
sysntime¶
(class=time #args=0) Returns the system time in 64-bit nanoseconds since the epoch.
system¶
(class=system #args=1) Run command string, yielding its stdout minus final carriage return.
systime¶
(class=time #args=0) Returns the system time in floating-point seconds since the epoch.
systimeint¶
(class=time #args=0) Returns the system time in integer seconds since the epoch.
tan¶
(class=math #args=1) Trigonometric tangent.
tanh¶
(class=math #args=1) Hyperbolic tangent.
tolower¶
(class=string #args=1) Convert string to lowercase.
toupper¶
(class=string #args=1) Convert string to uppercase.
truncate¶
(class=string #args=2) Truncates string first argument to max length of int second argument.
typeof¶
(class=typing #args=1) Returns the type of the argument as a string (e.g. "str"). For debugging.
unflatten¶
(class=collections #args=2) Reverses flatten. Useful for nested JSON-like structures for non-JSON file formats like CSV. The first argument is a map, and the second argument is the flatten separator. See also arrayify. See "Flatten/unflatten: converting between JSON and tabular formats" at https://miller.readthedocs.io for more information. Example: unflatten({"a.b.c" : 4}, ".") is {"a": {"b": { "c": 4 }}}.
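The key-splitting idea behind unflatten can be sketched in Python: each flattened key is split on the separator and the pieces become nested map levels. This is a minimal model with a hypothetical name, unflatten_sketch; it assumes no key is both a leaf and a prefix of another key, and it does not model any conversion of maps to arrays (see arrayify).

```python
def unflatten_sketch(flat: dict, separator: str) -> dict:
    """Rebuild nested maps from separator-joined keys (hypothetical helper)."""
    nested = {}
    for key, value in flat.items():
        parts = key.split(separator)
        cursor = nested
        # Walk (or create) one map level per leading key component.
        for part in parts[:-1]:
            cursor = cursor.setdefault(part, {})
        # The last component holds the value itself.
        cursor[parts[-1]] = value
    return nested


print(unflatten_sketch({"a.b.c": 4}, "."))
print(unflatten_sketch({"dish.egg": 7, "dish.flint": 8, "garlic": ""}, "."))
```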
unformat¶
(class=string #args=2) Using first argument as format string, unpacks second argument into an array of matches, with type-inference. On non-match, returns error -- use is_error() to check. Examples: unformat("{}:{}:{}", "1:2:3") gives [1, 2, 3]. unformat("{}h{}m{}s", "3h47m22s") gives [3, 47, 22]. is_error(unformat("{}h{}m{}s", "3:47:22")) gives true.
unformatx¶
(class=string #args=2) Same as unformat, but without type-inference. Examples: unformatx("{}:{}:{}", "1:2:3") gives ["1", "2", "3"]. unformatx("{}h{}m{}s", "3h47m22s") gives ["3", "47", "22"]. is_error(unformatx("{}h{}m{}s", "3:47:22")) gives true.
upntime¶
(class=time #args=0) Returns the time in 64-bit nanoseconds since the current Miller program was started.
uptime¶
(class=time #args=0) Returns the time in floating-point seconds since the current Miller program was started.
urand¶
(class=math #args=0) Floating-point numbers uniformly distributed on the unit interval. Int-valued example: '$n=floor(20+urand()*11)'.
urand32¶
(class=math #args=0) Integer uniformly distributed between 0 and 2**32-1 inclusive.
urandelement¶
(class=math #args=1) Random sample from the first argument, which must be a non-empty array.
urandint¶
(class=math #args=2) Integer uniformly distributed between inclusive integer endpoints.
urandrange¶
(class=math #args=2) Floating-point numbers uniformly distributed on the interval [a, b).
utf8_to_latin1¶
(class=string #args=1) Tries to convert UTF-8-encoded string to Latin-1-encoded string. If argument is array or map, recurses into it. Examples: $y = utf8_to_latin1($x) $* = utf8_to_latin1($*)
variance¶
(class=stats #args=1) Returns the sample variance of values in an array or map. Returns empty string AKA void for array/map of length less than two; returns error for non-array/non-map types. Example: variance([4,5,9,10,11]) is 9.7
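The variance example uses the sample (n-1) denominator. The following Python sketch shows the arithmetic with a plain two-pass formula; the name sample_variance is hypothetical, Miller may compute the same quantity by a different method, and the error return for non-array/non-map input is not modeled.

```python
def sample_variance(values):
    """Sample variance with an n-1 denominator (hypothetical helper)."""
    if isinstance(values, dict):
        values = list(values.values())
    n = len(values)
    if n < 2:
        # Models the documented empty-string (void) result for short input.
        return ""
    mean = sum(values) / n
    # Sum of squared deviations from the mean, divided by n - 1.
    return sum((v - mean) ** 2 for v in values) / (n - 1)


# Mean is 7.8; squared deviations sum to 38.8; 38.8 / 4 is about 9.7.
print(sample_variance([4, 5, 9, 10, 11]))
```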
version¶
(class=system #args=0) Returns the Miller version as a string.
!¶
(class=boolean #args=1) Logical negation.
!=¶
(class=boolean #args=2) String/numeric inequality. Mixing number and string results in string compare.
!=~¶
(class=boolean #args=2) String (left-hand side) does not match regex (right-hand side), e.g. '$name !=~ "^a.*b$"'.
%¶
(class=arithmetic #args=2) Remainder; never negative-valued (pythonic).
&¶
(class=arithmetic #args=2) Bitwise AND.
&&¶
(class=boolean #args=2) Logical AND.
*¶
(class=arithmetic #args=2) Multiplication, with integer*integer overflow to float.
**¶
(class=arithmetic #args=2) Exponentiation. Same as pow, but as an infix operator.
+¶
(class=arithmetic #args=1,2) Addition as binary operator; unary plus operator.
-¶
(class=arithmetic #args=1,2) Subtraction as binary operator; unary negation operator.
.¶
(class=string #args=2) String concatenation. Non-strings are coerced, so you can do '"ax".98' etc.
.*¶
(class=arithmetic #args=2) Multiplication, with integer-to-integer overflow.
.+¶
(class=arithmetic #args=2) Addition, with integer-to-integer overflow.
.-¶
(class=arithmetic #args=2) Subtraction, with integer-to-integer overflow.
./¶
(class=arithmetic #args=2) Integer division, rounding toward zero.
/¶
(class=arithmetic #args=2) Division. Integer / integer is integer when exact, else floating-point: e.g. 6/3 is 2 but 6/4 is 1.5.
//¶
(class=arithmetic #args=2) Pythonic integer division, rounding toward negative infinity.
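The division operators differ mainly on negative operands. Because "//" and "%" are described as pythonic, Python's own operators serve as the reference in the sketch below; the helpers for "./" and "/" are hypothetical models of the descriptions above, not Miller code, and only integer operands with a nonzero divisor are considered.

```python
def truncating_divide(a: int, b: int) -> int:
    """Model of './': integer division rounding toward zero."""
    quotient = abs(a) // abs(b)
    return quotient if (a < 0) == (b < 0) else -quotient


def exact_or_float_divide(a: int, b: int):
    """Model of '/': integer result when exact, else floating-point."""
    return a // b if a % b == 0 else a / b


print(-7 // 2)                      # floor division rounds down
print(truncating_divide(-7, 2))     # truncation rounds toward zero
print(-7 % 5)                       # remainder is non-negative for a positive divisor
print(exact_or_float_divide(6, 3), exact_or_float_divide(6, 4))
```

For positive operands the floor and truncating forms agree; they differ by one whenever the exact quotient is negative and not an integer.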
<¶
(class=boolean #args=2) String/numeric less-than. Mixing number and string results in string compare.
<<¶
(class=arithmetic #args=2) Bitwise left-shift.
<=¶
(class=boolean #args=2) String/numeric less-than-or-equals. Mixing number and string results in string compare.
<=>¶
(class=boolean #args=2) Comparator, nominally for sorting. Given a <=> b, returns <0, 0, >0 as a < b, a == b, or a > b, respectively.
==¶
(class=boolean #args=2) String/numeric equality. Mixing number and string results in string compare.
=~¶
(class=boolean #args=2) String (left-hand side) matches regex (right-hand side), e.g. '$name =~ "^a.*b$"'. Capture groups \1 through \9 are matched from (...) in the right-hand side, and can be used within subsequent DSL statements. See also "Regular expressions" at https://miller.readthedocs.io. Examples: With if-statement: if ($url =~ "http.*com") { ... } Without if-statement: given $line = "index ab09 file", and $line =~ "([a-z][a-z])([0-9][0-9])", then $label = "[\1:\2]", $label is "[ab:09]"
>¶
(class=boolean #args=2) String/numeric greater-than. Mixing number and string results in string compare.
>=¶
(class=boolean #args=2) String/numeric greater-than-or-equals. Mixing number and string results in string compare.
>>¶
(class=arithmetic #args=2) Bitwise signed right-shift.
>>>¶
(class=arithmetic #args=2) Bitwise unsigned right-shift.
?:¶
(class=boolean #args=3) Standard ternary operator.
??¶
(class=boolean #args=2) Absent-coalesce operator. $a ?? 1 evaluates to 1 if $a isn't defined in the current record.
???¶
(class=boolean #args=2) Absent/empty-coalesce operator. $a ??? 1 evaluates to 1 if $a isn't defined in the current record, or has empty value.
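The two coalescing operators differ only for a field that is present but empty. The following Python sketch models a record as a dict with hypothetical helper names; it covers only the absent and empty cases described above and no other value types.

```python
def absent_coalesce(record: dict, field: str, fallback):
    """Model of '??': use the fallback only when the field is absent."""
    return record[field] if field in record else fallback


def absent_empty_coalesce(record: dict, field: str, fallback):
    """Model of '???': use the fallback when the field is absent or empty."""
    value = record.get(field, "")
    return fallback if value == "" else value


record = {"a": "", "b": 5}
print(repr(absent_coalesce(record, "a", 1)))        # field present, stays empty
print(repr(absent_empty_coalesce(record, "a", 1)))  # empty value is replaced
print(repr(absent_coalesce(record, "c", 1)))        # field absent, fallback used
```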
^¶
(class=arithmetic #args=2) Bitwise XOR.
^^¶
(class=boolean #args=2) Logical XOR.
|¶
(class=arithmetic #args=2) Bitwise OR.
||¶
(class=boolean #args=2) Logical OR.
~¶
(class=arithmetic #args=1) Bitwise NOT. Beware '$y=~$x' since =~ is the regex-match operator: try '$y = ~$x'.
KEYWORDS FOR PUT AND FILTER¶
all¶
all: used in "emit1", "emit", "emitp", and "unset" as a synonym for @*
begin¶
begin: defines a block of statements to be executed before input records are ingested. The body statements must be wrapped in curly braces.
Example: 'begin { @count = 0 }'
bool¶
bool: declares a boolean local variable in the current curly-braced scope. Type-checking happens at assignment: 'bool b = 1' is an error.
break¶
break: causes execution to continue after the body of the current for/while/do-while loop.
call¶
call: used for invoking a user-defined subroutine.
Example: 'subr s(k,v) { print k . " is " . v} call s("a", $a)'
continue¶
continue: causes execution to skip the remaining statements in the body of the current for/while/do-while loop. For-loop increments are still applied.
do¶
do: with "while", introduces a do-while loop. The body statements must be wrapped in curly braces.
dump¶
dump: prints all currently defined out-of-stream variables immediately to stdout as JSON. With >, >>, or |, the data do not go directly to stdout but are instead redirected. The > and >> are for write and append, as in the shell, but (as with awk) the file-overwrite for > is on first write, not per record. The | is for piping to a process which will process the data. There will be one open file for each distinct file name (for > and >>) or one subordinate process for each distinct value of the piped-to command (for |). Output-formatting flags are taken from the main command line.
Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump }'
Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump > "mytap.dat"}'
Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump >> "mytap.dat"}'
Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump | "jq .[]"}'
edump¶
edump: prints all currently defined out-of-stream variables immediately to stderr as JSON.
Example: mlr --from f.dat put -q '@v[NR]=$*; end { edump }'
elif¶
elif: the way Miller spells "else if". The body statements must be wrapped in curly braces.
else¶
else: terminates an if/elif/else chain. The body statements must be wrapped in curly braces.
emit1¶
emit1: inserts an out-of-stream variable into the output record stream. Unlike the other emit variants, side-by-sides, indexing, and redirection are not supported, but you can emit any map-valued expression.
Example: mlr --from f.dat put 'emit1 $*'
Example: mlr --from f.dat put 'emit1 mapsum({"id": NR}, $*)'
Please see https://miller.readthedocs.io for more information.
emit¶
emit: inserts an out-of-stream variable into the output record stream. Hashmap indices present in the data but not slotted by emit arguments are not output. With >, >>, or |, the data do not become part of the output record stream but are instead redirected. The > and >> are for write and append, as in the shell, but (as with awk) the file-overwrite for > is on first write, not per record. The | is for piping to a process which will process the data. There will be one open file for each distinct file name (for > and >>) or one subordinate process for each distinct value of the piped-to command (for |). Output-formatting flags are taken from the main command line. You can use any of the output-format command-line flags, e.g. --ocsv, --ofs, etc., to control the format of the output if the output is redirected. See also mlr -h.
Example: mlr --from f.dat put 'emit > "/tmp/data-".$a, $*'
Example: mlr --from f.dat put 'emit > "/tmp/data-".$a, mapexcept($*, "a")'
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @sums'
Example: mlr --from f.dat put --ojson '@sums[$a][$b]+=$x; emit > "tap-".$a.$b.".dat", @sums'
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @sums, "index1", "index2"'
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @*, "index1", "index2"'
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit > "mytap.dat", @*, "index1", "index2"'
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit >> "mytap.dat", @*, "index1", "index2"'
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit | "gzip > mytap.dat.gz", @*, "index1", "index2"'
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit > stderr, @*, "index1", "index2"'
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit | "grep somepattern", @*, "index1", "index2"'
Please see https://miller.readthedocs.io for more information.
emitf¶
emitf: inserts non-indexed out-of-stream variable(s) side-by-side into the output record stream. With >, >>, or |, the data do not become part of the output record stream but are instead redirected. The > and >> are for write and append, as in the shell, but (as with awk) the file-overwrite for > is on first write, not per record. The | is for piping to a process which will process the data. There will be one open file for each distinct file name (for > and >>) or one subordinate process for each distinct value of the piped-to command (for |). Output-formatting flags are taken from the main command line. You can use any of the output-format command-line flags, e.g. --ocsv, --ofs, etc., to control the format of the output if the output is redirected. See also mlr -h.
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf @a'
Example: mlr --from f.dat put --oxtab '@a=$i;@b+=$x;@c+=$y; emitf > "tap-".$i.".dat", @a'
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf @a, @b, @c'
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf > "mytap.dat", @a, @b, @c'
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf >> "mytap.dat", @a, @b, @c'
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf > stderr, @a, @b, @c'
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern", @a, @b, @c'
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern > mytap.dat", @a, @b, @c'
Please see https://miller.readthedocs.io for more information.
emitp¶
emitp: inserts an out-of-stream variable into the output record stream. Hashmap indices present in the data but not slotted by emitp arguments are output concatenated with ":". With >, >>, or |, the data do not become part of the output record stream but are instead redirected. The > and >> are for write and append, as in the shell, but (as with awk) the file-overwrite for > is on first write, not per record. The | is for piping to a process which will process the data. There will be one open file for each distinct file name (for > and >>) or one subordinate process for each distinct value of the piped-to command (for |). Output-formatting flags are taken from the main command line. You can use any of the output-format command-line flags, e.g. --ocsv, --ofs, etc., to control the format of the output if the output is redirected. See also mlr -h.
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @sums'
Example: mlr --from f.dat put --opprint '@sums[$a][$b]+=$x; emitp > "tap-".$a.$b.".dat", @sums'
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @sums, "index1", "index2"'
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @*, "index1", "index2"'
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp > "mytap.dat", @*, "index1", "index2"'
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp >> "mytap.dat", @*, "index1", "index2"'
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp | "gzip > mytap.dat.gz", @*, "index1", "index2"'
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp > stderr, @*, "index1", "index2"'
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp | "grep somepattern", @*, "index1", "index2"'
Please see https://miller.readthedocs.io for more information.
end¶
end: defines a block of statements to be executed after input records are ingested. The body statements must be wrapped in curly braces.
Example: 'end { emit @count }'
Example: 'end { eprint "Final count is " . @count }'
eprint¶
eprint: prints expression immediately to stderr.
Example: mlr --from f.dat put -q 'eprint "The sum of x and y is ".($x+$y)'
Example: mlr --from f.dat put -q 'for (k, v in $*) { eprint k . " => " . v }'
Example: mlr --from f.dat put '(NR % 1000 == 0) { eprint "Checkpoint ".NR}'
eprintn¶
eprintn: prints expression immediately to stderr, without trailing newline.
Example: mlr --from f.dat put -q 'eprintn "The sum of x and y is ".($x+$y); eprint ""'
false¶
false: the boolean literal value.
filter¶
filter: includes/excludes the record in the output record stream.
Example: mlr --from f.dat put 'filter (NR == 2 || $x > 5.4)'
Instead of put with 'filter false' you can simply use put -q. The following uses the input record to accumulate data but only prints the running sum without printing the input record:
Example: mlr --from f.dat put -q '@running_sum += $x * $y; emit @running_sum'
float¶
float: declares a floating-point local variable in the current curly-braced scope. Type-checking happens at assignment: 'float x = 0' is an error.
for¶
for: defines a for-loop using one of three styles. The body statements must be wrapped in curly braces.
For-loop over stream record:
Example: 'for (k, v in $*) { ... }'
For-loop over out-of-stream variables:
Example: 'for (k, v in @counts) { ... }'
Example: 'for ((k1, k2), v in @counts) { ... }'
Example: 'for ((k1, k2, k3), v in @*) { ... }'
C-style for-loop:
Example: 'for (var i = 0, var b = 1; i < 10; i += 1, b *= 2) { ... }'
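The multi-key form 'for ((k1, k2), v in @counts)' walks a two-level map and binds both keys at once. A rough Python analogy, with a hypothetical helper name and the assumption that every first-level value is itself a map, is a generator that yields a key tuple and the leaf value:

```python
def two_level_items(counts: dict):
    """Yield ((k1, k2), v) for every leaf of a two-level map (hypothetical helper)."""
    for k1, inner in counts.items():
        for k2, v in inner.items():
            yield (k1, k2), v


counts = {"pan": {"x": 1, "y": 2}, "eks": {"x": 3}}
for (k1, k2), v in two_level_items(counts):
    print(k1, k2, v)
```

Iteration follows insertion order at both levels, which mirrors the insertion-ordered maps Miller uses.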
func¶
func: used for defining a user-defined function.
Example: 'func f(a,b) { return sqrt(a**2+b**2)} $d = f($x, $y)'
funct¶
funct: used for saying that a function argument is a user-defined function.
Example: 'func g(num a, num b, funct f) :num { return f(a**2+b**2) }'
if¶
if: starts an if/elif/else chain. The body statements must be wrapped in curly braces.
in¶
in: used in for-loops over stream records or out-of-stream variables.
int¶
int: declares an integer local variable in the current curly-braced scope. Type-checking happens at assignment: 'int x = 0.0' is an error.
map¶
map: declares a map-valued local variable in the current curly-braced scope. Type-checking happens at assignment: 'map b = 0' is an error. map b = {} is always OK. map b = a is OK or not depending on whether a is a map.
num¶
num: declares an int/float local variable in the current curly-braced scope. Type-checking happens at assignment: 'num b = true' is an error.
print¶
print: prints expression immediately to stdout.
Example: mlr --from f.dat put -q 'print "The sum of x and y is ".($x+$y)'
Example: mlr --from f.dat put -q 'for (k, v in $*) { print k . " => " . v }'
Example: mlr --from f.dat put '(NR % 1000 == 0) { print > stderr, "Checkpoint ".NR}'
printn¶
printn: prints expression immediately to stdout, without trailing newline.
Example: mlr --from f.dat put -q 'printn "."; end { print "" }'
return¶
return: specifies the return value from a user-defined function. Omitted return statements (including via if-branches) result in an absent-null return value, which in turn results in a skipped assignment to an LHS.
stderr¶
stderr: Used for tee, emit, emitf, emitp, print, and dump in place of filename to print to standard error.
stdout¶
stdout: Used for tee, emit, emitf, emitp, print, and dump in place of filename to print to standard output.
str¶
str: declares a string local variable in the current curly-braced scope. Type-checking happens at assignment.
subr¶
subr: used for defining a subroutine.
Example: 'subr s(k,v) { print k . " is " . v} call s("a", $a)'
tee¶
tee: prints the current record to specified file. This is an immediate print to the specified file (except for pprint format which of course waits until the end of the input stream to format all output). The > and >> are for write and append, as in the shell, but (as with awk) the file-overwrite for > is on first write, not per record. The | is for piping to a process which will process the data. There will be one open file for each distinct file name (for > and >>) or one subordinate process for each distinct value of the piped-to command (for |). Output-formatting flags are taken from the main command line. You can use any of the output-format command-line flags, e.g. --ocsv, --ofs, etc., to control the format of the output. See also mlr -h. emit with redirect and tee with redirect are identical, except tee can only output $*.
Example: mlr --from f.dat put 'tee > "/tmp/data-".$a, $*'
Example: mlr --from f.dat put 'tee >> "/tmp/data-".$a.$b, $*'
Example: mlr --from f.dat put 'tee > stderr, $*'
Example: mlr --from f.dat put -q 'tee | "tr \[a-z\\] \[A-Z\\]", $*'
Example: mlr --from f.dat put -q 'tee | "tr \[a-z\\] \[A-Z\\] > /tmp/data-".$a, $*'
Example: mlr --from f.dat put -q 'tee | "gzip > /tmp/data-".$a.".gz", $*'
Example: mlr --from f.dat put -q --ojson 'tee | "gzip > /tmp/data-".$a.".gz", $*'
true¶
true: the boolean literal value.
unset¶
unset: clears field(s) from the current record, or an out-of-stream or local variable.
Example: mlr --from f.dat put 'unset $x'
Example: mlr --from f.dat put 'unset $*'
Example: mlr --from f.dat put 'for (k, v in $*) { if (k =~ "a.*") { unset $[k] } }'
Example: mlr --from f.dat put '...; unset @sums'
Example: mlr --from f.dat put '...; unset @sums["green"]'
Example: mlr --from f.dat put '...; unset @*'
var¶
var: declares an untyped local variable in the current curly-braced scope.
Examples: 'var a=1', 'var xyz=""'
while¶
while: introduces a while loop, or with "do", introduces a do-while loop. The body statements must be wrapped in curly braces.
ENV¶
ENV: access to environment variables by name, e.g. '$home = ENV["HOME"]'
FILENAME¶
FILENAME: evaluates to the name of the current file being processed.
FILENUM¶
FILENUM: evaluates to the number of the current file being processed, starting with 1.
FNR¶
FNR: evaluates to the number of the current record within the current file being processed, starting with 1. Resets at the start of each file.
IFS¶
IFS: evaluates to the input field separator from the command line.
IPS¶
IPS: evaluates to the input pair separator from the command line.
IRS¶
IRS: evaluates to the input record separator from the command line, or to LF or CRLF from the input data if in autodetect mode (which is the default).
M_E¶
M_E: the mathematical constant e.
M_PI¶
M_PI: the mathematical constant pi.
NF¶
NF: evaluates to the number of fields in the current record.
NR¶
NR: evaluates to the number of the current record over all files being processed, starting with 1. Does not reset at the start of each file.
OFS¶
OFS: evaluates to the output field separator from the command line.
OPS¶
OPS: evaluates to the output pair separator from the command line.
ORS¶
ORS: evaluates to the output record separator from the command line, or to LF or CRLF from the input data if in autodetect mode (which is the default).
AUTHOR¶
Miller is written by John Kerl <kerl.john.r@gmail.com>.
This manual page has been composed from Miller's help output by Eric MSP Veith <eveith@veith-m.de>.
SEE ALSO¶
awk(1), sed(1), cut(1), join(1), sort(1), RFC 4180: Common Format and MIME Type for Comma-Separated Values (CSV) Files, the Miller docsite https://miller.readthedocs.io
2024-10-05