
MAGIC(3) Library Functions Manual MAGIC(3)

NAME

magic - magic file interface

SYNOPSIS

#include <magic.h>
Magic_t
{
	unsigned long	flags;
};
Magic_t*  magicopen(unsigned long flags);
void      magicclose(Magic_t* magic);
int       magicload(Magic_t* magic, const char* path, unsigned long flags);
int       magiclist(Magic_t* magic, Sfio_t* sp);
char*     magictype(Magic_t* magic, const char* path, struct stat* st);

DESCRIPTION

These routines provide an interface to the file(1) command magic file. magicopen returns a magic session handle that is passed to all of the other routines. flags may be:

MAGIC_MIME
Return the MIME type string rather than the magic file description.
MAGIC_PHYSICAL
Don't follow symbolic links.
MAGIC_STAT
The stat structure st passed to magictype will contain valid stat(2) information. See magictype below.
MAGIC_VERBOSE
Enable verbose error messages.

magicclose closes the magic session.

magicload loads the magic file named by path into the magic session. flags are the same as with magicopen. More than one magic file can be loaded into a session; the files are searched in load order. If path is 0 then the default magic file is loaded.

magiclist lists the magic file contents on the sfio(3) stream sp. This is used for debugging magic entries.

magictype returns the type string for path with optional stat(2) information st. If st == 0 then magictype calls stat on a private stat buffer; else if magicopen was called with the MAGIC_STAT flag then st is assumed to contain valid stat information; otherwise magictype calls stat on st. magictype always returns a non-null string. If errors are encountered on path then the return value will contain information on those errors, e.g., cannot stat.

FORMAT

The magic file format is a backward-compatible extension of an ancient System V file implementation. However, with the extended format it is possible to write a single magic file that works on all platforms. Most of the net magic files floating around work with magic, but they usually double up on le and be entries, which magic handles automatically.

A magic file entry describes a procedure for determining a single file type based on the file pathname, stat(2) information, and the file data. An entry is a sequence of lines, each line being a record of space-separated fields. The general record format is:

[op]offset type [mask]expression description [mimetype]
# in the first column introduces a comment. The first record in an entry contains no op; the remaining records for an entry contain an op. Integer constants are as in C: 0x* or 0X* for hexadecimal, 0* for octal and decimal otherwise.
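The C-style constant rule above matches what strtoul(3) does when given base 0. A minimal sketch follows; parse_constant is a hypothetical helper for illustration and is not part of <magic.h>.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Parse an integer constant using the C rules quoted above:
 * 0x or 0X prefix is hexadecimal, a leading 0 is octal,
 * anything else is decimal.
 * Hypothetical helper, not part of <magic.h>.
 */
static unsigned long parse_constant(const char *text)
{
	assert(text != NULL);
	/* Base 0 asks strtoul to infer the base from the prefix. */
	return strtoul(text, NULL, 0);
}
```

With this helper, the constants 0x020c0108, 0407 and 36 taken from the EXAMPLES section parse as hexadecimal, octal and decimal respectively.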

The op field may be one of:

+
The previous records must match but the current record is optional. > is an old-style synonym for +.
&
The previous and current records must match.
{
Starts a nesting block that is terminated by }. A nesting block pushes a new context for the + and & ops. The { and } records have no other fields.
id{
A function declaration and call for the single character identifier id. The function return is a nesting block end record }. Functions may be redefined. Functions have no arguments or return value.
id()
A call to the function id.

The offset field is either the offset into the data upon which the current entry operates or a file metadata identifier. Offsets are either integer constants or offset expressions. An offset expression is contained in (...) and is a combination of integral arithmetic operators and the @ indirection operator. Indirections take the form @integer where integer is the data offset for the indirection value. The size of the indirection value is taken either from one of the suffixes B (byte, 1 char), H (short, 2 chars), L (long, 4 chars), or Q (quad, 8 chars), or from the type field. Valid file metadata identifiers are:

atime
The string representation of stat.st_atime.
blocks
stat.st_blocks.
ctime
The string representation of stat.st_ctime.
fstype
The string representation of stat.st_fstype.
gid
The string representation of stat.st_gid.
mode
stat.st_mode file mode bits in modecanon(3) canonical representation (i.e., the good old octal values).
mtime
The string representation of stat.st_mtime.
nlink
stat.st_nlink.
size
stat.st_size.
name
The file path name sans directory.
uid
The string representation of stat.st_uid.
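The @ indirection in offset expressions reads a value out of the file data and uses it in the offset computation. The sketch below shows only that read step, with the width chosen by the B, H, L or Q suffix. The helper names are hypothetical, and the big-endian byte order is an assumption made to keep the sketch short; it is not a statement of how the library orders the bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Width in bytes for an indirection size suffix, or 0 if unknown. */
static size_t suffix_width(char suffix)
{
	switch (suffix) {
	case 'B': return 1;	/* byte */
	case 'H': return 2;	/* short */
	case 'L': return 4;	/* long */
	case 'Q': return 8;	/* quad */
	default:  return 0;
	}
}

/*
 * Read the indirection value stored at data[at].
 * Returns 1 and stores the value on success, 0 if the suffix is
 * unknown or the read would run past the end of the data.
 * Big-endian order is an assumption of this sketch.
 */
static int read_indirect(const unsigned char *data, size_t size,
			 size_t at, char suffix, unsigned long long *value)
{
	size_t width = suffix_width(suffix);
	size_t i;
	unsigned long long v = 0;

	assert(data != NULL && value != NULL);
	if (width == 0 || at > size || width > size - at)
		return 0;
	for (i = 0; i < width; i++)
		v = (v << 8) | data[at + i];
	*value = v;
	return 1;
}
```

The returned value would then take part in the surrounding arithmetic to produce the final data offset.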

The type field specifies the type of the data at offset. Integral types may be prefixed by le or be for specifying exact little-endian or big-endian representation, but the internal algorithm automatically loops through the standard representations to find integral matches, so representation prefixes are rarely used. However, this looping may cause some magic entry conflicts; use the le or be prefix in these cases. Only one representation is used for all the records in an entry. Valid types are:

byte
A 1 byte integer.
short
A 2 byte integer.
long
A 4 byte integer.
quad
An 8 byte integer. Tests on this type may fail if the local compiler does not support an 8 byte integral type and the corresponding value overflows 4 bytes.
date
The data at offset is interpreted as a 4 byte seconds-since-the-epoch date and converted to a string.
edit
The expression field is an ed(1) style substitution expression del old del new del [ flags ], where the substituted value is made available to the description field %s format. In addition to the flags supported by ed(1), l converts the substituted value to lower case and u converts it to upper case. If old does not match the string data at offset then the entry record fails.
match
The expression field is a strmatch(3) pattern that is matched against the string data at offset.
string
The expression field is a string that is compared with the string data at offset.

The optional mask field takes the form &number where number is anded with the integral value at offset before the expression is applied.

The contents of the expression field depend on the type. String type expressions are described in the type field entries above. * means any value and applies to all types. Integral type expressions take the form [operator] operand where operand is compared with the data value at offset using operator. operator may be one of <, <=, ==, >= or >. operator defaults to == if omitted. operand may be an integral constant or one of the following builtin function calls:

magic()
A recursive call to the magic algorithm starting with the data at offset.
loop(function,offset,increment)
Call function starting at offset and increment offset by increment after each iteration. Iteration continues until the description text does not change.
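For integral types with a constant operand, the mask and operator steps described above amount to an and followed by a comparison. A minimal sketch, assuming unsigned comparison and using hypothetical names (cmp_op, test_integral) that do not come from <magic.h>:

```c
#include <assert.h>

/* The comparison operators listed for integral expressions. */
enum cmp_op { OP_LT, OP_LE, OP_EQ, OP_GE, OP_GT };

/*
 * Apply the optional mask to the value read at offset, then compare
 * the result with the operand. Pass ~0UL as the mask when the record
 * has no mask field. Unsigned comparison is an assumption of this
 * sketch.
 */
static int test_integral(unsigned long value, unsigned long mask,
			 enum cmp_op op, unsigned long operand)
{
	unsigned long v = value & mask;

	switch (op) {
	case OP_LT: return v <  operand;
	case OP_LE: return v <= operand;
	case OP_EQ: return v == operand;
	case OP_GE: return v >= operand;
	case OP_GT: return v >  operand;
	}
	assert(0 && "unknown operator");
	return 0;
}
```

For instance, a mode value of 0100755 masked with 0111 is greater than 0, while 0100644 masked with 0111 is not, which is the kind of execute-permission check an entry can make against the mode identifier.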

The description field is the most important because it is this field that is presented to the outside world. When constructing description fields one must be very careful to follow the style laid out in the magic file, lest yet another layer of inconsistency creep into the system. The descriptions for the matching records in an entry are concatenated to form the complete magic type. If the previous matching description in the current entry does not end with space and the current description is not empty and does not start with comma, dot or backspace then a space is placed between the descriptions (most optional descriptions start with comma). The data value at offset can be referenced in the description using %s for the string types and %ld or %lu for the integral types.
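The spacing rule for concatenated descriptions can be sketched as a small append routine. append_description is a hypothetical helper written for illustration; treating an empty accumulated string as "no previous description" (so no space is added) is an assumption.

```c
#include <assert.h>
#include <string.h>

/*
 * Append desc to the type string accumulated in out, following the
 * rule above: insert one space unless the previous text ends with a
 * space, or desc is empty, or desc starts with comma, dot or
 * backspace. Output is truncated to fit outsize bytes.
 */
static void append_description(char *out, size_t outsize, const char *desc)
{
	size_t len;

	assert(out != NULL && desc != NULL && outsize > 0);
	if (*desc == '\0')
		return;
	len = strlen(out);
	if (len > 0 && out[len - 1] != ' ' &&
	    desc[0] != ',' && desc[0] != '.' && desc[0] != '\b' &&
	    len + 1 < outsize) {
		out[len++] = ' ';
		out[len] = '\0';
	}
	/* strncat always terminates; leave room for the terminator. */
	strncat(out, desc, outsize - len - 1);
}
```

Starting from "hp s200 executable, pure" and appending ", not stripped" adds no space because the new text begins with a comma, whereas appending "b" to "a" gives "a b".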

The mimetype field specifies the MIME type, usually in the form a/b.

FILES

../lib/file/magic located on $PATH

EXAMPLES

0	long		0x020c0108	hp s200 executable, pure
o{
+36	long		>0		, not stripped
+4	short		>0		, version %ld
}
0	long		0x020c0107	hp s200 executable
o()
0	long		0x020c010b	hp s200 executable, demand-load
o()

The function o(), shared by 3 entries, determines if the executable is stripped and also extracts the version number.

0	long		0407		bsd 386 executable
&mode	long		&0111!=0
+16	long		>0		, not stripped

This entry requires that the file also has execute permission.

SEE ALSO

file(1), mime(4), tw(1), modecanon(3)