
PP(3) Library Functions Manual PP(3)

NAME

pp - ANSI C preprocessor library

SYNOPSIS

:PACKAGE: ast
#include <pp.h>
%include "pptokens.yacc"
-lpp

DESCRIPTION

The pp library provides a tokenizing implementation of the C language preprocessor and supports K&R (Reiser), ANSI and C++ dialects. The preprocessor consists of 12 public functions, a global character class table accessed by macros, and a single global struct with 10 public elements.

pp operates in two modes. Standalone mode is used to implement the traditional standalone C preprocessor. Tokenizing mode provides a function interface to a stream of preprocessed tokens. pp is ANSI by default; the only default predefined symbols are __STDC__ and __STDPP__. Dialects (K&R, C++) and local conventions are determined by compiler-specific probe(1) information that is included at runtime. The probe information can be overridden by providing a file pp_default.h with pragmas and definitions for each compiler implementation. This file is usually located in the compiler-specific default include directory.

Directive, command line argument, option and pragma syntax is described in cpp(1). pp-specific semantics are described below. Most semantic differences from standard or classic implementations take the form of optimizations.

Options and pragmas map to the ppop function calls described below. In the remaining descriptions, "setting ppop(PP_operation)" is shorthand for calling ppop with the arguments appropriate for PP_operation.

The library interface describes only the public functions and struct elements. Static structs and pointers to structs are provided by the library; the user should not attempt to allocate them. In particular, sizeof is meaningless for pp-supplied structs.

The global struct pp provides read-only information. Any changes to pp must be made using the functions described below. pp has the following public elements:

The pp implementation version string.
The current line sync directive name. Used for standalone line sync output. The default value is the empty string. See the ppline function below.
The current output file name.
The pragma pass name for pp. The default value is pp.
The string representation for the current input token.
The inclusive or of:
Set if ppop(PP_COMMENT) was set.
Set if ppop(PP_COMPATIBILITY) was set.
Set if standalone line syncs require a file argument.
Set if standalone line syncs require a third argument. The third argument is 1 for include file push, 2 for include file pop and null otherwise.
Set if ppop(PP_STRICT) was set.
Set if ppop(PP_TRANSITION) was set.
The list of directories to be searched for "..." include files. If the first directory name is "" then it is replaced by the directory of the including file at include time. The public elements of struct ppdirs are:
The directory pathname.
The next directory, 0 if it is the last in the list.
pp.stddirs->next is the list of directories to be searched for <...> include files. This list may be 0.
If ppop(PP_COMPILE) was set then pp.symbol points to the symbol table entry for the current identifier token. pp.symbol is undefined for non-identifier tokens. Once defined, an identifier will always have the same ppsymbol pointer. If ppop(PP_NOHASH) was also set then pp.symbol is defined for macro and keyword tokens and 0 for all other identifiers. The elements of struct ppsymbol are:
The identifier name.
The inclusive or of the following flags:
Currently being expanded.
Builtin macro.
Macro expansion currently disabled.
Function-like macro.
Initialization macro.
Keyword identifier.
Loaded checkpoint macro.
#macdef macro.
No identifiers in macro body.
Predefined macro.
Also a #assert predicate.
Readonly macro.
Ok to redefine.
Variadic function-like macro.
First unused symbol flag bit index. The bits from (1<<SYM_UNUSED) on are initially unset and may be set by the user.
Non-zero if the identifier is a macro. int macro->arity is the number of formal arguments for function-like macros and char* macro->value is the macro definition value, a 0 terminated string that may contain internal mark sequences.
Initially set to 0 and never modified by pp. This field may be set by the user.
The macro and identifier struct ppsymbol hash table. The hash(3) routines may be used to examine the table, with the exception that the following macros must be used for individual pp.symtab symbol lookup:
Return the ppsymbol pointer for name, 0 if name not defined.
Return the ppsymbol pointer for name. If name is not defined then allocate and return a new ppsymbol for it.

Error messages are reported using error(3) and the following globals relate to pp:

The level 2 error count. Error levels above 2 cause immediate exit. If error_info.errors is non-zero then the user program exit status should also be non-zero.
The current input file name.
The current input line number.
The debug trace level, 0 by default. Larger negative numbers produce more trace information. Enabled when the user program is linked with the -g cc(1) option.
The level 1 error count. Warnings do not affect the exit status.

The functions are:

Passed to optjoin(3) to parse cpp(1)-style options and arguments. The user may also supply application-specific option parsers. Also handles non-standard options like the Sun -undef and GNU -trigraphs. Hello in there, ever hear of getopt(3)?
This is the standalone cpp(1) entry point. ppcpp consumes all of the input and writes the preprocessed text to the output. A single call to ppcpp is equivalent to, but more efficient than:

ppop(PP_SPACEOUT, 1);
while (pplex()) ppprintf(" %s", pp.token);
The default comment handler that passes comments to the output. May be used as an argument to ppop(PP_COMMENT), or the user may supply an application specific handler. head is the comment head text, /* for C and // for C++, comment is the comment body, tail is the comment tail text, */ for C and newline for C++, and line is the comment starting line number.
Equivalent to error(3). All pp error and warning messages pass through pperror. The user may link with an application specific pperror to override the library default.
The default include reference handler that outputs file to the standard error. May be used as an argument to ppop(PP_INCREF), or the user may supply an application-specific handler. parent is the including file name, file is the current include file name, line is the current line number in file, and push is non-zero if file is being pushed or 0 if file is being popped.
Pushes the 0 terminated buffer on the pp input stack. file is the pseudo file name used in line syncs for buffer and line is the starting line number.
Returns the token type of the next input token. pp.token and, where applicable, pp.symbol are updated to refer to the new token. The token type constants are defined in pp.h for #include and pp.yacc for yacc(1) %include. The token constant names match T_[A-Z_]*; some are encoded by OR-ing together N_[A-Z_]* bits.

The numeric constant tokens and encodings are:


T_DOUBLE (N_NUMBER|N_REAL)
T_DOUBLE_L (N_NUMBER|N_REAL|N_LONG)
T_FLOAT (N_NUMBER|N_REAL|N_FLOAT)
T_DECIMAL (N_NUMBER)
T_DECIMAL_L (N_NUMBER|N_LONG)
T_DECIMAL_U (N_NUMBER|N_UNSIGNED)
T_DECIMAL_UL (N_NUMBER|N_UNSIGNED|N_LONG)
T_OCTAL (N_NUMBER|N_OCTAL)
T_OCTAL_L (N_NUMBER|N_OCTAL|N_LONG)
T_OCTAL_U (N_NUMBER|N_OCTAL|N_UNSIGNED)
T_OCTAL_UL (N_NUMBER|N_OCTAL|N_UNSIGNED|N_LONG)
T_HEXADECIMAL (N_NUMBER|N_HEXADECIMAL)
T_HEXADECIMAL_L (N_NUMBER|N_HEXADECIMAL|N_LONG)
T_HEXADECIMAL_U (N_NUMBER|N_HEXADECIMAL|N_UNSIGNED)
T_HEXADECIMAL_UL (N_NUMBER|N_HEXADECIMAL|N_UNSIGNED|N_LONG)
The normal C tokens are:

T_ID C identifier
T_INVALID invalid token
T_HEADER <..>
T_CHARCONST '..'
T_WCHARCONST L'..'
T_STRING ".."
T_WSTRING L".."
T_PTRMEM ->
T_ADDADD ++
T_SUBSUB --
T_LSHIFT <<
T_RSHIFT >>
T_LE <=
T_GE >=
T_EQ ==
T_NE !=
T_ANDAND &&
T_OROR ||
T_MPYEQ *=
T_DIVEQ /=
T_MODEQ %=
T_ADDEQ +=
T_SUBEQ -=
T_LSHIFTEQ <<=
T_RSHIFTEQ >>=
T_ANDEQ &=
T_XOREQ ^=
T_OREQ |=
T_TOKCAT ##
T_VARIADIC ...
T_DOTREF .* [if PP_PLUSPLUS]
T_PTRMEMREF ->* [if PP_PLUSPLUS]
T_SCOPE :: [if PP_PLUSPLUS]
T_UMINUS unary minus
If ppop(PP_COMPILE) was set then the keyword tokens are also defined. Compiler differences and dialects are detected by the pp probe(1) information, and only the appropriate keywords are enabled. The ANSI keyword tokens are:
T_AUTO          T_BREAK          T_CASE           T_CHAR
T_CONTINUE      T_DEFAULT        T_DO             T_DOUBLE_T
T_ELSE          T_EXTERN         T_FLOAT_T        T_FOR
T_GOTO          T_IF             T_INT            T_LONG
T_REGISTER      T_RETURN         T_SHORT          T_SIZEOF
T_STATIC        T_STRUCT         T_SWITCH         T_TYPEDEF
T_UNION         T_UNSIGNED       T_WHILE          T_CONST
T_ENUM          T_SIGNED         T_VOID           T_VOLATILE
    
and the C++ keyword tokens are:
T_CATCH         T_CLASS          T_DELETE         T_FRIEND
T_INLINE        T_NEW            T_OPERATOR       T_OVERLOAD
T_PRIVATE       T_PROTECTED      T_PUBLIC         T_TEMPLATE
T_THIS          T_THROW          T_TRY            T_VIRTUAL
    
In addition, T_ASM is recognized where appropriate. Additional keyword tokens >= T_KEYWORD may be added using ppop(PP_COMPILE).

Many C implementations show no restraint in adding new keywords; some PC compilers have tripled the number of keywords. For the most part these new keywords introduce noise constructs that can be ignored for standard (reasonable) analysis and compilation. The noise keywords fall into four syntactic categories that map into the two noise keyword tokens T_NOISE and T_NOISES. For T_NOISES pp.token points to the entire noise construct, including the offending noise keyword. The basic noise keyword categories are:

The simplest noise: a single keyword that is noise in any context and maps to T_NOISE.
A noise keyword that precedes an optional grouping construct, either (..) or {..} and maps to T_NOISES.
A noise keyword that consumes the remaining tokens in the line and maps to T_NOISES.
A noise keyword that consumes the tokens up to the next ; and maps to T_NOISES.

If ppop(PP_NOISE) is > 0 then implementation-specific noise constructs are mapped to either T_NOISE or T_NOISES; otherwise, if ppop(PP_NOISE) is < 0, noise constructs are completely ignored; otherwise the unmapped grouping noise tokens T_X_.* are returned.

Token encodings may be tested by the following macros:

Non-zero if token is an integral or floating point numeric constant.
Non-zero if token is an integral numeric constant.
Non-zero if token is a floating point numeric constant.
Non-zero if token is a C assignment operator.
Non-zero if token must be separated from other tokens by space.
Non-zero if token is a noise keyword.
The default line sync handler that outputs line sync pragmas for the C compiler front end. May be used as an argument to ppop(PP_LINE), or the user may supply an application specific handler. line is the line number and file is the file name. If ppop(PP_LINEID) was set then the directive # lineid line "file" is output.
The default macro reference handler that outputs macro reference pragmas. May be used as an argument to ppop(PP_MACREF), or the user may supply an application-specific handler. symbol is the macro ppsymbol pointer, file is the reference file, line is the reference line, and if type is non-zero a macro value checksum is also output. The pragma syntax is #pragma pp:macref "symbol->name" line checksum.
ppop is the option control interface. op determines the type(s) of the remaining argument(s). Options marked by /*INIT*/ must be set before PP_INIT.
(PP_ASSERT, char* string) /*INIT*/
string is asserted as if by #assert.
(PP_BUILTIN, char*(*fun)(char* buf, char* name, char* args)) /*INIT*/
Installs fun as the unknown builtin macro handler. Builtin macros are of the form #(name args). fun is called with name set to the unknown builtin macro name and args set to the arguments. buf is a buffer of MAXTOKEN+1 characters that can be used for the fun return value. fun should return 0 on error.
(PP_COMMENT, void (*fun)(char* head, char* body, char* tail, int line)) /*INIT*/
(PP_COMPATIBILITY, char* string) /*INIT*/
(PP_COMPILE, char* string) /*INIT*/
(PP_DEBUG, char* string) /*INIT*/
(PP_DEFAULT, char* string) /*INIT*/
(PP_DEFINE, char* string) /*INIT*/
string is defined as if by #define.
(PP_DIRECTIVE, char* string) /*INIT*/
The directive #string is executed.
(PP_DONE, char* string) /*INIT*/
(PP_DUMP, char* string) /*INIT*/
(PP_FILEDEPS, char* string) /*INIT*/
(PP_FILENAME, char* string) /*INIT*/
(PP_HOSTDIR, char* string) /*INIT*/
(PP_HOSTED, char* string) /*INIT*/
(PP_ID, char* string) /*INIT*/
(PP_IGNORE, char* string) /*INIT*/
(PP_INCLUDE, char* string) /*INIT*/
(PP_INCREF, char* string) /*INIT*/
(PP_INIT, char* string) /*INIT*/
(PP_INPUT, char* string) /*INIT*/
(PP_LINE, char* string) /*INIT*/
(PP_LINEFILE, char* string) /*INIT*/
(PP_LINEID, char* string) /*INIT*/
(PP_LINETYPE, char* string) /*INIT*/
(PP_LOCAL, char* string) /*INIT*/
(PP_MACREF, char* string) /*INIT*/
(PP_MULTIPLE, char* string) /*INIT*/
(PP_NOHASH, char* string) /*INIT*/
(PP_NOID, char* string) /*INIT*/
(PP_NOISE, char* string) /*INIT*/
(PP_OPTION, char* string) /*INIT*/
The directive #pragma pp:string is executed.
(PP_OPTARG, char* string) /*INIT*/
(PP_OUTPUT, char* string) /*INIT*/
(PP_PASSNEWLINE, char* string) /*INIT*/
(PP_PASSTHROUGH, char* string) /*INIT*/
(PP_PLUSPLUS, char* string) /*INIT*/
(PP_PRAGMA, char* string) /*INIT*/
(PP_PREFIX, char* string) /*INIT*/
(PP_PROBE, char* string) /*INIT*/
(PP_READ, char* string) /*INIT*/
(PP_RESERVED, char* string) /*INIT*/
(PP_SPACEOUT, char* string) /*INIT*/
(PP_STANDALONE, char* string) /*INIT*/
(PP_STANDARD, char* string) /*INIT*/
(PP_STRICT, char* string) /*INIT*/
(PP_TEST, char* string) /*INIT*/
(PP_TRUNCATE, char* string) /*INIT*/
(PP_UNDEF, char* string) /*INIT*/
(PP_WARN, char* string) /*INIT*/
The default handler that copies unknown directives and pragmas to the output. May be used as an argument to ppop(PP_PRAGMA), or the user may supply an application-specific handler. This function is most often called after directive and pragma mapping. Any of the arguments may be 0. dir is the directive name, pass is the pragma pass name, name is the pragma option name, value is the pragma option value, and nl is non-zero if a trailing newline is required when the pragma is copied to the output.
A printf(3) interface to the standalone pp output buffer. Macros provide limited control over output buffering: void ppflushout() flushes the output buffer, void ppcheckout() flushes the output buffer if more than PPBUFSIZ characters are buffered, int pppendout() returns the number of pending characters in the output buffer, and void ppputchar(int c) places the character c in the output buffer.

CAVEATS

The ANSI mode is intended to be true to the standard. The compatibility mode has been proven in practice, but there are surely dark corners of some implementations that may have been omitted.

SEE ALSO

cc(1), cpp(1), nmake(1), probe(1), yacc(1),
ast(3), error(3), hash(3), optjoin(3)

AUTHOR

Glenn Fowler
(Dennis Ritchie provided the original table driven lexer.)
AT&T Bell Laboratories