PP(3) | Library Functions Manual | PP(3) |
NAME
pp - ANSI C preprocessor library
SYNOPSIS
:PACKAGE: ast
#include <pp.h>
%include "pptokens.yacc"
-lpp
DESCRIPTION
The pp library provides a tokenizing implementation of the C language preprocessor and supports K&R (Reiser), ANSI and C++ dialects. The preprocessor consists of 12 public functions, a global character class table accessed by macros, and a single global struct with 10 public elements.
pp operates in two modes. Standalone mode is used to implement the traditional standalone C preprocessor. Tokenizing mode provides a function interface to a stream of preprocessed tokens. pp is ANSI by default; the only default predefined symbols are __STDC__ and __STDPP__. Dialects (K&R, C++) and local conventions are determined by compiler-specific probe(1) information that is included at runtime. The probe information can be overridden by providing a file pp_default.h with pragmas and definitions for each compiler implementation. This file is usually located in the compiler-specific default include directory.
Directive, command line argument, option and pragma syntax is described in cpp(1). pp-specific semantics are described below. Most semantic differences from standard or classic implementations take the form of optimizations.
Options and pragmas map to ppop function calls described below. For the remaining descriptions, "setting ppop(PP_operation)" is shorthand for calling ppop with the arguments appropriate for PP_operation.
The library interface describes only the public functions and struct elements. Static structs and pointers to structs are provided by the library. The user should not attempt to allocate structs. In particular, sizeof is meaningless for pp supplied structs.
The global struct pp provides read-only information. Any changes to pp must be made using the functions described below. pp has the following public elements:
- char* version
- The pp implementation version string.
- char* lineid
- The current line sync directive name. Used for standalone line sync output. The default value is the empty string. See the ppline function below.
- char* outfile
- The current output file name.
- char* pass
- The pragma pass name for pp. The default value is pp.
- char* token
- The string representation for the current input token.
- int flags
- The inclusive or of:
- PP_comment
- Set if ppop(PP_COMMENT) was set.
- PP_compatibility
- Set if ppop(PP_COMPATIBILITY) was set.
- PP_linefile
- Set if standalone line syncs require a file argument.
- PP_linetype
- Set if standalone line syncs require a third argument. The third argument is 1 for include file push, 2 for include file pop and null otherwise.
- PP_strict
- Set if ppop(PP_STRICT) was set.
- PP_transition
- Set if ppop(PP_TRANSITION) was set.
- struct ppdirs* lcldirs
- The list of directories to be searched for "..." include files. If the first directory name is "" then it is replaced by the directory of the including file at include time. The public elements of struct ppdirs are:
- char* name
- The directory pathname.
- struct ppdirs* next
- The next directory, 0 if it is the last in the list.
- struct ppdirs* stddirs
- pp.stddirs->next is the list of directories to be searched for <...> include files. This list may be 0.
- struct ppsymbol* symbol
- If ppop(PP_COMPILE) was set then pp.symbol points to the symbol table entry for the current identifier token. pp.symbol is undefined for non-identifier tokens. Once defined, an identifier will always have the same ppsymbol pointer. If ppop(PP_NOHASH) was also set then pp.symbol is defined for macro and keyword tokens and 0 for all other identifiers. The elements of struct ppsymbol are:
- char* name
- The identifier name.
- int flags
- The inclusive or of the following flags:
- SYM_ACTIVE
- Currently being expanded.
- SYM_BUILTIN
- Builtin macro.
- SYM_DISABLED
- Macro expansion currently disabled.
- SYM_FUNCTION
- Function-like macro.
- SYM_INIT
- Initialization macro.
- SYM_KEYWORD
- Keyword identifier.
- SYM_LOADED
- Loaded checkpoint macro.
- SYM_MULTILINE
- #macdef macro.
- SYM_NOEXPAND
- No identifiers in macro body.
- SYM_PREDEFINED
- Predefined macro.
- SYM_PREDICATE
- Also a #assert predicate.
- SYM_READONLY
- Readonly macro.
- SYM_REDEFINE
- Ok to redefine.
- SYM_VARIADIC
- Variadic function-like macro.
- SYM_UNUSED
- First unused symbol flag bit index. The bits from (1<<SYM_UNUSED) on are initially unset and may be set by the user.
- struct ppmacro* macro
- Non-zero if the identifier is a macro. int macro->arity is the number of formal arguments for function-like macros and char* macro->value is the macro definition value, a 0 terminated string that may contain internal mark sequences.
- char* value
- Initially set to 0 and never modified by pp. This field may be set by the user.
- Hash_table_t* symtab
- The macro and identifier struct ppsymbol hash table. The hash(3) routines may be used to examine the table, with the exception that the following macros must be used for individual pp.symtab symbol lookup:
- struct ppsymbol* ppsymget(Hash_table_t* table, char* name)
- Return the ppsymbol pointer for name, or 0 if name is not defined.
- struct ppsymbol* ppsymset(Hash_table_t* table, char* name)
- Return the ppsymbol pointer for name. If name is not defined then allocate and return a new ppsymbol for it.
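The lookup contract for pp.symtab can be modeled outside the library. This is a hedged sketch, not the pp implementation. struct mocksym, struct mocktab, mock_symget and mock_symset are hypothetical stand-ins for struct ppsymbol, Hash_table_t, ppsymget and ppsymset. The value 16 for SYM_UNUSED is an assumption; the real value comes from <pp.h>. The sketch shows the documented behavior: lookup returns 0 for an undefined name, set allocates on first use, a name always yields the same pointer, and user flag bits start at (1<<SYM_UNUSED).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins: the real struct ppsymbol, SYM_* values and
 * Hash_table_t come from <pp.h> and hash(3). */
#define SYM_UNUSED        16                      /* assumed value */
#define SYM_USER_SEEN     (1 << SYM_UNUSED)       /* first user flag bit */
#define SYM_USER_EXPORTED (1 << (SYM_UNUSED + 1)) /* second user flag bit */

struct mocksym {
    char *name;            /* identifier name */
    int flags;             /* pp flags plus user bits */
    char *value;           /* user field, initially 0 */
    struct mocksym *next;  /* chain link in this toy table */
};

struct mocktab {
    struct mocksym *head;  /* unsorted list standing in for a hash table */
};

/* Like ppsymget: return the entry for name, or 0 if it is not defined. */
static struct mocksym *mock_symget(struct mocktab *tab, const char *name)
{
    struct mocksym *s;

    assert(tab != NULL && name != NULL);
    for (s = tab->head; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s;
    return NULL;
}

/* Like ppsymset: return the entry for name, allocating it on first use.
 * Entries never move, so a name keeps the same pointer. */
static struct mocksym *mock_symset(struct mocktab *tab, const char *name)
{
    struct mocksym *s = mock_symget(tab, name);

    if (s != NULL)
        return s;
    if ((s = calloc(1, sizeof *s)) == NULL)   /* flags and value start at 0 */
        return NULL;
    if ((s->name = malloc(strlen(name) + 1)) == NULL) {
        free(s);
        return NULL;
    }
    strcpy(s->name, name);
    s->next = tab->head;
    tab->head = s;
    return s;
}
```

A real program would follow the same pattern with ppsymset(pp.symtab, name). It would set its own bits, from (1<<SYM_UNUSED) on, in the flags of the returned ppsymbol. It would not allocate ppsymbol structs itself.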
Error messages are reported using error(3) and the following globals relate to pp:
- int error_info.errors
- The level 2 error count. Error levels above 2 cause immediate exit. If error_info.errors is non-zero then the user program exit status should also be non-zero.
- char* error_info.file
- The current input file name.
- int error_info.line
- The current input line number.
- int error_info.trace
- The debug trace level, 0 by default. Larger negative numbers produce more trace information. Enabled when the user program is linked with the -g cc(1) option.
- int error_info.warnings
- The level 1 error count. Warnings do not affect the exit status.
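A driver built on pp might derive its exit status from these counters along the following lines. This is a sketch only. struct mock_error_info and driver_exit_status are hypothetical names standing in for the error(3) error_info global, and they model just the two counts described above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical mirror of the two error(3) counters described above;
 * the real error_info global has more fields. */
struct mock_error_info {
    int errors;    /* level 2 error count */
    int warnings;  /* level 1 error count; never affects the status */
};

/* Exit status for a pp-based program once preprocessing is done. */
static int driver_exit_status(const struct mock_error_info *info)
{
    assert(info != NULL);
    return info->errors != 0 ? EXIT_FAILURE : EXIT_SUCCESS;
}
```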
The functions are:
- extern int ppargs(char** argv, int last);
- Passed to optjoin(3) to parse cpp(1) style options and arguments. The user may also supply application-specific option parsers. Also handles non-standard options like the Sun -undef and GNU -trigraphs. Hello in there, ever hear of getopt(3)?
- extern void ppcpp(void);
- This is the standalone cpp(1) entry point. ppcpp consumes all of the input and writes the preprocessed text to the output. A single call to ppcpp is equivalent to, but more efficient than:
    ppop(PP_SPACEOUT, 1);
    while (pplex()) ppprintf(" %s", pp.token);
- extern int ppcomment(char* head, char* comment, char* tail, int line);
- The default comment handler that passes comments to the output. May be used as an argument to ppop(PP_COMMENT), or the user may supply an application specific handler. head is the comment head text, /* for C and // for C++, comment is the comment body, tail is the comment tail text, */ for C and newline for C++, and line is the comment starting line number.
- extern void pperror(int level, char* format, ...);
- Equivalent to error(3). All pp error and warning messages pass through pperror. The user may link with an application specific pperror to override the library default.
- extern int ppincref(char* parent, char* file, int line, int push);
- The default include reference handler that writes file to the standard error. May be used as an argument to ppop(PP_INCREF), or the user may supply an application-specific handler. parent is the including file name, file is the current include file name, line is the current line number in file, and push is non-zero if file is being pushed or 0 if file is being popped.
- extern void ppinput(char* buffer, char* file, int line);
- Pushes the 0 terminated buffer on the pp input stack. file is the pseudo file name used in line syncs for buffer and line is the starting line number.
- int pplex(void)
- Returns the token type of the next input token. pp.token and where
applicable pp.symbol are updated to refer to the new token. The token type
constants are defined in pp.h for #include and pp.yacc for yacc(1)
%include. The token constant names match T_[A-Z_]*; some are encoded by
or-ing with N_[A-Z_]* tokens.
The numeric constant tokens and encodings are:
T_DOUBLE (N_NUMBER|N_REAL)
T_DOUBLE_L (N_NUMBER|N_REAL|N_LONG)
T_FLOAT (N_NUMBER|N_REAL|N_FLOAT)
T_DECIMAL (N_NUMBER)
T_DECIMAL_L (N_NUMBER|N_LONG)
T_DECIMAL_U (N_NUMBER|N_UNSIGNED)
T_DECIMAL_UL (N_NUMBER|N_UNSIGNED|N_LONG)
T_OCTAL (N_NUMBER|N_OCTAL)
T_OCTAL_L (N_NUMBER|N_OCTAL|N_LONG)
T_OCTAL_U (N_NUMBER|N_OCTAL|N_UNSIGNED)
T_OCTAL_UL (N_NUMBER|N_OCTAL|N_UNSIGNED|N_LONG)
T_HEXADECIMAL (N_NUMBER|N_HEXADECIMAL)
T_HEXADECIMAL_L (N_NUMBER|N_HEXADECIMAL|N_LONG)
T_HEXADECIMAL_U (N_NUMBER|N_HEXADECIMAL|N_UNSIGNED)
T_HEXADECIMAL_UL (N_NUMBER|N_HEXADECIMAL|N_UNSIGNED|N_LONG)
The normal C tokens are:
T_ID C identifier
T_INVALID invalid token
T_HEADER <..>
T_CHARCONST '..'
T_WCHARCONST L'..'
T_STRING ".."
T_WSTRING L".."
T_PTRMEM ->
T_ADDADD ++
T_SUBSUB --
T_LSHIFT <<
T_RSHIFT >>
T_LE <=
T_GE >=
T_EQ ==
T_NE !=
T_ANDAND &&
T_OROR ||
T_MPYEQ *=
T_DIVEQ /=
T_MODEQ %=
T_ADDEQ +=
T_SUBEQ -=
T_LSHIFTEQ <<=
T_RSHIFTEQ >>=
T_ANDEQ &=
T_XOREQ ^=
T_OREQ |=
T_TOKCAT ##
T_VARIADIC ...
T_DOTREF .* [if PP_PLUSPLUS]
T_PTRMEMREF ->* [if PP_PLUSPLUS]
T_SCOPE :: [if PP_PLUSPLUS]
T_UMINUS unary minus
If ppop(PP_COMPILE) was set then the keyword tokens are also defined. Compiler differences and dialects are detected by the pp probe(1) information, and only the appropriate keywords are enabled. The ANSI keyword tokens are:
T_AUTO T_BREAK T_CASE T_CHAR T_CONTINUE T_DEFAULT T_DO T_DOUBLE_T T_ELSE T_EXTERN T_FLOAT_T T_FOR T_GOTO T_IF T_INT T_LONG T_REGISTER T_RETURN T_SHORT T_SIZEOF T_STATIC T_STRUCT T_SWITCH T_TYPEDEF T_UNION T_UNSIGNED T_WHILE T_CONST T_ENUM T_SIGNED T_VOID T_VOLATILE
and the C++ keyword tokens are:
T_CATCH T_CLASS T_DELETE T_FRIEND T_INLINE T_NEW T_OPERATOR T_OVERLOAD T_PRIVATE T_PROTECTED T_PUBLIC T_TEMPLATE T_THIS T_THROW T_TRY T_VIRTUAL
In addition, T_ASM is recognized where appropriate. Additional keyword tokens >= T_KEYWORD may be added using ppop(PP_COMPILE).
Many C implementations show no restraint in adding new keywords; some PC compilers have tripled the number of keywords. For the most part these new keywords introduce noise constructs that can be ignored for standard (reasonable) analysis and compilation. The noise keywords fall into four syntactic categories that map into the two noise keyword tokens T_NOISE and T_NOISES. For T_NOISES pp.token points to the entire noise construct, including the offending noise keyword. The basic noise keyword categories are:
- T_NOISE
- The simplest noise: a single keyword that is noise in any context and maps to T_NOISE.
- T_X_GROUP
- A noise keyword that precedes an optional grouping construct, either (..) or {..}, and maps to T_NOISES.
- T_X_LINE
- A noise keyword that consumes the remaining tokens in the line and maps to T_NOISES.
- T_X_STATEMENT
- A noise keyword that consumes the tokens up to the next ; and maps to T_NOISES.
If ppop(PP_NOISE) is > 0 then implementation-specific noise constructs are mapped to either T_NOISE or T_NOISES; otherwise, if ppop(PP_NOISE) is < 0 then noise constructs are completely ignored; otherwise the unmapped grouping noise tokens T_X_.* are returned.
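The PP_NOISE rules above amount to a three-way decision on the sign of the setting. This sketch models only that decision. The token values and NOISE_IGNORED are hypothetical; the real constants come from <pp.h>. Returning plain T_NOISE unchanged when the setting is 0 is an assumption, because the text above does not say what happens to T_NOISE in that case.

```c
#include <assert.h>

/* Hypothetical token values; the real constants are defined in <pp.h>. */
enum {
    T_NOISE = 1000,
    T_NOISES,
    T_X_GROUP,
    T_X_LINE,
    T_X_STATEMENT
};

#define NOISE_IGNORED 0  /* hypothetical: construct dropped from the stream */

/* Map a raw noise category (T_NOISE or T_X_*) to the token handed to the
 * caller, given the current ppop(PP_NOISE) setting. */
static int map_noise(int category, int noise_setting)
{
    assert(category == T_NOISE || category == T_X_GROUP ||
           category == T_X_LINE || category == T_X_STATEMENT);
    if (noise_setting < 0)
        return NOISE_IGNORED;                   /* ignored completely */
    if (noise_setting > 0)                      /* mapped: single or grouped */
        return category == T_NOISE ? T_NOISE : T_NOISES;
    return category;                            /* T_X_* returned unmapped */
}
```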
Token encodings may be tested by the following macros:
- int isnumber(int token);
- Non-zero if token is an integral or floating point numeric constant.
- int isinteger(int token);
- Non-zero if token is an integral numeric constant.
- int isreal(int token);
- Non-zero if token is a floating point numeric constant.
- int isassignop(int token);
- Non-zero if token is a C assignment operator.
- int isseparate(int token);
- Non-zero if token must be separated from other tokens by space.
- int isnoise(int token);
- Non-zero if token is a noise keyword.
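The or-ed encodings let the numeric test macros check bits instead of comparing against lists of tokens. This sketch makes several assumptions. The N_* flags get arbitrary bit values here; the real values in <pp.h> will differ. Only a few T_* numeric tokens are defined. The names tok_isnumber, tok_isinteger, tok_isreal and int_suffix are hypothetical and are not the library's own macros. The sketch also assumes that N_REAL is the bit that separates floating point constants from integral ones, which is what the encoding table suggests.

```c
#include <assert.h>

/* Hypothetical bit values; the real N_* and T_* constants in <pp.h>
 * differ. Only the or-ing scheme follows the encoding table. */
#define N_HEXADECIMAL (1 << 2)
#define N_OCTAL       (1 << 3)
#define N_UNSIGNED    (1 << 4)
#define N_LONG        (1 << 5)
#define N_FLOAT       (1 << 6)
#define N_REAL        (1 << 7)
#define N_NUMBER      (1 << 8)

#define T_ID            1   /* hypothetical non-numeric token */
#define T_DECIMAL       (N_NUMBER)
#define T_DECIMAL_UL    (N_NUMBER|N_UNSIGNED|N_LONG)
#define T_OCTAL_L       (N_NUMBER|N_OCTAL|N_LONG)
#define T_HEXADECIMAL_U (N_NUMBER|N_HEXADECIMAL|N_UNSIGNED)
#define T_DOUBLE        (N_NUMBER|N_REAL)
#define T_FLOAT         (N_NUMBER|N_REAL|N_FLOAT)

/* Sketches of isnumber, isinteger and isreal under the assumed encoding. */
#define tok_isnumber(t)  (((t) & N_NUMBER) != 0)
#define tok_isreal(t)    (tok_isnumber(t) && ((t) & N_REAL) != 0)
#define tok_isinteger(t) (tok_isnumber(t) && ((t) & N_REAL) == 0)

/* C suffix implied by an integral token's N_UNSIGNED and N_LONG bits. */
static const char *int_suffix(int t)
{
    static const char *const suffix[] = { "", "L", "U", "UL" };
    int i = ((t & N_UNSIGNED) ? 2 : 0) | ((t & N_LONG) ? 1 : 0);

    assert(tok_isinteger(t));
    return suffix[i];
}
```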
- extern int ppline(int line, char* file);
- The default line sync handler that outputs line sync pragmas for the C compiler front end. May be used as an argument to ppop(PP_LINE), or the user may supply an application specific handler. line is the line number and file is the file name. If ppop(PP_LINEID) was set then the directive # lineid line "file" is output.
- extern int ppmacref(struct ppsymbol* symbol, char* file, int line, int type);
- The default macro reference handler that outputs macro reference pragmas. May be used as an argument to ppop(PP_MACREF), or the user may supply an application-specific handler. symbol is the macro ppsymbol pointer, file is the reference file, line is the reference line, and if type is non-zero a macro value checksum is also output. The pragma syntax is #pragma pp:macref "symbol->name" line checksum.
- int ppop(int op, ...)
- ppop is the option control interface. op determines the type(s) of the remaining argument(s). Options marked by /*INIT*/ must be done before PP_INIT.
- (PP_ASSERT, char* string) /*INIT*/
- string is asserted as if by #assert.
- (PP_BUILTIN, char*(*fun)(char* buf, char* name, char* args)) /*INIT*/
- Installs fun as the unknown builtin macro handler. Builtin macros are of the form #(name args). fun is called with name set to the unknown builtin macro name and args set to the arguments. buf is a MAXTOKEN+1 buffer that can be used for the fun return value. 0 should be returned on error.
- (PP_COMMENT, void (*fun)(char* head, char* body, char* tail, int line)) /*INIT*/
- (PP_COMPATIBILITY, char* string) /*INIT*/
- (PP_COMPILE, char* string) /*INIT*/
- (PP_DEBUG, char* string) /*INIT*/
- (PP_DEFAULT, char* string) /*INIT*/
- (PP_DEFINE, char* string) /*INIT*/
- string is defined as if by #define.
- (PP_DIRECTIVE, char* string) /*INIT*/
- The directive #string is executed.
- (PP_DONE, char* string) /*INIT*/
- (PP_DUMP, char* string) /*INIT*/
- (PP_FILEDEPS, char* string) /*INIT*/
- (PP_FILENAME, char* string) /*INIT*/
- (PP_HOSTDIR, char* string) /*INIT*/
- (PP_HOSTED, char* string) /*INIT*/
- (PP_ID, char* string) /*INIT*/
- (PP_IGNORE, char* string) /*INIT*/
- (PP_INCLUDE, char* string) /*INIT*/
- (PP_INCREF, char* string) /*INIT*/
- (PP_INIT, char* string) /*INIT*/
- (PP_INPUT, char* string) /*INIT*/
- (PP_LINE, char* string) /*INIT*/
- (PP_LINEFILE, char* string) /*INIT*/
- (PP_LINEID, char* string) /*INIT*/
- (PP_LINETYPE, char* string) /*INIT*/
- (PP_LOCAL, char* string) /*INIT*/
- (PP_MACREF, char* string) /*INIT*/
- (PP_MULTIPLE, char* string) /*INIT*/
- (PP_NOHASH, char* string) /*INIT*/
- (PP_NOID, char* string) /*INIT*/
- (PP_NOISE, char* string) /*INIT*/
- (PP_OPTION, char* string) /*INIT*/
- The directive #pragma pp:string is executed.
- (PP_OPTARG, char* string) /*INIT*/
- (PP_OUTPUT, char* string) /*INIT*/
- (PP_PASSNEWLINE, char* string) /*INIT*/
- (PP_PASSTHROUGH, char* string) /*INIT*/
- (PP_PLUSPLUS, char* string) /*INIT*/
- (PP_PRAGMA, char* string) /*INIT*/
- (PP_PREFIX, char* string) /*INIT*/
- (PP_PROBE, char* string) /*INIT*/
- (PP_READ, char* string) /*INIT*/
- (PP_RESERVED, char* string) /*INIT*/
- (PP_SPACEOUT, char* string) /*INIT*/
- (PP_STANDALONE, char* string) /*INIT*/
- (PP_STANDARD, char* string) /*INIT*/
- (PP_STRICT, char* string) /*INIT*/
- (PP_TEST, char* string) /*INIT*/
- (PP_TRUNCATE, char* string) /*INIT*/
- (PP_UNDEF, char* string) /*INIT*/
- (PP_WARN, char* string) /*INIT*/
- int pppragma(char* dir, char* pass, char* name, char* value, int nl);
- The default handler that copies unknown directives and pragmas to the output. May be used as an argument to ppop(PP_PRAGMA), or the user may supply an application specific handler. This function is most often called after directive and pragma mapping. Any of the arguments may be 0. dir is the directive name, pass is the pragma pass name, name is the pragma option name, value is the pragma option value, and nl is non-zero if a trailing newline is required if the pragma is copied to the output.
- int ppprintf(char* format, ...);
- A printf(3) interface to the standalone pp output buffer. Macros provide limited control over output buffering: void ppflushout() flushes the output buffer, void ppcheckout() flushes the output buffer if more than PPBUFSIZ characters are buffered, int pppendout() returns the number of pending characters in the output buffer, and void ppputchar(int c) places the character c in the output buffer.
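The buffering contract of the output macros can be modeled with a small array. This is a hypothetical sketch, not the libpp buffer. flushout, checkout, pendout and putchar_out stand in for ppflushout, ppcheckout, pppendout and ppputchar. PPBUFSIZ is set to 8 here so the flush threshold is easy to observe; the library's real value is not given in this page.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model of the standalone output buffer macros. */
#define PPBUFSIZ 8

static char outbuf[2 * PPBUFSIZ];   /* room for overshoot between checks */
static int outlen;                  /* number of pending characters */
static FILE *outfp;                 /* destination; stdout if NULL */

static void flushout(void)          /* like ppflushout() */
{
    fwrite(outbuf, 1, (size_t)outlen, outfp != NULL ? outfp : stdout);
    outlen = 0;
}

static int pendout(void)            /* like pppendout() */
{
    return outlen;
}

static void checkout(void)          /* like ppcheckout(): flush if over limit */
{
    if (outlen > PPBUFSIZ)
        flushout();
}

static void putchar_out(int c)      /* like ppputchar(c) */
{
    assert(outlen < (int)sizeof outbuf);
    outbuf[outlen++] = (char)c;
}
```

In this model, a standalone output loop calls checkout after each token. That keeps the buffer bounded without flushing on every character.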
CAVEATS
The ANSI mode is intended to be true to the standard. The compatibility mode has been proven in practice, but there are surely dark corners of some implementations that may have been omitted.
SEE ALSO
cc(1), cpp(1), nmake(1), probe(1), yacc(1),
ast(3), error(3), hash(3), optjoin(3)
AUTHOR
Glenn Fowler
(Dennis Ritchie provided the original table driven lexer.)
AT&T Bell Laboratories