
Manual page for LEX(1)

lex - lexical analysis program generator

SYNOPSIS

lex [ -fntv ] [ filename ] ...

DESCRIPTION

lex generates programs to be used in simple lexical analysis of text. Each filename (the standard input by default) contains regular expressions to search for, and actions written in C to be executed when expressions are found.

A C source program, lex.yy.c is generated, to be compiled as follows:

cc lex.yy.c -ll

This program, when run, copies unrecognized portions of the input to the output, and executes the associated C action for each regular expression that is recognized. The actual string matched is left in yytext, an external character array.

Matching is done in order of the strings in the file. The strings may contain square brackets to indicate character classes, as in [abx-z] to indicate a, b, x, y, and z; and the operators *, + and ?, which mean, respectively, any nonnegative number, any positive number, or either zero or one occurrences of the previous character or character class. The ``dot'' character (`.') is the class of all ASCII characters except NEWLINE.
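
The meaning of the class [abx-z] and of the operators *, + and ? can be illustrated with a small hand-written C sketch. This is not code that lex generates; the names in_class(), match_repeat() and NO_MATCH are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NO_MATCH ((size_t)-1)

/* Hypothetical helper: is c a member of the class [abx-z]? */
static int in_class(char c)
{
    return c == 'a' || c == 'b' || (c >= 'x' && c <= 'z');
}

/*
 * Length of the longest prefix of s that matches [abx-z] followed by
 * the operator op, where op is '*', '+' or '?'.
 * Returns NO_MATCH when the expression cannot match at all.
 */
static size_t match_repeat(const char *s, char op)
{
    size_t n = 0;

    assert(s != NULL);
    assert(op == '*' || op == '+' || op == '?');

    while (s[n] != '\0' && in_class(s[n])) {
        n++;
        if (op == '?')
            break;              /* ? accepts at most one occurrence */
    }
    if (op == '+' && n == 0)
        return NO_MATCH;        /* + needs at least one occurrence */
    return n;                   /* * and ? may match the empty string */
}
```

With the input "abzq", the sketch reports a match of three characters for * and +, and one character for ?. With the input "q", * and ? match the empty string, while + does not match.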

Parentheses for grouping and vertical bar for alternation are also supported. The notation r{d,e} in a rule indicates between d and e instances of regular expression r. It has a higher precedence than |, but lower than that of *, ?, +, or concatenation. The ^ (caret character) at the beginning of an expression permits a successful match only immediately after a NEWLINE, and the $ character at the end of an expression requires a trailing NEWLINE.

The / character in an expression indicates trailing context; only the part of the expression up to the slash is returned in yytext, although the remainder of the expression must follow in the input stream.
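
As a rough illustration of trailing context, the following C sketch models a rule of the form head/tail for literal strings only. The function match_trailing() and the constant NO_MATCH are hypothetical names, not part of lex; a real analyzer handles full regular expressions on both sides of the slash.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NO_MATCH ((size_t)-1)

/*
 * Hypothetical model of the rule  head/tail  for literal strings.
 * The input s must begin with head immediately followed by tail.
 * On success the return value is the length of head only, which is
 * the part that would be left in yytext; tail stays in the input.
 */
static size_t match_trailing(const char *s, const char *head,
                             const char *tail)
{
    size_t hlen, tlen;

    assert(s != NULL && head != NULL && tail != NULL);
    hlen = strlen(head);
    tlen = strlen(tail);

    if (strncmp(s, head, hlen) != 0)
        return NO_MATCH;            /* head itself is absent */
    if (strncmp(s + hlen, tail, tlen) != 0)
        return NO_MATCH;            /* required context is absent */
    return hlen;                    /* only head is consumed */
}
```

For the input "abcd", a rule modelled as ab/cd yields a match of length 2, and scanning would resume at "cd". For the input "abce" the rule does not match, because the required context is missing.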

An operator character may be used as an ordinary symbol if it is within `"' symbols or preceded by `\'.

Three subroutines defined as macros are expected: input() to read a character; unput(c) to replace a character read; and output(c) to place an output character. They are defined in terms of the standard streams, but you can override them. The generated analyzer is a function named yylex(), and the library contains a main() which calls it. The action REJECT on the right side of a rule rejects this match and executes the next suitable match; the function yymore() accumulates additional characters into the same yytext; and the function yyless(n) returns to the input all but the first n characters of the match, where n is the number of characters to retain in yytext. The macros input and output use the files yyin and yyout to read from and write to; these default to stdin and stdout, respectively.

In a lex program, any line beginning with a blank is assumed to contain only C text and is copied; if it precedes %% it is copied into the external definition area of the lex.yy.c file. All rules should follow a %%, as in YACC. Lines preceding %% which begin with a nonblank character define the string on the left to be the remainder of the line; it can be used later by surrounding it with {}. Note: curly brackets do not imply parentheses; only string substitution is done.

The external names generated by lex all begin with the prefix yy or YY.

Certain table sizes for the resulting finite-state machine can be set in the definitions section:

%p n
number of positions is n (default 2000)
%n n
number of states is n (500)
%t n
number of parse tree nodes is n (1000)
%a n
number of transitions is n (3000)

The use of one or more of the above automatically implies the -v option, unless the -n option is used.

OPTIONS

-f
Faster compilation. Do not bother to pack the resulting tables; limited to small programs.
-n
Opposite of -v; -n is default.
-t
Place the result on the standard output instead of in file lex.yy.c.
-v
Print a one-line summary of statistics of the generated analyzer.

EXAMPLES

The following command line:

lex lexcommands

would draw lex instructions from the file lexcommands, and place the output in lex.yy.c.

The following:

%%
[A-Z]	putchar(yytext[0]+'a'-'A');
[ ]+$	;
[ ]+	putchar(' ');

is an example of a lex program. It converts upper case to lower, removes blanks at the end of lines, and replaces multiple blanks by single blanks.
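
For comparison, the same transformation can be written by hand in plain C. The sketch below is not what lex generates, and the function name squeeze_blanks() and its cap parameter are hypothetical. It works on a string instead of the standard input, and it assumes, as the $ operator requires, that blanks count as being at the end of a line only when a NEWLINE follows them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy in to out, converting upper case to lower case, dropping runs
 * of blanks that are followed by a NEWLINE, and replacing every other
 * run of blanks by a single blank. The output is never longer than
 * the input, so out needs room for strlen(in) + 1 bytes.
 */
static void squeeze_blanks(const char *in, char *out, size_t cap)
{
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL);
    assert(strlen(in) < cap);

    while (in[i] != '\0') {
        if (in[i] == ' ') {
            while (in[i] == ' ')        /* consume the whole run: [ ]+ */
                i++;
            if (in[i] != '\n')          /* not at end of line */
                out[o++] = ' ';
        } else if (in[i] >= 'A' && in[i] <= 'Z') {
            out[o++] = (char)(in[i] + 'a' - 'A');
            i++;
        } else {
            out[o++] = in[i++];         /* unrecognized text is copied */
        }
    }
    out[o] = '\0';
}
```

Given the input "Hello   World  " followed by a NEWLINE and "Bye", the sketch produces "hello world", a NEWLINE, and "bye".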

D	[0-9]
%%
if	printf("IF statement\n");
[a-z]+	printf("tag, value %s\n",yytext);
0{D}+	printf("octal number %s\n",yytext);
{D}+	printf("decimal number %s\n",yytext);
"++"	printf("unary op\n");
"+"	printf("binary op\n");
"/*"	{	loop:
		while (input() != '*');
		switch (input())
			{
			case '/': break;
			case '*': unput('*');
			default: goto loop;
			}
		}
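
The comment-skipping action in the last rule can also be studied outside of lex. The following C sketch applies the same logic to a string; the function name skip_comment() is hypothetical. Unlike the action above, it stops at the end of the input instead of looping, which is an addition made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * p points just past the two characters that open a comment.
 * Returns a pointer just past the closing star and slash, or NULL
 * if the input ends before the comment is closed.
 */
static const char *skip_comment(const char *p)
{
    assert(p != NULL);

    for (;;) {
        while (*p != '\0' && *p != '*')     /* while (input() != '*'); */
            p++;
        if (*p == '\0')
            return NULL;                    /* unterminated comment */
        p++;                                /* consume the '*' */
        if (*p == '/')
            return p + 1;                   /* case '/': comment ends */
        /*
         * Any other character restarts the search. A second '*' is
         * not consumed here, so it is examined again on the next
         * pass, which mirrors the unput('*') in the lex action.
         */
    }
}
```

For the text " a ** b **/rest" the sketch returns a pointer to "rest"; for a comment that is never closed it returns NULL.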

FILES

lex.yy.c

SEE ALSO

sed(1V), yacc(1)

[a manual with the abbreviation PUL]

NOTES

The lex command has not been changed to support 8-bit symbol names, as this would produce lex source code that is not portable between systems.



Created by unroff & hp-tools. © somebody (See intro for details). All Rights Reserved. Last modified 11/5/97