Lexer

A lexer converts an input stream of characters into a sequence of tokens, each recording its type, its value, and its location in the source.

Tokens can then be used by a parser to build a program tree. If there is an error (e.g. an unexpected/invalid sequence of tokens), the location information can be used to formulate a useful error message.
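The idea can be sketched with a small regex-based tokenizer. This is a hypothetical illustration, not the project's pl0_lexer.py: it recognizes only the subset of PL/0 used below, and records each token's line number and absolute character position in the same spirit as the LexToken output shown later.

```python
import re
from typing import NamedTuple

class Token(NamedTuple):
    type: str
    value: object
    lineno: int
    lexpos: int

# Keywords are matched as NAMEs first, then promoted to their own type.
KEYWORDS = {"VAR", "BEGIN", "END"}

TOKEN_SPEC = [
    ("NUMBER",  r"\d+"),
    ("UPDATE",  r":="),
    ("NAME",    r"[A-Za-z_]\w*"),
    ("COMMA",   r","),
    ("EOS",     r";"),
    ("TIMES",   r"\*"),
    ("PRINT",   r"!"),
    ("DOT",     r"\."),
    ("NEWLINE", r"\n"),
    ("SKIP",    r"[ \t]+"),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(text):
    lineno = 1
    for m in MASTER.finditer(text):
        kind, value, pos = m.lastgroup, m.group(), m.start()
        if kind == "NEWLINE":
            lineno += 1          # track line numbers for error reporting
            continue
        if kind == "SKIP":
            continue             # discard whitespace
        if kind == "NUMBER":
            value = int(value)
        elif kind == "NAME" and value in KEYWORDS:
            kind = value
        yield Token(kind, value, lineno, pos)

for tok in tokenize("x := 10;\n"):
    print(tok)
```

Each yielded tuple carries enough position information for a later stage to point back at the exact character where a problem occurred.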

Example

We can load up any valid PL/0 program and inspect the sequence of tokens produced.

Here is the multiply program:

VAR x, y, z;

BEGIN
	x := 10;
	y := 20;
	
	z := x * y;
	
	! z
END.

Here is the token stream the lexer produces for this program. Each LexToken records the token type, its value, the line number, and the character position:

$ ./pl0_lexer.py < examples/multiply.pl1 
LexToken(VAR,'VAR',2,1)
LexToken(NAME,'x',2,5)
LexToken(COMMA,',',2,6)
LexToken(NAME,'y',2,8)
LexToken(COMMA,',',2,9)
LexToken(NAME,'z',2,11)
LexToken(EOS,';',2,12)
LexToken(BEGIN,'BEGIN',4,15)
LexToken(NAME,'x',5,22)
LexToken(UPDATE,':=',5,24)
LexToken(NUMBER,10,5,27)
LexToken(EOS,';',5,29)
LexToken(NAME,'y',6,32)
LexToken(UPDATE,':=',6,34)
LexToken(NUMBER,20,6,37)
LexToken(EOS,';',6,39)
LexToken(NAME,'z',8,44)
LexToken(UPDATE,':=',8,46)
LexToken(NAME,'x',8,49)
LexToken(TIMES,'*',8,51)
LexToken(NAME,'y',8,53)
LexToken(EOS,';',8,54)
LexToken(PRINT,'!',10,59)
LexToken(NAME,'z',10,61)
LexToken(END,'END',11,63)
LexToken(DOT,'.',11,66)
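The recorded positions are what make good error messages possible. As a hypothetical sketch (the `source` string, `find_column` helper, and the chosen position are all assumptions, modeled on the line/position pairs in the output above), a parser hitting an unexpected token could report it like this:

```python
def find_column(source: str, lexpos: int) -> int:
    # Column = 1-based offset from the last newline before lexpos.
    line_start = source.rfind("\n", 0, lexpos) + 1
    return lexpos - line_start + 1

source = "x := 10;\ny := ;\n"
# Suppose the parser hit the stray ';' at absolute position 14, line 2.
lineno, lexpos = 2, 14
col = find_column(source, lexpos)
print(f"line {lineno}, column {col}: unexpected ';'")
```

Converting an absolute position into a line/column pair on demand keeps the tokens small while still letting the error message point at the offending character.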