Java(TM) Compiler Compiler(TM)

Error Reporting and Recovery


This is a rough document describing the new error recovery features in
Version 0.7pre2 (and later versions).  For a concise list of changes please see the file
javacc.RELEASENOTES that comes with the software release.

The first change (from 0.6) is that we have two new exceptions:

    . ParseException
    . TokenMgrError

Whenever the token manager detects a problem, it throws the exception
TokenMgrError.  Previously, it used to print the message:

  Lexical Error ...

following which it used to throw the exception ParseError.

Whenever the parser detects a problem, it throws the exception
ParseException.  Previously, it used to print the message:

  Encountered ... Was expecting one of ...

following which it used to throw the exception ParseError.

In the new version, error messages are never printed explicitly;
instead, this information is stored inside the exception objects that
are thrown.  Please see the classes ParseException.java and
TokenMgrError.java (that get generated by JavaCC during parser
generation) for more details.

If the thrown exceptions are never caught, then a standard action is
taken by the virtual machine which normally includes printing the
stack trace and also the result of the "toString" method in the
exception.  So if you do not catch the JavaCC exceptions, a message
quite similar to the ones you are used to will be displayed.

But if you catch the exception, you must print the message yourself.

(NON UPWARD COMPATIBILITY ALERT: Note above paragraph)
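
To illustrate the point about printing the message yourself, here is a
minimal sketch of a caller that catches both exceptions and reports them
itself.  The classes ParseException and TokenMgrError below are simplified
stand-ins for the generated ones, and the parse method with its two rules
is made up for this example; none of this is code that JavaCC produces.

```java
class CatchAndReportDemo {
    // Simplified stand-ins for the classes that JavaCC generates.
    static class ParseException extends Exception {
        ParseException(String message) { super(message); }
    }

    static class TokenMgrError extends Error {
        TokenMgrError(String message) { super(message); }
    }

    // Hypothetical entry point standing in for a generated parser method.
    static void parse(String input) throws ParseException {
        if (input.indexOf('#') >= 0) {
            throw new TokenMgrError("unexpected character '#'");
        }
        if (!input.endsWith(";")) {
            throw new ParseException("encountered <EOF>, was expecting a semicolon");
        }
    }

    // Once the exception is caught, nothing is printed automatically:
    // the caller decides what to do with the message.
    static String report(String input) {
        try {
            parse(input);
            return "ok";
        } catch (ParseException e) {
            return "Parse error: " + e.getMessage();
        } catch (TokenMgrError e) {
            return "Lexical error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(report("x = 1;"));
        System.out.println(report("x = 1"));
        System.out.println(report("x = #;"));
    }
}
```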

Some Java Background

Exceptions in Java are all subclasses of type Throwable.  Furthermore,
exceptions are divided into two broad categories - ERRORS and other
exceptions.

Errors are exceptions that one is not expected to recover from -
examples of these are ThreadDeath or OutOfMemoryError.  Errors are
indicated by subclassing the exception "Error".  Exceptions subclassed
from Error need not be specified in the "throws" clause of method
declarations.

Exceptions other than errors are typically defined by subclassing the
exception "Exception".  These exceptions are typically handled by the
user program and must be declared in throws clauses of method
declarations (if it is possible for the method to throw that
exception).
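
The difference between the two categories can be seen in a small sketch.
The exception names LexicalFailure and SyntaxFailure and the methods scan,
parse and process are invented for this illustration only:

```java
class CheckedVsUncheckedDemo {
    // Hypothetical exception types, named only for this sketch.
    static class LexicalFailure extends Error {
        LexicalFailure(String message) { super(message); }
    }

    static class SyntaxFailure extends Exception {
        SyntaxFailure(String message) { super(message); }
    }

    // Subclass of Error: no "throws" clause is required.
    static void scan(String input) {
        if (input.indexOf('#') >= 0) {
            throw new LexicalFailure("illegal character '#'");
        }
    }

    // Subclass of Exception: the compiler insists on the "throws" clause.
    static void parse(String input) throws SyntaxFailure {
        if (!input.endsWith(";")) {
            throw new SyntaxFailure("missing ';'");
        }
    }

    // Handles the exception it is expected to recover from; a
    // LexicalFailure is deliberately not caught and simply propagates.
    static String process(String input) {
        try {
            scan(input);
            parse(input);
            return "ok";
        } catch (SyntaxFailure e) {
            return "recovered from: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(process("a;"));
        System.out.println(process("a"));
    }
}
```

Removing "throws SyntaxFailure" from parse makes the compiler reject the
class, while scan compiles without any such clause.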

The exception TokenMgrError is a subclass of Error, while the
exception ParseException is a subclass of Exception.  For a short
while (to maintain some semblance of upward compatibility), we retain
the exception ParseError.  ParseError is a subclass of Exception (as
it has been) and ParseException is a subclass of ParseError.  In the
next couple of months, ParseError will go away.

The reasoning here is that the token manager is never expected to
throw an exception - you must be careful in defining your token
specifications such that you cover all cases.  Hence the suffix
"Error" in TokenMgrError.  You do not have to worry about this
exception - if you have designed your tokens well, it should never get
thrown.  By contrast, it is typical to attempt recovery from parser
errors - hence the name "ParseException".

Let's move on now.

In 0.7pre2 we reverted to generating only "throws ParseException" for
all methods corresponding to non-terminals.  In 0.7pre1 we had "throws
Exception", which was too broad.  This is analogous to "throws
ParseError" in version 0.6.1.

(NON UPWARD COMPATIBILITY ALERT: Note above paragraph)

In 0.7pre2, we have added a syntax to specify additional exceptions
that may be thrown by methods corresponding to non-terminals.  This
syntax is identical to the Java "throws ..." syntax.  Here's an
example of how you use this:

  void VariableDeclaration() throws SymbolTableException, IOException :
  {...}
  {
    ...
  }

Here, VariableDeclaration is defined to throw exceptions
SymbolTableException and IOException in addition to ParseException.
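
From the caller's side, the extra exceptions behave like any other checked
exceptions.  The sketch below is plain Java, not generated code: the method
signature is only our assumption of what the resulting Java method would
look like, and the exception classes, the symbol set and the declare helper
are hypothetical.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

class ThrowsClauseDemo {
    // Hypothetical stand-ins for the generated and user-defined exceptions.
    static class ParseException extends Exception {
        ParseException(String message) { super(message); }
    }

    static class SymbolTableException extends Exception {
        SymbolTableException(String message) { super(message); }
    }

    // Assumed shape of the method: ParseException plus the two
    // exceptions listed in the grammar's "throws" clause.
    static void VariableDeclaration(String name, Set<String> symbols)
            throws ParseException, SymbolTableException, IOException {
        if (name.isEmpty()) {
            throw new ParseException("missing identifier");
        }
        if (!symbols.add(name)) {
            throw new SymbolTableException("duplicate declaration of " + name);
        }
    }

    // The caller must handle (or declare) all three exceptions.
    static String declare(String name, Set<String> symbols) {
        try {
            VariableDeclaration(name, symbols);
            return "declared " + name;
        } catch (ParseException e) {
            return "syntax error: " + e.getMessage();
        } catch (SymbolTableException e) {
            return "semantic error: " + e.getMessage();
        } catch (IOException e) {
            return "I/O error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        Set<String> symbols = new HashSet<>();
        System.out.println(declare("x", symbols));
        System.out.println(declare("x", symbols));
        System.out.println(declare("", symbols));
    }
}
```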

Error Reporting:

The documentation in the error reporting mini-tutorial is
obsolete for this new version.  Please pretend you never read that
document.  The method token_error no longer exists, for example.  The
scheme for error reporting is simpler now - simply modify the file
ParseException.java to do what you want it to do.  Typically, you
would modify the getMessage method to do your own customized error
reporting.  All information regarding these methods can be obtained
from the comments in the generated files ParseException.java and
TokenMgrError.java.  It will also help to understand the functionality
of the class Throwable.

(NON UPWARD COMPATIBILITY ALERT: Note above paragraph)
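
As a rough idea of what a customized getMessage might look like, here is a
minimal sketch.  The class below is a stand-in, not the generated
ParseException.java: its fields (line, column, found, expected) and its
constructor are assumptions made for this example, so check the comments
in your generated file for what is really available.

```java
import java.util.Arrays;
import java.util.List;

class CustomMessageDemo {
    // Stand-in for ParseException with hypothetical fields.
    static class ParseException extends Exception {
        final int line;
        final int column;
        final String found;
        final List<String> expected;

        ParseException(int line, int column, String found, List<String> expected) {
            this.line = line;
            this.column = column;
            this.found = found;
            this.expected = expected;
        }

        // Customized error reporting: one compact line per error.
        @Override
        public String getMessage() {
            return "line " + line + ", column " + column
                + ": found " + found
                + ", expected one of: " + String.join(" ", expected);
        }
    }

    public static void main(String[] args) {
        ParseException e = new ParseException(3, 7, "{", Arrays.asList(")", "&&"));
        System.out.println(e.getMessage());
    }
}
```

Because the "toString" method of Throwable includes the result of
getMessage, the customized text also shows up when the exception is never
caught.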

There is a new method in the parser called "generateParseException".
You can call this method anytime you wish to generate an object of
type ParseException.  This object will contain all the choices that
the parser has attempted since the last successfully consumed token.

Error Recovery:

JavaCC offers two kinds of error recovery - shallow recovery and deep
recovery.  Shallow recovery applies when none of the current choices
can be selected, while deep recovery applies when a choice has been
selected but an error then occurs sometime during the parsing of
this choice.

Shallow Error Recovery:

We shall explain shallow error recovery using the following example:

void Stm() :
{}
{
  IfStm()
|
  WhileStm()
}

Let's assume that IfStm starts with the reserved word "if" and WhileStm
starts with the reserved word "while".  Suppose you want to recover by
skipping all the way to the next semicolon when neither IfStm nor WhileStm
can be matched by the next input token (assuming a lookahead of 1), that
is, when the next token is neither "if" nor "while".

What you do is write the following:

void Stm() :
{}
{
  IfStm()
|
  WhileStm()
|
  error_skipto(SEMICOLON)
}

But you have to define "error_skipto" first.  So far as JavaCC is concerned,
"error_skipto" is just like any other non-terminal.  The following is one
way to define "error_skipto" (here we use the standard JAVACODE production):

JAVACODE
void error_skipto(int kind) {
  ParseException e = generateParseException();  // generate the exception object.
  System.out.println(e.toString());  // print the error message
  Token t;
  do {
    t = getNextToken();
  } while (t.kind != kind);
    // The above loop consumes tokens all the way up to a token of
    // "kind".  We use a do-while loop rather than a while because the
    // current token is the one immediately before the erroneous token
    // (in our case the token immediately before what should have been
    // "if"/"while").
}

That's it for shallow error recovery.  In a future version of JavaCC
we will have support for modular composition of grammars.  When this
happens, one can place all these error recovery routines into a
separate module that can be "imported" into the main grammar module.
We intend to supply a library of useful routines (for error recovery
and otherwise) when we implement this capability.
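
The skipping loop used by error_skipto can be tried out in isolation with
plain Java.  The sketch below is not generated code: the token kinds and
the TokenSource class are hypothetical stand-ins, and TokenSource assumes
that the token source keeps returning an EOF token once the input is used
up.  Under that assumption the original loop would never finish if no
token of the requested kind remains, so the sketch adds an EOF check of
its own.

```java
class SkipToDemo {
    // Hypothetical token kinds; a generated parser defines its own constants.
    static final int EOF = 0, ID = 1, SEMICOLON = 2, WHILE = 3;

    static class Token {
        final int kind;
        Token(int kind) { this.kind = kind; }
    }

    // Stand-in for the parser's token source: once the input is used up,
    // it keeps handing out EOF tokens.
    static class TokenSource {
        private final int[] kinds;
        private int next = 0;

        TokenSource(int... kinds) { this.kinds = kinds; }

        Token getNextToken() {
            return new Token(next < kinds.length ? kinds[next++] : EOF);
        }
    }

    // Same do-while loop as in error_skipto, plus an EOF check (our addition).
    // Returns the token at which skipping stopped.
    static Token skipTo(TokenSource in, int kind) {
        Token t;
        do {
            t = in.getNextToken();
        } while (t.kind != kind && t.kind != EOF);
        return t;
    }

    public static void main(String[] args) {
        TokenSource in = new TokenSource(ID, ID, SEMICOLON, WHILE);
        System.out.println(skipTo(in, SEMICOLON).kind);  // kind we stopped at
        System.out.println(in.getNextToken().kind);      // parsing resumes here
        System.out.println(skipTo(new TokenSource(ID, ID), SEMICOLON).kind);
    }
}
```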

Deep Error Recovery:

Let's use the same example that we did for shallow recovery:

void Stm() :
{}
{
  IfStm()
|
  WhileStm()
}

In this case we wish to recover in the same way.  However, we wish to
recover even when there is an error deeper into the parse.  For
example, suppose the next token was "while" - therefore the choice
"WhileStm" was taken.  But suppose that during the parse of WhileStm
some error is encountered - say one has "while (foo { stm; }" - i.e., the
closing parenthesis is missing.  Shallow recovery will not work
for this situation.  You need deep recovery to achieve this.  For this,
we offer a new syntactic entity in JavaCC - the try-catch-finally block.

First, let us rewrite the above example for deep error recovery and then
explain the try-catch-finally block in more detail:

void Stm() :
{}
{
  try {
    (
      IfStm()
    |
      WhileStm()
    )
  }
  catch (ParseException e) {
    error_skipto(SEMICOLON);
  }
}

That's all you need to do.  If there is any unrecovered error during the
parse of IfStm or WhileStm, then the catch block takes over.  You can
have any number of catch blocks and also optionally a finally block
(just as in Java).  What goes into the catch blocks is *Java code*,
not JavaCC expansions.  For example, the above example could have been
rewritten as:

void Stm() :
{}
{
  try {
    (
      IfStm()
    |
      WhileStm()
    )
  }
  catch (ParseException e) {
    System.out.println(e.toString());
    Token t;
    do {
      t = getNextToken();
    } while (t.kind != SEMICOLON);
  }
}

Our belief is that it's best to avoid placing too much Java code in the
catch and finally blocks since it overwhelms the grammar reader.  It's best
to define methods that you can then call from the catch blocks.

Note that in the second writing of the example, we essentially copied
the code out of the implementation of error_skipto.  But we left out the
first statement - the call to generateParseException.  That's because in
this case, the catch block already provides us with the exception.  But
even if you did call this method, you would get back an identical object.
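
To see the effect of deep recovery outside of JavaCC, here is a small
hand-written analogue in plain Java.  It is only a sketch: the token
kinds, the simplified rule for WhileStm, and the methods expect, stm and
parseAll are all made up for this example, and the EOF check in the
skipping loop is our addition.  An error raised several calls deep inside
whileStm is caught in stm, which then skips to the next semicolon so that
parsing can continue with the following statement.

```java
import java.util.ArrayList;
import java.util.List;

class DeepRecoveryDemo {
    // Hypothetical token kinds for this sketch.
    static final int EOF = 0, WHILE = 1, LPAREN = 2, RPAREN = 3, ID = 4, SEMICOLON = 5;

    static class ParseException extends Exception {
        ParseException(String message) { super(message); }
    }

    private final int[] kinds;
    private int pos = 0;
    final List<String> errors = new ArrayList<>();

    DeepRecoveryDemo(int... kinds) { this.kinds = kinds; }

    private int peek() { return pos < kinds.length ? kinds[pos] : EOF; }

    int getNextToken() {
        int kind = peek();
        if (kind != EOF) pos++;
        return kind;
    }

    // Consumes the next token if it has the expected kind; otherwise throws
    // and leaves the offending token unconsumed.
    private void expect(int kind) throws ParseException {
        if (peek() != kind) {
            throw new ParseException("expected kind " + kind + " but found kind " + peek());
        }
        pos++;
    }

    // WhileStm ::= "while" "(" ID ")" ID ";"   (a made-up, simplified rule)
    private void whileStm() throws ParseException {
        expect(WHILE); expect(LPAREN); expect(ID); expect(RPAREN);
        expect(ID); expect(SEMICOLON);
    }

    // Counterpart of Stm() with the try-catch: any error raised while
    // parsing the chosen alternative ends up in the catch block.
    boolean stm() {
        try {
            whileStm();
            return true;
        } catch (ParseException e) {
            errors.add(e.getMessage());
            int kind;
            do {
                kind = getNextToken();
            } while (kind != SEMICOLON && kind != EOF);
            return false;
        }
    }

    // Parses statements until the input is used up; returns how many parsed cleanly.
    int parseAll() {
        int ok = 0;
        while (peek() != EOF) {
            if (stm()) ok++;
        }
        return ok;
    }

    public static void main(String[] args) {
        DeepRecoveryDemo p = new DeepRecoveryDemo(
            WHILE, LPAREN, ID, ID, SEMICOLON,            // missing ")"
            WHILE, LPAREN, ID, RPAREN, ID, SEMICOLON);   // well formed
        System.out.println(p.parseAll() + " statement(s) parsed, errors: " + p.errors);
    }
}
```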

