com.google.caja.lexer
Class JsLexer

java.lang.Object
  extended by com.google.caja.lexer.JsLexer
All Implemented Interfaces:
TokenStream<JsTokenType>

public class JsLexer
extends java.lang.Object
implements TokenStream<JsTokenType>

Tokenizes JavaScript source.

Author:
mikesamuel@gmail.com (Mike Samuel)

Nested Class Summary
(package private) static class JsLexer.WordClassifier
           
 
Field Summary
private static java.util.regex.Pattern INTEGER_LITERAL_RE
           
private static PunctuationTrie<?> JAVASCRIPT_PUNCTUATOR
           
private static java.util.regex.Pattern TOKEN_BEFORE_REGEXP_LITERAL_RE
          According to http://www.mozilla.org/js/language/js20/rationale/syntax.html "To support error recovery, JavaScript 2.0's lexical grammar must be made independent of its syntactic grammar."
private  TokenStream<JsTokenType> ts
           
 
Constructor Summary
JsLexer(CharProducer producer)
           
JsLexer(CharProducer producer, boolean isQuasiliteral)
           
 
Method Summary
static PunctuationTrie<?> getPunctuationTrie()
           
 boolean hasNext()
          True if TokenStream.next() is safe to call.
static boolean isJsLineSeparator(char ch)
           
static boolean isJsSpace(char ch)
           
(package private) static boolean isRegexp(java.lang.String previous)
           
 Token<JsTokenType> next()
          Returns the next value, and moves the stream position forward.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ts

private TokenStream<JsTokenType> ts

JAVASCRIPT_PUNCTUATOR

private static PunctuationTrie<?> JAVASCRIPT_PUNCTUATOR

TOKEN_BEFORE_REGEXP_LITERAL_RE

private static final java.util.regex.Pattern TOKEN_BEFORE_REGEXP_LITERAL_RE
According to http://www.mozilla.org/js/language/js20/rationale/syntax.html
"To support error recovery, JavaScript 2.0's lexical grammar must be made independent of its syntactic grammar. To make the lexical grammar independent of the syntactic grammar, JavaScript 2.0 determines whether a / starts a regular expression or is a division (or /=) operator solely based on the previous token."

That page then lists the tokens that can precede a RegExp literal, and says:

"Regardless of the previous token, // is interpreted as the beginning of a comment."

This scheme is inconsistent with EcmaScript 3 and its planned successors, which do not have a context-free lexical grammar. The approximation works well in practice but fails in some cases, for example after a ++ or -- operator that turns out to be a prefix operator, as in ++/x/.lastIndex.

Since that document was written, the set of proposed reserved keywords for EcmaScript 4 has changed. David-Sarah Hopwood suggested changing the preceder set in a message titled "JavaScript lexing" on the google-caja-discuss mailing list, which concluded:

"I think you should:
  1. remove 'field', 'is', 'namespace', 'use', '->', '..', '@', '^^', and '^^=' from validPreceders, and add 'void';
  2. document that [Caja] does not allow '++' or '--' just before a regexp literal;
  3. document that [Caja] does not allow a regexp literal as the first token of an expression statement."
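The previous-token rule quoted above can be sketched as follows. This is a hypothetical illustration, not Caja's code: the class SlashClassifier, the enum SlashKind, and the PRECEDERS set are invented for the example, and the set is a partial list chosen to reflect the suggestions above ('void' included, '++' and '--' excluded). It is not the actual contents of TOKEN_BEFORE_REGEXP_LITERAL_RE.

```java
import java.util.Set;

/** Hypothetical sketch of deciding what a '/' starts, using only the previous token. */
public final class SlashClassifier {
  enum SlashKind { LINE_COMMENT, BLOCK_COMMENT, REGEXP, DIVISION }

  // Illustrative (partial) set of tokens after which '/' begins a regexp literal.
  // Assumption: 'void' is present and '++' / '--' are deliberately absent, per the advice above.
  private static final Set<String> PRECEDERS = Set.of(
      "(", ",", "=", ":", "[", "!", "&&", "||", "?", "{", ";",
      "+", "-", "*", "%", "==", "!=", "===", "!==", "<", ">", "<=", ">=",
      "&", "|", "^", "~", "+=", "-=", "*=", "%=",
      "return", "typeof", "void", "in", "instanceof", "new", "delete",
      "throw", "case", "do", "else");

  /** True if a '/' after {@code previous} should start a regexp; null means start of input. */
  static boolean isRegexp(String previous) {
    return previous == null || PRECEDERS.contains(previous);
  }

  /** Classifies the slash at the start of {@code rest}. */
  static SlashKind classify(String previous, String rest) {
    if (!rest.startsWith("/")) {
      throw new IllegalArgumentException("rest must start with '/'");
    }
    // "//" always begins a comment, regardless of the previous token.
    if (rest.startsWith("//")) return SlashKind.LINE_COMMENT;
    if (rest.startsWith("/*")) return SlashKind.BLOCK_COMMENT;
    return isRegexp(previous) ? SlashKind.REGEXP : SlashKind.DIVISION;
  }

  public static void main(String[] args) {
    System.out.println(classify("(", "/ab+c/.test(s)"));  // REGEXP
    System.out.println(classify("x", "/ 2"));             // DIVISION
    System.out.println(classify(")", "// note"));         // LINE_COMMENT
    // Misclassified: ++/x/.lastIndex is a valid prefix increment of a regexp property.
    System.out.println(classify("++", "/x/.lastIndex"));  // DIVISION
    // Misclassified: in `if (ok) /re/.test(s)` the slash follows ')'.
    System.out.println(classify(")", "/re/.test(s)"));    // DIVISION
  }
}
```

The last two calls in main show the failure modes that restrictions 2 and 3 document: a purely previous-token rule cannot tell a prefix ++ from a postfix one, nor a ')' that closes an if condition from one that closes a parenthesized expression.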


INTEGER_LITERAL_RE

private static java.util.regex.Pattern INTEGER_LITERAL_RE

Constructor Detail

JsLexer

public JsLexer(CharProducer producer)

JsLexer

public JsLexer(CharProducer producer,
               boolean isQuasiliteral)

Method Detail

hasNext

public boolean hasNext()
                throws ParseException
Description copied from interface: TokenStream
True if TokenStream.next() is safe to call.

Specified by:
hasNext in interface TokenStream<JsTokenType>
Throws:
ParseException

next

public Token<JsTokenType> next()
                        throws ParseException
Description copied from interface: TokenStream
Returns the next value, and moves the stream position forward.

Specified by:
next in interface TokenStream<JsTokenType>
Throws:
ParseException
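The hasNext()/next() contract above implies a simple pull loop: call next() only after hasNext() returns true. The sketch below is hypothetical: PullStream is a minimal stand-in for TokenStream<T> so that the example compiles without Caja, and its methods declare a generic Exception where the real interface declares ParseException. Since both methods are declared to throw ParseException, a caller reading from a JsLexer has to handle failures from hasNext() as well as from next().

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of draining a hasNext()/next() token stream. */
public final class TokenDrain {
  /** Minimal stand-in for TokenStream<T>; the real interface throws ParseException. */
  interface PullStream<T> {
    boolean hasNext() throws Exception;
    T next() throws Exception;
  }

  /** Reads every remaining token, calling next() only after hasNext() returns true. */
  static <T> List<T> drain(PullStream<T> stream) throws Exception {
    List<T> out = new ArrayList<>();
    while (stream.hasNext()) {
      out.add(stream.next());
    }
    return out;
  }

  /** Adapts a fixed list of tokens so the loop can be demonstrated. */
  static <T> PullStream<T> of(List<T> tokens) {
    Iterator<T> it = tokens.iterator();
    return new PullStream<T>() {
      @Override public boolean hasNext() { return it.hasNext(); }
      @Override public T next() { return it.next(); }
    };
  }

  public static void main(String[] args) throws Exception {
    System.out.println(drain(of(List.of("var", "x", "=", "1", ";"))));
    // prints [var, x, =, 1, ;]
  }
}
```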

isRegexp

static boolean isRegexp(java.lang.String previous)

isJsSpace

public static boolean isJsSpace(char ch)
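isJsSpace carries no description here. As an assumption, the sketch below follows the WhiteSpace production of ECMA-262 3rd edition (section 7.2): tab, vertical tab, form feed, space, no-break space, and any other Unicode space separator (category Zs). JsSpaceSketch is a hypothetical name. Caja's actual character set may differ, for example by also accepting U+FEFF (whitespace as of ES5) or by treating line terminators as spaces; this sketch keeps line terminators separate, as the specification does.

```java
/** Hypothetical isJsSpace following ECMA-262 3rd edition, section 7.2. */
public final class JsSpaceSketch {
  static boolean isJsSpace(char ch) {
    switch (ch) {
      case '\t':      // U+0009 CHARACTER TABULATION
      case '\u000B':  // U+000B LINE TABULATION (vertical tab)
      case '\f':      // U+000C FORM FEED
      case ' ':       // U+0020 SPACE
      case '\u00A0':  // U+00A0 NO-BREAK SPACE
        return true;
      default:
        // Any other Unicode space separator (category Zs), e.g. U+2003 EM SPACE.
        return Character.getType(ch) == Character.SPACE_SEPARATOR;
    }
  }

  public static void main(String[] args) {
    System.out.println(isJsSpace(' '));       // true
    System.out.println(isJsSpace('\u2003'));  // true (category Zs)
    System.out.println(isJsSpace('\n'));      // false: a line terminator, not whitespace
    System.out.println(isJsSpace('x'));       // false
  }
}
```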

isJsLineSeparator

public static boolean isJsLineSeparator(char ch)
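A plausible reading of isJsLineSeparator is the LineTerminator production of ECMA-262 (section 7.3): line feed, carriage return, U+2028 LINE SEPARATOR, and U+2029 PARAGRAPH SEPARATOR. The sketch below is hypothetical (JsLineSeparatorSketch and countLines are invented names). The distinction matters because line terminators end // comments and affect automatic semicolon insertion, and because java.io.BufferedReader.readLine() recognizes only LF, CR, and CRLF, not U+2028 or U+2029.

```java
/** Hypothetical isJsLineSeparator following ECMA-262, section 7.3. */
public final class JsLineSeparatorSketch {
  /** True for the four ECMAScript line terminators. */
  static boolean isJsLineSeparator(char ch) {
    switch (ch) {
      case '\n':      // LINE FEED
      case '\r':      // CARRIAGE RETURN
      case '\u2028':  // LINE SEPARATOR
      case '\u2029':  // PARAGRAPH SEPARATOR
        return true;
      default:
        return false;
    }
  }

  /** Counts lines in src, treating CR immediately followed by LF as one break. */
  static int countLines(String src) {
    int lines = 1;
    for (int i = 0; i < src.length(); i++) {
      char ch = src.charAt(i);
      if (!isJsLineSeparator(ch)) continue;
      // Skip the LF of a CRLF pair so it is not counted twice.
      if (ch == '\r' && i + 1 < src.length() && src.charAt(i + 1) == '\n') i++;
      lines++;
    }
    return lines;
  }

  public static void main(String[] args) {
    System.out.println(countLines("a;\r\nb;\u2028c;"));  // 3
  }
}
```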

getPunctuationTrie

public static PunctuationTrie<?> getPunctuationTrie()
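getPunctuationTrie() is undocumented here. Judging by the type name, a PunctuationTrie presumably lets the lexer find the longest punctuator at the current position (maximal munch), so that >>>= is read as one token rather than as >, >, >, =. The sketch below illustrates that technique under this assumption; PunctTrieSketch and longestMatch are invented names and do not reflect the API of Caja's PunctuationTrie.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical longest-match punctuator trie; not Caja's PunctuationTrie API. */
public final class PunctTrieSketch {
  private static final class Node {
    final Map<Character, Node> children = new HashMap<>();
    boolean terminal;  // true if the path to this node spells a complete punctuator
  }

  private final Node root = new Node();

  PunctTrieSketch(String... punctuators) {
    for (String p : punctuators) {
      Node n = root;
      for (char c : p.toCharArray()) {
        n = n.children.computeIfAbsent(c, k -> new Node());
      }
      n.terminal = true;
    }
  }

  /** Returns the length of the longest punctuator starting at {@code start}, or 0 if none. */
  int longestMatch(CharSequence src, int start) {
    Node n = root;
    int best = 0;
    for (int i = start; i < src.length(); i++) {
      n = n.children.get(src.charAt(i));
      if (n == null) break;                   // no punctuator continues with this char
      if (n.terminal) best = i - start + 1;   // remember the longest complete match so far
    }
    return best;
  }

  public static void main(String[] args) {
    PunctTrieSketch trie = new PunctTrieSketch(
        ">", ">>", ">>>", ">>=", ">>>=", ">=", "!", "!=", "!==");
    System.out.println(trie.longestMatch("a >>>= b", 2));  // 4
    System.out.println(trie.longestMatch("a >>> b", 2));   // 3
    System.out.println(trie.longestMatch("x !== y", 2));   // 3
    System.out.println(trie.longestMatch("x + y", 2));     // 0 ("+" not registered)
  }
}
```

Recording the best terminal seen so far, rather than stopping at the first terminal, is what yields the longest match; it also copes with a path whose intermediate prefix is not itself a punctuator.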


Copyright (C) 2008 Google Inc.
Licensed under the Apache License, Version 2.0