java.lang.Object
  com.google.caja.lexer.JsLexer
public class JsLexer
Tokenizes JavaScript source.
Nested Class Summary | |
---|---|
(package private) static class | JsLexer.WordClassifier |
Field Summary | |
---|---|
private static java.util.regex.Pattern | INTEGER_LITERAL_RE |
private static PunctuationTrie<?> | JAVASCRIPT_PUNCTUATOR |
private static java.util.regex.Pattern | TOKEN_BEFORE_REGEXP_LITERAL_RE: According to http://www.mozilla.org/js/language/js20/rationale/syntax.html "To support error recovery, JavaScript 2.0's lexical grammar must be made independent of its syntactic grammar." |
private TokenStream<JsTokenType> | ts |
Constructor Summary |
---|
JsLexer(CharProducer producer) |
JsLexer(CharProducer producer, boolean isQuasiliteral) |
Method Summary | |
---|---|
static PunctuationTrie<?> | getPunctuationTrie() |
boolean | hasNext(): True if TokenStream.next() is safe to call. |
static boolean | isJsLineSeparator(char ch) |
static boolean | isJsSpace(char ch) |
(package private) static boolean | isRegexp(java.lang.String previous) |
Token<JsTokenType> | next(): Returns the next value, and moves the stream position forward. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
private TokenStream<JsTokenType> ts
private static PunctuationTrie<?> JAVASCRIPT_PUNCTUATOR
private static final java.util.regex.Pattern TOKEN_BEFORE_REGEXP_LITERAL_RE
According to http://www.mozilla.org/js/language/js20/rationale/syntax.html:
"To support error recovery, JavaScript 2.0's lexical grammar must be made independent of its syntactic grammar. To make the lexical grammar independent of the syntactic grammar, JavaScript 2.0 determines whether a / starts a regular expression or is a division (or /=) operator solely based on the previous token."
That page then lists the tokens that can precede a RegExp literal, and says:
"Regardless of the previous token, // is interpreted as the beginning of a comment."
This scheme is inconsistent with EcmaScript 3 and planned successors which do not have a context-free lexical grammar. This approximation works well in practice, but will fail in some cases, such as after a ++/-- operator that turns out to be a prefix operator.
Since that document was written, the set of proposed reserved keywords for EcmaScript 4 has changed. David-Sarah Hopwood suggested changing the preceder set in a mail titled "JavaScript lexing" on google-caja-discuss which concluded:
"I think you should:
- remove 'field', 'is', 'namespace', 'use', '->', '..', '@', '^^', and '^^=' from validPreceders, and add 'void';
- document that [Caja] does not allow '++' or '--' just before a regexp literal;
- document that [Caja] does not allow a regexp literal as the first token of an expression statement."
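The previous-token rule described above can be illustrated with a small standalone sketch. It is not Caja's implementation: the class name RegexpPrecederSketch, the method slashStartsRegexp, and the contents of the preceder set are all hypothetical, and the set is deliberately much smaller than any real one. It only shows the idea of classifying a '/' by looking at the token before it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Illustrative only: not Caja's actual preceder set or matching logic. */
final class RegexpPrecederSketch {
  // Hypothetical, deliberately small set of tokens after which a '/' is
  // read as the start of a regexp literal. "void" is included and "++"/"--"
  // are excluded, in line with the advice quoted in the text.
  private static final Set<String> PRECEDERS = new HashSet<>(Arrays.asList(
      "(", ",", "=", ":", "[", "!", "&&", "||", "?",
      "return", "typeof", "void"));

  /**
   * Decides how to read a '/' using only the previous token's text.
   * The previous token is assumed to be non-null.
   */
  static boolean slashStartsRegexp(String previous) {
    return PRECEDERS.contains(previous);
  }

  public static void main(String[] args) {
    System.out.println(slashStartsRegexp("="));   // true:  x = /re/
    System.out.println(slashStartsRegexp("x"));   // false: x / y
    System.out.println(slashStartsRegexp(")"));   // false: (a + b) / c
    System.out.println(slashStartsRegexp("++"));  // false: treated as division
  }
}
```

The last case is the known weak spot mentioned in the text: a '/' after "++" is always read as division here, even when the "++" turns out to be a prefix operator.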
private static java.util.regex.Pattern INTEGER_LITERAL_RE
Constructor Detail |
---|
public JsLexer(CharProducer producer)
public JsLexer(CharProducer producer, boolean isQuasiliteral)
Method Detail |
---|
public boolean hasNext() throws ParseException
Description copied from interface: TokenStream
True if TokenStream.next() is safe to call.
Specified by: hasNext in interface TokenStream<JsTokenType>
Throws: ParseException
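public Token<JsTokenType> next() throws ParseException
Description copied from interface: TokenStream
Returns the next value, and moves the stream position forward.
Specified by: next in interface TokenStream<JsTokenType>
Throws: ParseException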
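The hasNext()/next() contract above implies the usual consumption loop: check hasNext() before every call to next(). The sketch below shows that loop against a hypothetical stand-in interface, SimpleTokenStream, rather than Caja's own TokenStream, so that it compiles on its own. The plain Exception in the signatures stands in for ParseException, and the list-backed stream exists only to exercise the loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Illustrative only: none of these types are part of Caja. */
final class TokenLoopSketch {
  /** Hypothetical stand-in mirroring the documented hasNext()/next() pair. */
  interface SimpleTokenStream<T> {
    boolean hasNext() throws Exception;  // Exception stands in for ParseException
    T next() throws Exception;
  }

  /** Collects every remaining token, calling hasNext() before each next(). */
  static <T> List<T> drain(SimpleTokenStream<T> stream) throws Exception {
    List<T> out = new ArrayList<>();
    while (stream.hasNext()) {
      out.add(stream.next());  // next() also advances the stream position
    }
    return out;
  }

  /** List-backed stream used only to exercise drain(). */
  static <T> SimpleTokenStream<T> of(List<T> tokens) {
    final Iterator<T> it = tokens.iterator();
    return new SimpleTokenStream<T>() {
      @Override public boolean hasNext() { return it.hasNext(); }
      @Override public T next() { return it.next(); }
    };
  }

  public static void main(String[] args) throws Exception {
    // prints [var, x, =, 1, ;]
    System.out.println(drain(of(Arrays.asList("var", "x", "=", "1", ";"))));
  }
}
```

With a real JsLexer the loop body would receive Token<JsTokenType> values instead of strings, and the caller would need to handle ParseException from both methods.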
static boolean isRegexp(java.lang.String previous)
public static boolean isJsSpace(char ch)
public static boolean isJsLineSeparator(char ch)
public static PunctuationTrie<?> getPunctuationTrie()