com.google.caja.lexer
Class HtmlInputSplitter

java.lang.Object
  extended by com.google.caja.lexer.AbstractTokenStream<HtmlTokenType>
      extended by com.google.caja.lexer.HtmlInputSplitter
All Implemented Interfaces:
TokenStream<HtmlTokenType>

final class HtmlInputSplitter
extends AbstractTokenStream<HtmlTokenType>

A token stream that breaks a character stream into HtmlTokenType.{TEXT,TAGBEGIN,TAGEND,DIRECTIVE,COMMENT,CDATA} tokens. Attribute names and values are matched in a later step.
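
To make the splitting step concrete, here is a minimal, self-contained sketch of the general idea. It is not the Caja implementation: the class ToySplitter, its Tok type, and its reduced set of token types are hypothetical names invented for illustration, and DIRECTIVE, CDATA, and escape-exempt handling are left out. It shows only how a character stream can be cut into coarse tag/text/comment tokens while the text between a tag's name and its closing '>' is left unparsed for a later step.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration only; this is not Caja's HtmlInputSplitter. */
final class ToySplitter {
  enum Type { TEXT, TAGBEGIN, TAGEND, COMMENT }

  /** A typed slice [start, end) of the input. */
  static final class Tok {
    final Type type;
    final int start;
    final int end;
    final String text;

    Tok(Type type, String in, int start, int end) {
      this.type = type;
      this.start = start;
      this.end = end;
      this.text = in.substring(start, end);
    }
  }

  static List<Tok> split(String in) {
    List<Tok> out = new ArrayList<>();
    int n = in.length();
    int i = 0;
    boolean inTag = false;  // true between a TAGBEGIN and its TAGEND
    while (i < n) {
      int start = i;
      Type type;
      if (inTag) {
        if (in.charAt(i) == '>') {
          i++;
          type = Type.TAGEND;
          inTag = false;
        } else {
          // Raw attribute text; splitting it is left to a later step.
          int gt = in.indexOf('>', i);
          i = gt < 0 ? n : gt;
          type = Type.TEXT;
        }
      } else if (in.startsWith("<!--", i)) {
        int close = in.indexOf("-->", i + 4);
        i = close < 0 ? n : close + 3;
        type = Type.COMMENT;
      } else if (startsTag(in, i)) {
        i += in.charAt(i + 1) == '/' ? 2 : 1;  // skip "<" or "</"
        while (i < n && Character.isLetterOrDigit(in.charAt(i))) { i++; }
        type = Type.TAGBEGIN;
        inTag = true;
      } else {
        // Plain text: consume at least one char, then run to the next '<'.
        int lt = in.indexOf('<', i + 1);
        i = lt < 0 ? n : lt;
        type = Type.TEXT;
      }
      out.add(new Tok(type, in, start, i));
    }
    return out;
  }

  /** True if "<name" or "</name" starts at position i. */
  private static boolean startsTag(String in, int i) {
    if (in.charAt(i) != '<' || i + 1 >= in.length()) { return false; }
    int nameAt = in.charAt(i + 1) == '/' ? i + 2 : i + 1;
    return nameAt < in.length() && Character.isLetter(in.charAt(nameAt));
  }

  public static void main(String[] args) {
    for (Tok t : split("<b class=x>hi</b><!--c-->")) {
      System.out.println(t.type + " [" + t.start + "," + t.end + ") " + t.text);
    }
  }
}
```

Running main prints one line per token with its type, its [start,end) range, and its text. In this toy, the attribute text " class=x" comes out as a single unparsed token between the TAGBEGIN and TAGEND tokens.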


Nested Class Summary
private static class HtmlInputSplitter.State
          States for a state machine for optimistically identifying tags and other html/xml/phpish structures.
 
Field Summary
private  boolean asXml
          True if the input should be treated as XML. This also determines whether escape-exempt blocks are allowed.
private  java.lang.String escapeExemptTagName
          Null or the name of the close tag required to end the current escape exempt block.
private  boolean inEscapeExemptBlock
          True if inside a script, xmp, listing, or similar tag whose content does not follow the normal escaping rules.
private  boolean inTag
          True iff the current character is inside a tag.
private  java.lang.String lastNonIgnorable
           
private  CharProducer p
          The source of HTML character data.
private  HtmlTextEscapingMode textEscapingMode
           
 
Constructor Summary
HtmlInputSplitter(CharProducer p)
           
 
Method Summary
 boolean getTreatedAsXml()
          True iff the input is treated as XML.
private  boolean isIdentStart(char ch)
           
private  java.lang.String name(int start, int end)
           
protected  java.lang.String name(java.lang.String tagName)
           
private  Token<HtmlTokenType> parseToken()
          Breaks the character stream into tokens.
protected  Token<HtmlTokenType> produce()
          Make sure that there is a token ready to yield in this.token.
(package private) static
<T extends TokenType>
Token<T>
reclassify(Token<T> token, T type)
           
 void setTreatedAsXml(boolean asXml)
           
 
Methods inherited from class com.google.caja.lexer.AbstractTokenStream
hasNext, next
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

asXml

private boolean asXml
True if the input should be treated as XML. This also determines whether escape-exempt blocks are allowed.


p

private final CharProducer p
The source of HTML character data.


inTag

private boolean inTag
True iff the current character is inside a tag.


inEscapeExemptBlock

private boolean inEscapeExemptBlock
True if inside a script, xmp, listing, or similar tag whose content does not follow the normal escaping rules.


escapeExemptTagName

private java.lang.String escapeExemptTagName
Null, or the name of the close tag required to end the current escape-exempt block. Escape-exempt tags include <script>, <xmp>, etc., whose content may contain unescaped HTML input.
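
The bookkeeping described by inEscapeExemptBlock and escapeExemptTagName can be sketched as follows. This is a hedged illustration, not the Caja code: EscapeExemptTracker and onTagBegin are hypothetical names, the set of exempt tags is assumed from the "script, xmp, listing" wording above, and the sketch derives the "in block" flag from the tag name rather than storing a separate boolean.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of escape-exempt block tracking; not Caja's code. */
final class EscapeExemptTracker {
  // Assumed set, taken from the tags named in the field documentation.
  private static final Set<String> EXEMPT =
      new HashSet<>(Arrays.asList("script", "xmp", "listing"));

  // Null when outside an escape-exempt block, else the tag name that ends it.
  private String escapeExemptTagName;

  boolean inEscapeExemptBlock() {
    return escapeExemptTagName != null;
  }

  /**
   * Feeds the start of a tag, such as "<script" or "</script".
   * Returns true if it should be treated as markup, or false if it sits
   * inside an escape-exempt block and should be treated as plain content.
   */
  boolean onTagBegin(String tagBegin) {
    boolean isClose = tagBegin.startsWith("</");
    String name = tagBegin.substring(isClose ? 2 : 1).toLowerCase(Locale.ROOT);
    if (escapeExemptTagName != null) {
      if (isClose && name.equals(escapeExemptTagName)) {
        escapeExemptTagName = null;  // the matching close tag ends the block
        return true;
      }
      return false;  // anything else inside the block is just content
    }
    if (!isClose && EXEMPT.contains(name)) {
      escapeExemptTagName = name;  // the open tag starts a block
    }
    return true;
  }

  public static void main(String[] args) {
    EscapeExemptTracker t = new EscapeExemptTracker();
    for (String tag : new String[] {"<script", "<b", "</xmp", "</SCRIPT", "<b"}) {
      System.out.println(tag + " -> markup=" + t.onTagBegin(tag)
          + ", inBlock=" + t.inEscapeExemptBlock());
    }
  }
}
```

Only the close tag whose name matches the one that opened the block ends it, which is why the name has to be remembered alongside the flag.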


textEscapingMode

private HtmlTextEscapingMode textEscapingMode

lastNonIgnorable

private java.lang.String lastNonIgnorable
Constructor Detail

HtmlInputSplitter

public HtmlInputSplitter(CharProducer p)
Method Detail

getTreatedAsXml

public boolean getTreatedAsXml()
True iff the input is treated as XML. XML-ness affects the treatment of script tags, which must be CDATA or HTML-escaped in GXPs and other XML types, but are handled specially by HTML parsers.


setTreatedAsXml

public void setTreatedAsXml(boolean asXml)
See Also:
getTreatedAsXml()

produce

protected Token<HtmlTokenType> produce()
Make sure that there is a token ready to yield in this.token.

Specified by:
produce in class AbstractTokenStream<HtmlTokenType>

parseToken

private Token<HtmlTokenType> parseToken()
Breaks the character stream into tokens. This method returns a stream of tokens such that each token starts where the last token ended.

This property is useful as it allows produce() to collapse and reclassify ranges of tokens based on state that is easy to maintain there.

Later passes are responsible for throwing away useless tokens.
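
The value of the "each token starts where the last token ended" property can be shown with a small sketch. Everything here is an assumption made for illustration: ContiguousTokens, Tok, collapse, and the UNESCAPED type are hypothetical and do not describe the signatures or behaviour of the Caja classes. The point is only that when tokens are contiguous, a run of them can be merged into one reclassified token by taking the first start and the last end.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of collapsing contiguous tokens; not Caja's code. */
final class ContiguousTokens {
  enum Type { TEXT, TAGBEGIN, TAGEND, UNESCAPED }

  /** A typed range [start, end) over some input. */
  static final class Tok {
    final Type type;
    final int start;
    final int end;

    Tok(Type type, int start, int end) {
      this.type = type;
      this.start = start;
      this.end = end;
    }
  }

  /** Returns a token covering the same range with a different type. */
  static Tok reclassify(Tok token, Type type) {
    return new Tok(type, token.start, token.end);
  }

  /**
   * Merges toks[from, to) into a single token of the given type.
   * Requires from < to, and rejects runs that have a gap or overlap.
   */
  static Tok collapse(List<Tok> toks, int from, int to, Type type) {
    for (int i = from + 1; i < to; i++) {
      if (toks.get(i).start != toks.get(i - 1).end) {
        throw new IllegalArgumentException("tokens are not contiguous at " + i);
      }
    }
    return new Tok(type, toks.get(from).start, toks.get(to - 1).end);
  }

  public static void main(String[] args) {
    List<Tok> toks = Arrays.asList(
        new Tok(Type.TEXT, 0, 2),
        new Tok(Type.TAGBEGIN, 2, 5),
        new Tok(Type.TAGEND, 5, 6));
    Tok merged = collapse(toks, 0, toks.size(), Type.UNESCAPED);
    System.out.println(merged.type + " [" + merged.start + "," + merged.end + ")");
  }
}
```

Without the contiguity guarantee, the merged range could silently swallow characters that belonged to no token, so the check in collapse would be needed on every call rather than being a sanity assertion.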


name

protected java.lang.String name(java.lang.String tagName)

name

private java.lang.String name(int start,
                              int end)

isIdentStart

private boolean isIdentStart(char ch)

reclassify

static <T extends TokenType> Token<T> reclassify(Token<T> token,
                                                 T type)


Copyright (C) 2008 Google Inc.
Licensed under the Apache License, Version 2.0