java.lang.Object com.quiotix.html.parser.HtmlParser
public class HtmlParser
This grammar parses an HTML document and produces a (flat) parse "tree" representing the document. It preserves almost all information in the source document, including carriage control and spacing (except inside tags). See the HtmlDocument and HtmlDocument.* classes for a description of the parse tree.

The parse tree supports traversal using the "Visitor" pattern. The HtmlDumper class is a visitor that dumps the tree to an output stream.

The parser does not require begin tags to be matched with end tags, nor does it validate the names or contents of tags. Both checks are easy to do after parsing; see the HtmlCollector class, which matches begin tags with end tags, for an example.

Notable edge cases include:
- Quoted string processing. Quoted strings are matched inside comments and as tag attribute values. In normal text, quoted strings are matched only if they do not span line breaks.

Please direct comments, questions, gripes or praise to html-parser@quiotix.com. If you like it/hate it/use it, please let us know!
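The flat parse "tree" and the Visitor traversal described above can be sketched in a few self-contained lines. Everything in this sketch (`FlatTreeSketch`, `Tag`, `EndTag`, `Text`, `Dumper`) is a hypothetical stand-in invented for illustration; these are not the real HtmlDocument or HtmlDumper classes, whose members are not listed on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins, NOT the real com.quiotix.html.parser types.
final class FlatTreeSketch {

    /** Visitor with one callback per element kind. */
    interface Visitor {
        void visitTag(Tag t);
        void visitEndTag(EndTag t);
        void visitText(Text t);
    }

    /** Every element of the flat sequence accepts a visitor. */
    interface Element {
        void accept(Visitor v);
    }

    static final class Tag implements Element {
        final String name;
        Tag(String name) { this.name = name; }
        public void accept(Visitor v) { v.visitTag(this); }
    }

    static final class EndTag implements Element {
        final String name;
        EndTag(String name) { this.name = name; }
        public void accept(Visitor v) { v.visitEndTag(this); }
    }

    static final class Text implements Element {
        final String text;
        Text(String text) { this.text = text; }
        public void accept(Visitor v) { v.visitText(this); }
    }

    /** A dumper-style visitor that writes each element back out as markup. */
    static final class Dumper implements Visitor {
        private final StringBuilder out = new StringBuilder();
        public void visitTag(Tag t) { out.append('<').append(t.name).append('>'); }
        public void visitEndTag(EndTag t) { out.append("</").append(t.name).append('>'); }
        public void visitText(Text t) { out.append(t.text); }
        String result() { return out.toString(); }
    }

    /** Walks the flat sequence in document order and returns the dumped text. */
    static String dump(List<Element> elements) {
        Dumper d = new Dumper();
        for (Element e : elements) {
            e.accept(d);
        }
        return d.result();
    }

    /** Flat sequence for "<p>Hello <b>world</p>"; the <b> tag is never closed. */
    static List<Element> sample() {
        List<Element> seq = new ArrayList<>();
        seq.add(new Tag("p"));
        seq.add(new Text("Hello "));
        seq.add(new Tag("b"));
        seq.add(new Text("world"));
        seq.add(new EndTag("p"));
        return seq;
    }

    public static void main(String[] args) {
        System.out.println(dump(sample())); // prints <p>Hello <b>world</p>
    }
}
```

Because the sequence is flat, the unclosed `<b>` causes no trouble: begin and end tags are simply separate entries, and a visitor sees them in document order.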
Field Summary | |
---|---|
Token | jj_nt: Next token. |
boolean | lookingAhead: Whether we are looking ahead. |
Token | token: Current token. |
HtmlParserTokenManager | token_source: Generated Token Manager. |
Fields inherited from interface com.quiotix.html.parser.HtmlParserConstants |
---|
ALPHA_CHAR, ALPHANUM_CHAR, ATTR_EQ, ATTR_NAME, ATTR_VAL, BLOCK_EOL, BLOCK_LBR, BLOCK_WORD, COMMENT_END, COMMENT_EOL, COMMENT_START, COMMENT_WORD, DASH, DECL_ANY, DECL_END, DECL_START, DEFAULT, ENDTAG_START, EOF, EOL, IDENTIFIER, IDENTIFIER_CHAR, IMPLICIT_TAG_END, LAV_ERROR, LexAttrVal, LexComment, LexDecl, LexInTag, LexScript, LexStartTag, LexStyle, LIT_ERROR, LST_ERROR, NEWLINE, NUM_CHAR, PCDATA, QUOTE, QUOTED_STRING, QUOTED_STRING_NB, SCRIPT_END, STYLE_END, TAG_END, TAG_NAME, TAG_SCRIPT, TAG_SLASHEND, TAG_START, TAG_STYLE, tokenImage, WHITESPACE |
Constructor Summary |
---|
HtmlParser(HtmlParserTokenManager tm): Constructor with generated Token Manager. |
HtmlParser(InputStream stream): Constructor with InputStream. |
HtmlParser(InputStream stream, String encoding): Constructor with InputStream and supplied encoding. |
HtmlParser(Reader stream): Constructor with Reader. |
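The constructors above accept either bytes (an InputStream, optionally with an encoding name) or characters (a Reader). As a minimal sketch of what supplying an encoding means, the JDK-only code below decodes a byte stream with a named charset by wrapping it in an `InputStreamReader`. Whether `HtmlParser(InputStream, String)` does exactly this internally is an assumption; `EncodingSketch` and its methods are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; uses only JDK classes, not the parser itself.
final class EncodingSketch {

    /** Wraps a byte stream in a Reader that decodes with the named charset. */
    static Reader open(InputStream stream, String encoding) throws IOException {
        // Throws UnsupportedEncodingException (an IOException) for unknown names.
        return new InputStreamReader(stream, encoding);
    }

    /** Reads all characters from the Reader. */
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    /** Decodes the bytes with the named charset, rethrowing I/O errors unchecked. */
    static String decode(byte[] bytes, String encoding) {
        try (Reader r = open(new ByteArrayInputStream(bytes), encoding)) {
            return readAll(r);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "<p>caf\u00e9</p>".getBytes(StandardCharsets.UTF_8);
        // Correct charset: the accented character survives as one char.
        System.out.println(decode(utf8, "UTF-8"));
        // Wrong charset: the two UTF-8 bytes are decoded as two separate chars.
        System.out.println(decode(utf8, "ISO-8859-1").length());
    }
}
```

A Reader built this way could be handed to the `HtmlParser(Reader stream)` constructor, which keeps the charset choice explicit in the calling code.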
Method Summary | |
---|---|
HtmlDocument.Attribute | Attribute() |
HtmlDocument.AttributeList | AttributeList() |
HtmlDocument.ElementSequence | BlockContents() |
HtmlDocument.Comment | CommentTag() |
HtmlDocument.Comment | DeclTag() |
void | disable_tracing(): Disable tracing. |
HtmlDocument.HtmlElement | Element() |
HtmlDocument.ElementSequence | ElementSequence() |
void | enable_tracing(): Enable tracing. |
HtmlDocument.HtmlElement | EndTag() |
ParseException | generateParseException(): Generate ParseException. |
Token | getNextToken(): Get the next Token. |
Token | getToken(int index): Get the specific Token. |
HtmlDocument | HtmlDocument(): Parses the input and returns the resulting HtmlDocument. |
static void | main(String[] args): Command-line entry point. |
void | ReInit(HtmlParserTokenManager tm): Reinitialise. |
void | ReInit(InputStream stream): Reinitialise. |
void | ReInit(InputStream stream, String encoding): Reinitialise. |
void | ReInit(Reader stream): Reinitialise. |
HtmlDocument.HtmlElement | ScriptBlock() |
HtmlDocument.HtmlElement | StyleBlock() |
HtmlDocument.HtmlElement | Tag() |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public HtmlParserTokenManager token_source
public Token token
public Token jj_nt
public boolean lookingAhead
Constructor Detail |
---|
public HtmlParser(InputStream stream)
public HtmlParser(InputStream stream, String encoding)
public HtmlParser(Reader stream)
public HtmlParser(HtmlParserTokenManager tm)
Method Detail |
---|
public static void main(String[] args) throws ParseException
Throws: ParseException
public final HtmlDocument HtmlDocument() throws ParseException
Throws: ParseException
public final HtmlDocument.ElementSequence ElementSequence() throws ParseException
Throws: ParseException
public final HtmlDocument.HtmlElement Element() throws ParseException
Throws: ParseException
public final HtmlDocument.Attribute Attribute() throws ParseException
Throws: ParseException
public final HtmlDocument.AttributeList AttributeList() throws ParseException
Throws: ParseException
public final HtmlDocument.HtmlElement Tag() throws ParseException
Throws: ParseException
public final HtmlDocument.ElementSequence BlockContents() throws ParseException
Throws: ParseException
public final HtmlDocument.HtmlElement ScriptBlock() throws ParseException
Throws: ParseException
public final HtmlDocument.HtmlElement StyleBlock() throws ParseException
Throws: ParseException
public final HtmlDocument.HtmlElement EndTag() throws ParseException
Throws: ParseException
public final HtmlDocument.Comment CommentTag() throws ParseException
Throws: ParseException
public final HtmlDocument.Comment DeclTag() throws ParseException
Throws: ParseException
public void ReInit(InputStream stream)
public void ReInit(InputStream stream, String encoding)
public void ReInit(Reader stream)
public void ReInit(HtmlParserTokenManager tm)
public final Token getNextToken()
public final Token getToken(int index)
public ParseException generateParseException()
public final void enable_tracing()
public final void disable_tracing()
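Tag() and EndTag() are separate productions, and the class description notes that the parser itself does not pair begin tags with end tags. One plausible way to do that pairing after parsing is a stack pass over the flat sequence, sketched below. This is not the HtmlCollector algorithm, which is not documented on this page; `TagMatchSketch`, `Item`, and `match` are hypothetical names, and the case-insensitive name comparison is an assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical post-parse pass; NOT the real HtmlCollector.
final class TagMatchSketch {

    /** One entry of the flat sequence: an open tag, an end tag, or text. */
    static final class Item {
        enum Kind { OPEN, CLOSE, TEXT }
        final Kind kind;
        final String value;
        Item(Kind kind, String value) { this.kind = kind; this.value = value; }
    }

    /** True if some pending open tag has the given name (assumed case-insensitive). */
    private static boolean hasOpen(Deque<Integer> open, List<Item> seq, String name) {
        for (int idx : open) {
            if (seq.get(idx).value.equalsIgnoreCase(name)) {
                return true;
            }
        }
        return false;
    }

    /** Returns {openIndex, closeIndex} pairs for each end tag that has a matching open tag. */
    static List<int[]> match(List<Item> seq) {
        List<int[]> pairs = new ArrayList<>();
        Deque<Integer> open = new ArrayDeque<>(); // indexes of pending open tags, innermost first
        for (int i = 0; i < seq.size(); i++) {
            Item it = seq.get(i);
            if (it.kind == Item.Kind.OPEN) {
                open.push(i);
            } else if (it.kind == Item.Kind.CLOSE && hasOpen(open, seq, it.value)) {
                // Pop until the matching open tag; anything popped on the way
                // was never closed and is left unpaired.
                while (!open.isEmpty()) {
                    int idx = open.pop();
                    if (seq.get(idx).value.equalsIgnoreCase(it.value)) {
                        pairs.add(new int[] { idx, i });
                        break;
                    }
                }
            }
            // Text and stray end tags are skipped.
        }
        return pairs;
    }

    /** Flat sequence for "<p>Hello <b>world</p></i>": unclosed <b>, stray </i>. */
    static List<Item> sample() {
        List<Item> seq = new ArrayList<>();
        seq.add(new Item(Item.Kind.OPEN, "p"));
        seq.add(new Item(Item.Kind.TEXT, "Hello "));
        seq.add(new Item(Item.Kind.OPEN, "b"));
        seq.add(new Item(Item.Kind.TEXT, "world"));
        seq.add(new Item(Item.Kind.CLOSE, "p"));
        seq.add(new Item(Item.Kind.CLOSE, "i"));
        return seq;
    }

    public static void main(String[] args) {
        for (int[] pair : match(sample())) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```

In the sample, only the `<p>` at index 0 and the `</p>` at index 4 are paired; the unclosed `<b>` and the stray `</i>` are tolerated rather than reported, which mirrors the parser's own leniency.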