com.quiotix.html.parser
Class HtmlScrubber

java.lang.Object
  extended by com.quiotix.html.parser.HtmlVisitor
      extended by com.quiotix.html.parser.HtmlScrubber

public class HtmlScrubber
extends HtmlVisitor

HtmlScrubber is a Visitor that walks an HtmlDocument and cleans it up. Depending on the options set, it can change tags and tag attributes to uppercase or lowercase, strip unnecessary quotes from attribute values, and strip trailing spaces before a newline.
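
To make the visitor idea concrete, here is a minimal, self-contained sketch of how a scrubbing visitor might lowercase tag names and trim trailing spaces before a newline. Every type in it (ScrubVisitorSketch, its nested Visitor, Element, Tag, Text, Newline and Scrubber) is a hypothetical stand-in written for this illustration. None of it is the com.quiotix.html.parser implementation, which may work differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-ins for illustration only; these are NOT the
// com.quiotix.html.parser classes.
final class ScrubVisitorSketch {

    interface Visitor {
        void visit(Tag t);
        void visit(Text t);
        void visit(Newline n);
    }

    interface Element {
        void accept(Visitor v);
    }

    static final class Tag implements Element {
        String name;
        Tag(String name) { this.name = name; }
        public void accept(Visitor v) { v.visit(this); }
        @Override public String toString() { return "<" + name + ">"; }
    }

    static final class Text implements Element {
        String text;
        Text(String text) { this.text = text; }
        public void accept(Visitor v) { v.visit(this); }
        @Override public String toString() { return text; }
    }

    static final class Newline implements Element {
        public void accept(Visitor v) { v.visit(this); }
        @Override public String toString() { return "\n"; }
    }

    /** Lowercases tag names and trims trailing spaces from text directly before a newline. */
    static final class Scrubber implements Visitor {
        private Element previous; // last element visited, or null at the start

        public void visit(Tag t) {
            t.name = t.name.toLowerCase(Locale.ROOT);
            previous = t;
        }

        public void visit(Text t) {
            previous = t;
        }

        public void visit(Newline n) {
            // Only text immediately followed by a newline is trimmed.
            if (previous instanceof Text) {
                Text t = (Text) previous;
                t.text = t.text.replaceAll(" +$", "");
            }
            previous = n;
        }
    }

    /** Runs the scrubber over every element in order, then renders the result. */
    static String scrub(List<Element> doc) {
        Scrubber scrubber = new Scrubber();
        for (Element e : doc) {
            e.accept(scrubber);
        }
        StringBuilder out = new StringBuilder();
        for (Element e : doc) {
            out.append(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Element> doc = new ArrayList<>();
        doc.add(new Tag("P"));
        doc.add(new Text("Hello   "));
        doc.add(new Newline());
        System.out.print(scrub(doc)); // prints "<p>Hello" followed by a newline
    }
}
```

The sketch remembers the previously visited element so that the newline visit can reach back and trim the text before it. This is only one plausible way to handle trailing spaces.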

Author:
Brian Goetz, Quiotix
Additional contributions by: Thorsten Weber

Field Summary
static int ATTR_DOWNCASE
          Set attribute case to lower.
static int ATTR_UPCASE
          Set attribute case to upper.
static int DEFAULT_OPTIONS
          Defaults: downcase tags and attributes, quote attributes.
protected  int flags
           
protected  boolean inPreBlock
           
protected  HtmlDocument.HtmlElement previousElement
           
static int QUOTE_ATTRS
          Quote attributes.
static int STRIP_QUOTES
          Remove quotes.
static int TAGS_DOWNCASE
          Set tag case to lower.
static int TAGS_UPCASE
          Set tag case to upper.
static int TRIM_SPACES
          Trim spaces.
 
Constructor Summary
HtmlScrubber()
          Create an HtmlScrubber with the default options (downcase tags and tag attributes, strip out unnecessary quotes).
HtmlScrubber(int flags)
          Create an HtmlScrubber with the desired set of options.
 
Method Summary
 void start()
          Start.
 void visit(HtmlDocument.Annotation a)
          Visit an Annotation.
 void visit(HtmlDocument.Comment c)
          Visit a Comment.
 void visit(HtmlDocument.EndTag t)
          Visit an EndTag.
 void visit(HtmlDocument.Newline n)
          Visit a Newline.
 void visit(HtmlDocument.Tag t)
          Visit a Tag.
 void visit(HtmlDocument.TagBlock bl)
          Visit a TagBlock.
 void visit(HtmlDocument.Text t)
          Visit Text.
 
Methods inherited from class com.quiotix.html.parser.HtmlVisitor
finish, visit, visit
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

TAGS_UPCASE

public static final int TAGS_UPCASE
Set tag case to upper.

See Also:
Constant Field Values

TAGS_DOWNCASE

public static final int TAGS_DOWNCASE
Set tag case to lower.

See Also:
Constant Field Values

ATTR_UPCASE

public static final int ATTR_UPCASE
Set attribute case to upper.

See Also:
Constant Field Values

ATTR_DOWNCASE

public static final int ATTR_DOWNCASE
Set attribute case to lower.

See Also:
Constant Field Values

STRIP_QUOTES

public static final int STRIP_QUOTES
Remove quotes.

See Also:
Constant Field Values

TRIM_SPACES

public static final int TRIM_SPACES
Trim spaces.

See Also:
Constant Field Values

QUOTE_ATTRS

public static final int QUOTE_ATTRS
Quote attributes.

See Also:
Constant Field Values

DEFAULT_OPTIONS

public static final int DEFAULT_OPTIONS
Defaults: downcase tags and attributes, quote attributes.

See Also:
Constant Field Values

flags

protected int flags

previousElement

protected HtmlDocument.HtmlElement previousElement

inPreBlock

protected boolean inPreBlock

Constructor Detail

HtmlScrubber

public HtmlScrubber()
Create an HtmlScrubber with the default options (downcase tags and tag attributes, strip out unnecessary quotes).


HtmlScrubber

public HtmlScrubber(int flags)
Create an HtmlScrubber with the desired set of options.

Parameters:
flags - A bitmask representing the desired scrubbing options
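
Because flags is a bitmask, options are combined with bitwise OR and tested with bitwise AND. The sketch below illustrates that mechanism in a self-contained way. ScrubFlagsSketch and its helper methods are hypothetical, and the numeric bit values are assumptions chosen for the example; the library's actual values are listed under Constant Field Values.

```java
import java.util.Locale;

// Hypothetical stand-in for the option constants. The bit values below are
// assumptions for illustration, not the library's actual constant values.
final class ScrubFlagsSketch {
    static final int TAGS_UPCASE   = 1;
    static final int TAGS_DOWNCASE = 1 << 1;
    static final int ATTR_UPCASE   = 1 << 2;
    static final int ATTR_DOWNCASE = 1 << 3;
    static final int STRIP_QUOTES  = 1 << 4;
    static final int TRIM_SPACES   = 1 << 5;
    static final int QUOTE_ATTRS   = 1 << 6;

    /** Returns true when every bit of option is set in flags. */
    static boolean isSet(int flags, int option) {
        return (flags & option) == option;
    }

    /**
     * Applies the tag-case options to a tag name. If both case options are
     * set, upcase wins; that tie-break is an arbitrary choice in this sketch.
     */
    static String applyTagCase(int flags, String tagName) {
        if (isSet(flags, TAGS_UPCASE)) {
            return tagName.toUpperCase(Locale.ROOT);
        }
        if (isSet(flags, TAGS_DOWNCASE)) {
            return tagName.toLowerCase(Locale.ROOT);
        }
        return tagName;
    }

    public static void main(String[] args) {
        // Combine independent options with bitwise OR.
        int flags = TAGS_UPCASE | TRIM_SPACES;
        System.out.println(applyTagCase(flags, "Body")); // prints BODY
        System.out.println(isSet(flags, TRIM_SPACES));   // prints true
        System.out.println(isSet(flags, STRIP_QUOTES));  // prints false
    }
}
```

With the class documented here, the same combination would be passed to the constructor as new HtmlScrubber(HtmlScrubber.TAGS_UPCASE | HtmlScrubber.TRIM_SPACES).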

Method Detail

start

public void start()
Description copied from class: HtmlVisitor
Start.

Overrides:
start in class HtmlVisitor

visit

public void visit(HtmlDocument.Tag t)
Description copied from class: HtmlVisitor
Visit a Tag.

Overrides:
visit in class HtmlVisitor

visit

public void visit(HtmlDocument.EndTag t)
Description copied from class: HtmlVisitor
Visit an EndTag.

Overrides:
visit in class HtmlVisitor

visit

public void visit(HtmlDocument.Text t)
Description copied from class: HtmlVisitor
Visit Text.

Overrides:
visit in class HtmlVisitor

visit

public void visit(HtmlDocument.Comment c)
Description copied from class: HtmlVisitor
Visit a Comment.

Overrides:
visit in class HtmlVisitor

visit

public void visit(HtmlDocument.Newline n)
Description copied from class: HtmlVisitor
Visit a Newline.

Overrides:
visit in class HtmlVisitor

visit

public void visit(HtmlDocument.Annotation a)
Description copied from class: HtmlVisitor
Visit an Annotation.

Overrides:
visit in class HtmlVisitor

visit

public void visit(HtmlDocument.TagBlock bl)
Description copied from class: HtmlVisitor
Visit a TagBlock.

Overrides:
visit in class HtmlVisitor


Copyright © 1999-2011 Quiotix. All Rights Reserved.