Lightweight HTML Scanner 2.00

be.arci.html
Class HTMLTag

java.lang.Object
  |
  +--be.arci.html.HTMLTag

public final class HTMLTag
extends java.lang.Object

Lightweight immutable class that encapsulates an HTML tag or HTML content text recognized by HTMLScanner.getTags(). An HTMLTag is essentially a substring of the scanned document, with an ID field that identifies the type of HTMLTag.

See Also:
HTMLScanner.getTags(String[] asTagNames, boolean swDiscardOtherTags)

Field Summary
 int iBeginIndex
          the substring() beginIndex of the HTMLTag (including '<' character).
 int iEndIndex
          the substring() endIndex of the HTMLTag (including '>' character).
 int iID
          ID or type of the HTMLTag.
 boolean swCombineWhitespace
          If true, multiple whitespace characters are combined into a single space (' ').
 boolean swParseEscapes
          If true, HTML character escapes (named character entity references and numerical character references of the form "&#999;") are interpreted.
 
Method Summary
 java.lang.StringBuffer accumulateContent(java.lang.StringBuffer sb)
          Accumulates this HTMLTag's HTML document text content into the StringBuffer argument.
 java.lang.StringBuffer accumulateContent(java.lang.StringBuffer sb, boolean swParseEscapes, boolean swCombineWhitespace)
          Accumulates this HTMLTag's HTML document text content into the StringBuffer argument.
 java.lang.String getAttribute(java.lang.String sAttribute)
          Returns the value of the named attribute in this HTMLTag.
 java.lang.String toString()
          Returns the substring of the HTML document that defines this HTMLTag, including any contained, uncombined whitespace, uninterpreted escape sequences, and the < and > delimiters.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

iID

public final int iID
ID or type of the HTMLTag.

This ID is

The ID value of a closing tag (</tagname>) is the negative value of the ID of the corresponding opening tag (<tagname>).

See Also:
HTMLScanner.getTags(java.lang.String[], boolean)
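
The sign convention can be illustrated with a small stand-alone sketch. The class TagIdSketch and its helpers are hypothetical and not part of be.arci.html. The sketch relies only on two statements from this page: text content has iID == 0, and a closing tag carries the negated ID of its opening tag. That opening tags always carry positive IDs is an assumption.

```java
// Hypothetical helper, not part of be.arci.html: classifies an iID value
// using the sign convention described above.
final class TagIdSketch {

    // Returns "text" for 0, "open" for positive IDs (assumed), "close" for negative IDs.
    static String describe(int iID) {
        if (iID == 0) {
            return "text";
        }
        return iID > 0 ? "open" : "close";
    }

    // True if closeID is the closing counterpart of the opening tag openID.
    static boolean closes(int openID, int closeID) {
        return openID > 0 && closeID == -openID;
    }

    public static void main(String[] args) {
        System.out.println(describe(3));    // prints "open"
        System.out.println(describe(-3));   // prints "close"
        System.out.println(closes(3, -3));  // prints "true"
    }
}
```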

iBeginIndex

public final int iBeginIndex
the substring() beginIndex of the HTMLTag (including '<' character).

See Also:
"java.lang.String.substring(beginIndex, endIndex)"

iEndIndex

public final int iEndIndex
the substring() endIndex of the HTMLTag (including '>' character).

See Also:
"java.lang.String.substring(beginIndex, endIndex)"

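As a rough illustration of how the two index fields relate to the scanned document, the following sketch uses only java.lang.String. The class TagSubstringSketch is hypothetical, and the index values merely play the role of iBeginIndex and iEndIndex. It assumes that iEndIndex follows the exclusive endIndex convention of substring(), that is, it points one position past the '>' character.

```java
// Stand-alone illustration, not part of be.arci.html.
final class TagSubstringSketch {

    // Extracts the tag text exactly as String.substring(beginIndex, endIndex) does.
    static String tagText(String document, int beginIndex, int endIndex) {
        return document.substring(beginIndex, endIndex);
    }

    public static void main(String[] args) {
        String document = "<P>Hello</P>";
        int beginIndex = document.indexOf('<');     // index of '<'
        int endIndex = document.indexOf('>') + 1;   // one past '>' (assumed convention)
        System.out.println(tagText(document, beginIndex, endIndex)); // prints "<P>"
    }
}
```
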
swParseEscapes

public final boolean swParseEscapes
If true, HTML character escapes (named character entity references and numerical character references of the form "&#999;") are interpreted.

See Also:
accumulateContent(StringBuffer)

swCombineWhitespace

public final boolean swCombineWhitespace
If true, multiple whitespace characters are combined into a single space (' ').

See Also:
accumulateContent(StringBuffer)

Method Detail

toString

public java.lang.String toString()
Returns the substring of the HTML document that defines this HTMLTag, including any contained, uncombined whitespace, uninterpreted escape sequences, and the < and > delimiters.
Overrides:
toString in class java.lang.Object

accumulateContent

public java.lang.StringBuffer accumulateContent(java.lang.StringBuffer sb)
Accumulates this HTMLTag's HTML document text content into the StringBuffer argument. This method is a no-op if iID != 0.

Depending on the context of the containing HTML document, character entity references and numerical character references may or may not be interpreted, and multiple whitespace characters may or may not be combined into a single space (' ').

This method does not change the state of this HTMLTag, so it can be called multiple times.

Parameters:
sb - the StringBuffer to accumulate text content into. If null, a new StringBuffer will be allocated by this method.
See Also:
accumulateContent(StringBuffer sb, boolean swParseEscapes, boolean swCombineWhitespace), swParseEscapes, swCombineWhitespace, iID
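
The calling pattern described above (pass the same buffer to every piece, let tags act as no-ops) might look roughly like the sketch below. AccumulateSketch is a hypothetical stand-in for HTMLTag, reduced to an ID and its raw text; it is not the library's implementation. Two details are assumptions: that the method returns the buffer it appended to, and that a null buffer is allocated even when the piece is a tag.

```java
// Hypothetical stand-in for HTMLTag, reduced to an ID and its raw text.
final class AccumulateSketch {
    final int iID;       // 0 marks text content, as stated on this page
    final String text;   // raw document text of this piece

    AccumulateSketch(int iID, String text) {
        this.iID = iID;
        this.text = text;
    }

    // Appends text content to sb. Allocates a buffer when sb is null (assumed
    // to happen for tags too) and appends nothing when iID != 0.
    StringBuffer accumulateContent(StringBuffer sb) {
        if (sb == null) {
            sb = new StringBuffer();
        }
        if (iID == 0) {
            sb.append(text);
        }
        return sb;
    }

    // Collects the text content of all pieces into one buffer.
    static String collect(AccumulateSketch[] pieces) {
        StringBuffer sb = null;
        for (AccumulateSketch piece : pieces) {
            sb = piece.accumulateContent(sb);
        }
        return sb == null ? "" : sb.toString();
    }

    public static void main(String[] args) {
        AccumulateSketch[] pieces = {
            new AccumulateSketch(1, "<B>"),
            new AccumulateSketch(0, "Hello"),
            new AccumulateSketch(-1, "</B>"),
            new AccumulateSketch(0, " world"),
        };
        System.out.println(collect(pieces)); // prints "Hello world"
    }
}
```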

accumulateContent

public java.lang.StringBuffer accumulateContent(java.lang.StringBuffer sb,
                                                boolean swParseEscapes,
                                                boolean swCombineWhitespace)
Accumulates this HTMLTag's HTML document text content into the StringBuffer argument. This method is a no-op if iID != 0.

This method does not change the state of this HTMLTag, so it can be called multiple times.

Parameters:
sb - the StringBuffer to accumulate text content into. If null, a new StringBuffer will be allocated by this method.
swParseEscapes - if true, HTML character entity references and numerical character references ("&...;" escapes) are interpreted. Overrides the swParseEscapes setting of this HTMLTag.
swCombineWhitespace - if true, multiple whitespace characters are combined into a single space (' '). Overrides the swCombineWhitespace setting of this HTMLTag.
See Also:
accumulateContent(StringBuffer sb), iID
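
To make the effect of the two flags concrete, here is a minimal sketch of what escape interpretation and whitespace combining could look like. ContentSketch and its methods are hypothetical and do not reproduce the library's code. The sketch recognizes only four named entities (amp, lt, gt, quot) plus decimal numeric references in the standard "&#NNN;" form, combines whitespace before decoding escapes (an arbitrary choice), and does not merge whitespace across separate calls.

```java
// Hypothetical illustration, not part of be.arci.html.
final class ContentSketch {

    // Appends text to sb, optionally collapsing whitespace runs and
    // interpreting a few escapes; allocates sb when it is null.
    static StringBuffer accumulate(StringBuffer sb, String text,
                                   boolean parseEscapes, boolean combineWhitespace) {
        if (sb == null) {
            sb = new StringBuffer();
        }
        String s = text;
        if (combineWhitespace) {
            s = s.replaceAll("\\s+", " ");
        }
        if (parseEscapes) {
            s = decode(s);
        }
        return sb.append(s);
    }

    // Replaces each recognized "&name;" or "&#NNN;" sequence; leaves the rest as-is.
    static String decode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int semi = (c == '&') ? s.indexOf(';', i) : -1;
            String replacement = (semi > i) ? lookup(s.substring(i + 1, semi)) : null;
            if (replacement != null) {
                out.append(replacement);
                i = semi + 1;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    // Resolves the text between '&' and ';', or returns null if it is unknown.
    static String lookup(String name) {
        switch (name) {
            case "amp":  return "&";
            case "lt":   return "<";
            case "gt":   return ">";
            case "quot": return "\"";
            default:     break;
        }
        if (name.length() > 1 && name.charAt(0) == '#') {
            try {
                int code = Integer.parseInt(name.substring(1));
                if (Character.isValidCodePoint(code)) {
                    return new String(Character.toChars(code));
                }
            } catch (NumberFormatException e) {
                return null;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String raw = "Tom  &amp;\n Jerry &#65;";
        System.out.println(accumulate(null, raw, true, true)); // prints "Tom & Jerry A"
    }
}
```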

getAttribute

public java.lang.String getAttribute(java.lang.String sAttribute)
Returns the value of the named attribute in this HTMLTag. If this HTMLTag is text content (iID == 0), or if the named attribute is not present in the tag, null is returned. If the attribute is present, but no value specification follows it, the empty String "" is returned.

If the value is meant to represent a color, it can be passed directly to HTMLColors.getColor() without first testing for a valid return value.

Example
If the tag represents the syntax element <IMG SRC=donaldknut.jpg>, the call getAttribute("src") returns the String value "donaldknut.jpg".

This method does not change the state of this HTMLTag, so it can be called multiple times.

Parameters:
sAttribute - case-insensitive attribute name
Returns:
the value for the named attribute, or null if not present.
See Also:
HTMLColors.getColor(String sColor)
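
The three possible results (a value, the empty String, or null) can be mimicked with a small regular-expression based lookup. AttributeSketch is a hypothetical stand-in that works on the raw tag text; it is not how the library is implemented, and its notion of an attribute (a name, optionally followed by = and a double-quoted, single-quoted, or bare value) is a simplifying assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, not part of be.arci.html.
final class AttributeSketch {

    // Whitespace, an attribute name, then optionally '=' and a value that is
    // double-quoted (group 2), single-quoted (group 3), or bare (group 4).
    private static final Pattern ATTR = Pattern.compile(
        "\\s([A-Za-z][A-Za-z0-9_:.-]*)(?:\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]*)))?");

    // Returns the attribute value, "" if the attribute has no value,
    // or null if the attribute is absent or the text is not a tag.
    static String getAttribute(String tag, String name) {
        if (!tag.startsWith("<")) {
            return null; // text content carries no attributes
        }
        Matcher m = ATTR.matcher(tag);
        while (m.find()) {
            if (m.group(1).equalsIgnoreCase(name)) {
                for (int g = 2; g <= 4; g++) {
                    if (m.group(g) != null) {
                        return m.group(g);
                    }
                }
                return "";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(getAttribute("<IMG SRC=donaldknut.jpg>", "src")); // prints "donaldknut.jpg"
    }
}
```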
