Lightweight HTML Scanner: Common Mistakes

  • Tags that match the first entry in my list of tag names are never detected.
  • The first entry (index 0) in the list of tag names is reserved for non-tag text content, so a tag name stored there is never matched as a tag. To fix this, insert a 'dummy' element at index 0 of your list so that the desired tag name moves to index 1, and remember to increment every other tag ID value by one as well.

    Note that if this dummy element [0] is null, text content is skipped; if it has any other value, text content is returned as an HTMLTag object with iID == 0.
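The index-0 rule can be illustrated with a small stand-in scanner. This is a minimal sketch, not the scanner's actual API: the document does not say which language the scanner is written in, so Python is used here, `None` plays the role of null, and `scan()`, `TAG_RE` and the `closing` field are hypothetical names. Only the convention itself (index 0 reserved for text; null skips text, any other value returns it with iID == 0) comes from the note above.

```python
import re
from dataclasses import dataclass

@dataclass
class HTMLTag:
    iID: int              # index into the tag-name list; 0 means text content
    text: str             # tag name (lower case) or the text content itself
    closing: bool = False # True for </tag>

# Hypothetical, deliberately simple tag pattern: optional "/", then a name.
TAG_RE = re.compile(r"<\s*(/?)\s*([A-Za-z][A-Za-z0-9]*)[^>]*>")

def scan(html, tag_names):
    """Hypothetical scanner that follows the index-0 convention."""
    # Index 0 is reserved for text, so it is never used to match a tag name.
    ids = {name.lower(): i for i, name in enumerate(tag_names) if i > 0}
    keep_text = tag_names[0] is not None  # null at [0] means "skip text"
    result, pos = [], 0
    for m in TAG_RE.finditer(html):
        text = html[pos:m.start()]
        if text and keep_text:
            result.append(HTMLTag(0, text))
        name = m.group(2).lower()
        if name in ids:  # tags not in the list are ignored
            result.append(HTMLTag(ids[name], name, m.group(1) == "/"))
        pos = m.end()
    tail = html[pos:]
    if tail and keep_text:
        result.append(HTMLTag(0, tail))
    return result

html = "<p>Hi <b>there</b></p>"
wrong = scan(html, ["p", "b"])             # "p" at index 0: never detected
skip_text = scan(html, [None, "p", "b"])   # dummy null: p = 1, b = 2, text skipped
keep_text = scan(html, ["#text", "p", "b"])  # dummy value: text returned with iID 0
```

With the dummy in place every real tag ID shifts up by one, which is why any constants you use for tag IDs must be incremented too.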

  • Tags in a <script>...</script> block are not parsed.
  • This is intended behaviour. The content of a <script>...</script> block is not HTML syntax (typically it is JavaScript), so the scanner does not look for tags inside it. To parse it, pass it to a parser for the script's language.
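One way to route script content to a separate parser is to pull each script body out as raw text before (or alongside) HTML scanning. This is a minimal sketch under stated assumptions: the scanner's own API for returning script content is not described here, so a regular expression stands in for it, and `split_scripts()` is a hypothetical helper. As in HTML itself, a script body ends at the first `</script`, even if that appears inside a JavaScript string.

```python
import re

# Hypothetical helper: opening tag, raw body (non-greedy), closing tag.
SCRIPT_RE = re.compile(r"(<script\b[^>]*>)(.*?)(</script\s*>)",
                       re.IGNORECASE | re.DOTALL)

def split_scripts(html):
    """Return (html with script bodies removed, list of raw script bodies)."""
    scripts = []

    def strip_body(m):
        scripts.append(m.group(2))       # keep the body as opaque text
        return m.group(1) + m.group(3)   # keep the <script ...></script> tags

    return SCRIPT_RE.sub(strip_body, html), scripts

html = ('<p>a</p><script type="text/javascript">'
        'document.write("<b>x</b>");</script><i>y</i>')
stripped, scripts = split_scripts(html)
# `stripped` can go to the HTML scanner; each entry in `scripts`
# should go to a JavaScript (or other script-language) parser instead.
```

Note that the `<b>` inside the script body is treated as part of the JavaScript string, not as an HTML tag.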