Lightweight HTML Scanner home: (Belgium)

Lightweight HTML Scanner

This is not easy to parse yourself. <!-- or is it? -->
<!-- ========== START OF NAVBAR ========== -->
<TABLE BORDER="0" CELLPADDING="0" CELLSPACING="3">
  <TR ALIGN="center" VALIGN="top">
    <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="overview-summary.html"><FONT CLASS="NavBarFont1"> <B>Overview</B></FONT></A>&nbsp;</TD>
    <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <FONT CLASS="NavBarFont1">Package</FONT>&nbsp;</TD>
    <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <FONT CLASS="NavBarFont1">Class</FONT>&nbsp;</TD>
    <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="index-all.html"><FONT CLASS="NavBarFont1"> <B>Index</B></FONT></A>&nbsp;</TD>
    <TD BGCOLOR="#FFFFFF" CLASS="NavBarCell1Rev"> &nbsp;<FONT CLASS="NavBarFont1Rev"><B>Help</B></FONT>&nbsp;</TD>
  </TR>
</TABLE>
<!-- =========== END OF NAVBAR =========== -->

Welcome to the Lightweight HTML Scanner.

New since 05/sep/2001: public release version 2.00 (with free evaluation download).

What is the Lightweight HTML Scanner?

The Lightweight HTML Scanner is a set of fast Java classes to scan or parse HTML documents. It provides applets and applications with an easy-to-handle list of the syntax elements of the HTML document. Both HTML tags and content text can be extracted and handled the way you need.
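
To give an idea of what a flat list of syntax elements can look like, here is a minimal Java sketch. It is not the Lightweight HTML Scanner API: the class FlatScanSketch, its Element type and the scan method are hypothetical names made up for this illustration, and the splitting logic is deliberately naive (a '>' inside a quoted attribute value, for instance, would end the tag too early).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from the product's API.
class FlatScanSketch {

    /** One syntax element: either a tag (comments included) or a run of content text. */
    static final class Element {
        final boolean isTag;
        final String text;

        Element(boolean isTag, String text) {
            this.isTag = isTag;
            this.text = text;
        }

        @Override
        public String toString() {
            return (isTag ? "TAG  " : "TEXT ") + text;
        }
    }

    /** Splits HTML into a flat list of tags and content text; no tree is built. */
    static List<Element> scan(String html) {
        List<Element> out = new ArrayList<>();
        int pos = 0;
        while (pos < html.length()) {
            int lt = html.indexOf('<', pos);
            if (lt < 0) {
                lt = html.length();
            }
            // Everything before the next '<' is content text (whitespace-only runs are skipped).
            if (lt > pos) {
                String text = html.substring(pos, lt).trim();
                if (!text.isEmpty()) {
                    out.add(new Element(false, text));
                }
            }
            if (lt == html.length()) {
                break;
            }
            // Comments end at "-->", ordinary tags at the next '>'.
            boolean comment = html.startsWith("<!--", lt);
            int end = comment ? html.indexOf("-->", lt + 4) : html.indexOf('>', lt);
            if (end < 0) {
                // Unterminated tag: be tolerant and keep the remainder as text.
                out.add(new Element(false, html.substring(lt)));
                break;
            }
            end += comment ? 3 : 1;
            out.add(new Element(true, html.substring(lt, end)));
            pos = end;
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<TD><A HREF=\"index-all.html\"><B>Index</B></A>&nbsp;</TD>";
        for (Element e : scan(html)) {
            System.out.println(e);
        }
    }
}
```

The caller simply loops over the returned list and looks at the elements it cares about; there is no tree to navigate.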


The Lightweight HTML Scanner enables you to scan an HTML document for only the syntax elements you need. The benefits of the Lightweight HTML Scanner approach are:

Browser-like parsing
The Lightweight HTML Scanner closely follows HTML parsing behaviour common to Netscape Navigator and Microsoft Internet Explorer, both based on Mosaic. Even malformed HTML will be handled as it is in these browsers.
Distribution size
The essential classes of the Lightweight HTML Scanner are only 4 kB in size (jarred, production version). The set of API methods is equally small, enabling you to keep your own classes light as well.
Speed
By scanning only for the HTML syntax elements you need, no time is wasted.
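
The idea of scanning only for the elements you need can be sketched as follows. This is a generic illustration in modern Java, not the Lightweight HTML Scanner's own method: the class LinkOnlyScanSketch and its hrefs method are hypothetical, and a regular expression stands in for a real scanner. It looks only at A tags and collects their HREF values, skipping every other tag and all content text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the product's API.
class LinkOnlyScanSketch {

    // Matches an <A ...> tag with an HREF attribute whose value is
    // double-quoted (group 1), single-quoted (group 2) or unquoted (group 3).
    private static final Pattern A_HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))",
            Pattern.CASE_INSENSITIVE);

    /** Returns the HREF values of all A tags, in document order. */
    static List<String> hrefs(String html) {
        List<String> out = new ArrayList<>();
        Matcher m = A_HREF.matcher(html);
        while (m.find()) {
            String value = m.group(1) != null ? m.group(1)
                    : m.group(2) != null ? m.group(2)
                    : m.group(3);
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<TD><A HREF=\"overview-summary.html\"><B>Overview</B></A></TD>"
                + "<TD><FONT CLASS=\"NavBarFont1\">Package</FONT></TD>"
                + "<TD><A HREF=\"index-all.html\"><B>Index</B></A></TD>";
        System.out.println(hrefs(html));
    }
}
```

Tags without an HREF, such as TD and FONT above, never produce a result, so the caller only ever handles the links.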

The Lightweight HTML Scanner does not build a Document Object Model of some sort, because

  1. Most HTML documents on the web are not well-formed and do not really fit a Document Object Model.
  2. Building one adds weight to your applets/applications that is not needed for many uses.
  3. Navigating the returned Document Object Model will probably be more complicated for the application programmer than running over a list of HTML tags and content.
  4. There is no established standard Document Object Model.
  5. Free classes already exist to do this (e.g. in the standard Java 2 libraries).
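
For comparison, the standard Java libraries mentioned in point 5 include the javax.swing.text.html package. The sketch below uses its ParserDelegator, the callback parser that HTMLEditorKit relies on when it fills an HTMLDocument. The class name StdlibCallbackSketch and the event strings are made up for this example, and the sketch assumes a JDK that ships the Swing classes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Illustration of the standard-library parser; the class name is hypothetical.
class StdlibCallbackSketch {

    /** Records the parser callbacks as simple strings, in the order they arrive. */
    static List<String> events(String html) {
        final List<String> out = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                out.add("start " + tag);
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                out.add("end " + tag);
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                out.add("simple " + tag);
            }

            @Override
            public void handleText(char[] data, int pos) {
                out.add("text " + new String(data));
            }
        };
        try {
            // The third argument tells the parser to ignore charset directives.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String event : events("<p><b>Index</b></p>")) {
            System.out.println(event);
        }
    }
}
```

Note that this parser also reports tags it infers itself (such as html and body), so the event list can contain more entries than the input has tags.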

The Lightweight HTML Scanner is compiled with Java 2, but has been thoroughly tested with Java 1.1.8.

How much will the Lightweight HTML Scanner cost me?

Most of you will pay nothing (Niente! Nada! Nichts! Nullo! Rien de knots! Nil! Nougabollen!), as we have free licenses for developers, private users and evaluators. You will find more details in the license options summary and on the download page.

Information at this site

Some simple programs to analyze HTML.
Isn't this new JavaDoc beautiful?
License Options
A short overview of the different licensing options.
Download / Purchase
Download the Evaluation Version of the Lightweight HTML Scanner, or purchase the Production Version.
Installation instructions
How to extract and install the Lightweight HTML Scanner software.
How the Lightweight HTML Scanner came into existence, and the revision history.

Special thanks to the guys at Javasoft, who are helping us to make this world virtually better.

