
Satheesh Babu
2001/03/02

HTML Tidy is a great utility for cleaning up web pages. The same functionality is available as a COM DLL (TidyCOM), so you can clean up HTML on your server from a script. This article explains how to get started with HTML Tidy from ASP on PWS/Windows 98.

Getting It

  1. Download TidyCOM.zip from perso.wanadoo.fr/ablavier/TidyCOM/
  2. Read the Tidy options documentation at www.w3.org/People/Raggett/tidy/

Installation

  1. Extract both regsvr32.exe and TidyCOM.dll (I extracted these to C:\Windows)
  2. Open a DOS command window
  3. Register the DLL by typing:
    regsvr32.exe c:\windows\TidyCOM.dll
    You MUST give the full path name for the DLL. To unregister, run regsvr32.exe /u c:\windows\TidyCOM.dll
  4. Restart PWS
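
The registration step can also be scripted. The following is a minimal sketch in Python, assuming regsvr32.exe is on the PATH. The helper name regsvr32_command is hypothetical and not part of TidyCOM. It only builds the command line, and it refuses relative paths to mirror the full-path requirement above.

```python
import ntpath


def regsvr32_command(dll_path, unregister=False):
    """Build the regsvr32 command line for a DLL (hypothetical helper)."""
    # The DLL must be given with its full path, so reject relative paths.
    if not ntpath.isabs(dll_path):
        raise ValueError("give the full path name for the DLL: %r" % dll_path)
    command = ["regsvr32.exe"]
    if unregister:
        command.append("/u")  # /u unregisters the DLL
    command.append(dll_path)
    return command


if __name__ == "__main__":
    print(" ".join(regsvr32_command(r"c:\windows\TidyCOM.dll")))
    print(" ".join(regsvr32_command(r"c:\windows\TidyCOM.dll", unregister=True)))
```

To actually run the command, the returned list could be passed to subprocess.call on the Windows machine.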

Testing

  1. Extract the attached zip file to a folder under PWS. I put it in a folder named "tidy".
  2. Look at the file bad.html. It is not exactly bad; it is an MS Word document saved as HTML from MS Word 97.
  3. Fire up your browser, and point it to http://localhost/tidy/simple.asp
  4. If no error occurred, open bad.html and good.html and see how Tidy reformatted the page. This run does not clean up the Word 2000 info.
  5. simple.asp sets the options within the code, which can be a pain. The next example uses a configuration file instead.
  6. Point your browser to http://localhost/tidy/useconf.asp. This uses the configuration file tidyconf.txt.
  7. Now compare bad.html and good_2.html. good_2.html is free of the Word 2000 legacy markup.
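
The contents of tidyconf.txt are not shown here. As a rough sketch, a configuration file carrying the same options as simple.asp, plus Tidy's word-2000 option, could be generated with Python as follows. The option values and the helper name render_tidy_config are assumptions for illustration. In particular, mapping Indent=2 to "indent: auto" is a guess, so check the Tidy options documentation before relying on it.

```python
def render_tidy_config(options):
    """Render a dict of Tidy options as 'name: value' lines (hypothetical helper)."""
    lines = []
    for name, value in options.items():
        # Tidy config files accept yes/no for boolean options.
        if isinstance(value, bool):
            value = "yes" if value else "no"
        lines.append("%s: %s" % (name, value))
    return "\n".join(lines) + "\n"


# ASSUMED options: the first five mirror simple.asp; word-2000 is added
# so that Tidy strips the markup Word 2000 leaves behind.
ASSUMED_OPTIONS = {
    "doctype": "strict",
    "drop-font-tags": True,
    "output-xhtml": True,
    "indent": "auto",
    "tab-size": 8,
    "word-2000": True,
}

if __name__ == "__main__":
    with open("tidyconf.txt", "w") as handle:
        handle.write(render_tidy_config(ASSUMED_OPTIONS))
```

Keeping the options in one file means every script that loads it cleans pages the same way, which is the advantage useconf.asp has over simple.asp.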

Sample ASP Code

<%@ Language=VBScript %>
<% Option Explicit %>
<%
Dim oTidy
' Creating Tidy Object
Set oTidy = CreateObject("TidyCOM.TidyObject")
' Setting Tidy Options
oTidy.Options.Doctype="strict"
oTidy.Options.DropFontTags=true
oTidy.Options.OutputXhtml=true
oTidy.Options.Indent=2
oTidy.Options.TabSize=8
' Cleaning up file bad.html to good.html
oTidy.TidyToFile Server.MapPath("bad.html"), _
                 Server.MapPath("good.html")
' Cleaned up. See good.html
Set oTidy = Nothing
%>

Note: Methods are also available for cleaning up HTML in a string.
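
As a minimal sketch of the string-based route in Python: the method name TidyMemToMem is an assumption here (it does not appear in this article), so check the TidyCOM documentation for the exact name and signature. The helper tidy_string is hypothetical. It accepts the Tidy object as a parameter so the logic can be tried without the DLL.

```python
def tidy_string(html, tidy=None):
    """Clean up HTML held in a string and return the result (hypothetical helper)."""
    if tidy is None:
        # Only available on Windows with pywin32 installed and TidyCOM registered.
        import win32com.client
        tidy = win32com.client.Dispatch("TidyCOM.TidyObject")
    # ASSUMPTION: TidyMemToMem takes the source string and returns the cleaned string.
    return tidy.TidyMemToMem(html)
```

On a machine with TidyCOM registered, a call such as tidy_string("<p>Hello") would hand the fragment to Tidy and return whatever Tidy produces for it.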

Sample Python Code

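import win32com.client

# Create the Tidy COM object
objTidy = win32com.client.Dispatch("TidyCOM.TidyObject")
# Load the options from the configuration file
objTidy.Options.Load("tidyconf.txt")
# Clean up bad.html and write the result to good_2.html
objTidy.TidyToFile("bad.html", "good_2.html")
# Release the COM object
objTidy = None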
