Running a Tidy XML-RPC service using Zope

Thinking about web services? Here's a practical example of exposing a script via XML-RPC with Zope.

Zope can expose every object via XML-RPC. For more information on Zope's XML-RPC capability, read the excellent How-To by Amos Latteier.
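From the client side, calling an object published over XML-RPC looks like an ordinary method call on a proxy. The following is a minimal sketch using Python 3's standard library, not the author's setup. Because no Zope server is assumed here, a small local server with a hypothetical `fake_tidy` function stands in for the published method; `call_tidy` and `demo` are illustrative names as well. Against a real Zope instance, the URL would point at the folder that holds the method.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def fake_tidy(html, body_only=0):
    """Hypothetical stand-in for a published tidy() method: trims whitespace."""
    return html.strip()


def call_tidy(url, html, body_only=0):
    """Call a method named 'tidy' on the XML-RPC endpoint at url."""
    with ServerProxy(url) as proxy:
        return proxy.tidy(html, body_only)


def demo():
    # Port 0 asks the operating system for any free port.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(fake_tidy, "tidy")
    worker = threading.Thread(target=server.serve_forever, daemon=True)
    worker.start()
    try:
        host, port = server.server_address
        url = "http://%s:%d/" % (host, port)
        return call_tidy(url, "  <p>Hello world.</p>  ", 1)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

The client never sees how the method is implemented; it only needs the URL and the method name.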

I use a simple Python script to clean up HTML using Tidy. Here’s the code for that. Just put it under Zope and you can run your own Tidy XML-RPC service easily!

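# Simple method for Tidy cleanup of HTML source.
#
# Put this in $INSTANCE_HOME/Extensions and create an External Method to tidy()
def tidy(self, html, body_only=0):
    """Cleans up the HTML with Tidy and returns the cleaned-up HTML.

    self = Normally None. It is here to satisfy Zope external method
    html = incoming html source
    body_only = if true, then only the body part of the cleaned up HTML is
                returned
    """
    import sys
    cmd = 'tidy -config tidy.cfg'
    if sys.platform == 'win32':
        # win32pipe.popen3 returns (stdin, stdout, stderr)
        from win32pipe import popen3
        i, o, e = popen3(cmd)
    else:
        # popen2.popen3 returns (stdout, stdin, stderr)
        from popen2 import popen3
        o, i, e = popen3(cmd)
    # Feed the source to Tidy on stdin, then collect its output.
    i.write(html)
    i.close()
    cln = o.read()  # cleaned-up HTML from Tidy's stdout
    err = e.read()  # Tidy's warnings and errors from stderr (not returned)
    if body_only:
        import re
        # Drop everything up to and including the opening <body ...> tag.
        body_start_m = re.search('<body.*?>', cln, re.I | re.S)
        if body_start_m:
            cln = cln[body_start_m.end():]
        # Drop the closing </body> tag and everything after it.
        body_end_m = re.search('</body.*?>', cln, re.I | re.S)
        if body_end_m:
            cln = cln[:body_end_m.start()]
    return cln

if __name__ == '__main__':
    html = '''<p>Hello world.</p> <ul><li>This is a great place</li></ul>'''
    clean = tidy(None, html, 1)
    print clean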
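The listing above targets the Python 2 that shipped with Zope at the time; `popen2` no longer exists in Python 3. As a rough sketch of the same idea on a current Python (an assumption on my part, not the author's code), `subprocess.run` can feed the HTML to a command on stdin and capture both output streams. The names `run_filter`, `body_only` and `ECHO_CMD` are hypothetical, and `ECHO_CMD` is only a stand-in so the example runs without Tidy installed.

```python
import re
import subprocess
import sys


def run_filter(cmd, text):
    """Feed text to cmd on stdin; return (stdout, stderr) as strings."""
    # No check=True: Tidy can exit non-zero when it only reports warnings.
    done = subprocess.run(cmd, input=text, capture_output=True, text=True)
    return done.stdout, done.stderr


def body_only(html):
    """Return what lies between <body ...> and </body>, or html unchanged."""
    match = re.search(r"<body\b[^>]*>(.*?)</body\s*>", html, re.I | re.S)
    return match.group(1) if match else html


# Stand-in for ['tidy', '-config', 'tidy.cfg']: copies stdin to stdout.
ECHO_CMD = [sys.executable, "-c",
            "import sys; sys.stdout.write(sys.stdin.read())"]

if __name__ == "__main__":
    out, err = run_filter(ECHO_CMD, "<html><body><p>Hi</p></body></html>")
    print(body_only(out))
```

Unlike the original listing, this `body_only` helper leaves the text untouched unless both the opening and closing body tags are present.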

You will need Zope 2.6 for tagged HTML to be packaged properly when it is accepted and returned. If you don’t have 2.6, get the xmlrpc files from the 2.6 distribution and overwrite the corresponding files in your Zope.
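To see why markup inside a parameter needs careful packaging, it helps to look at what an XML-RPC request carrying HTML looks like on the wire. This sketch uses Python 3's standard `xmlrpc.client` marshaller purely as an illustration of the protocol; it says nothing about how Zope's own xmlrpc files behave, and the method name `tidy` is just the one used in this post.

```python
from xmlrpc.client import dumps, loads

html = "<p>Hello world.</p>"

# Build the XML body of a call to tidy(html, 1) without sending it anywhere.
request = dumps((html, 1), methodname="tidy")

# Parse the same body back, as a server would before calling the method.
params, method = loads(request)

if __name__ == "__main__":
    print(request)
    print(method, params)
```

The angle brackets of the HTML are escaped as entities inside the request, so the payload stays well-formed XML, and the receiving side gets the original string back after parsing.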

At the moment, my Tidy service is run using PHP. I’m thinking of using the script above instead.

  1. hi,
    if you want to get only the stuff between body tags, just add " show-body-only: yes " line in your tidy config file.

    Posted by: Henrique on May 21, 2003 06:02 PM
  2. Thanks! This completely negates the need for a regex search.

    Posted by: Babu on June 6, 2003 05:12 PM