Zope can expose every object via XML-RPC. For more information on Zope's XML-RPC capability, read the excellent How-To by Amos Latteier.
I use a simple Python script to clean up HTML using Tidy. Here’s the code for that. Just put it under Zope and you can run your own Tidy XML-RPC service easily!
"""
Simple method for Tidy cleanup of HTML source.
Put this in $INSTANCE_HOME/Extensions and create an External Method to tidy()
"""
def tidy(self, html, body_only=0):
    """cleans up the html and returns the cleaned up html

    self = Normally None. It is here to satisfy Zope external method
           requirements
    html = incoming html source
    body_only = if true, then only the body part of the cleaned up HTML is
                returned
    """
    import sys
    cmd = 'tidy -config tidy.cfg'
    if sys.platform == 'win32':
        # win32pipe.popen3 returns (stdin, stdout, stderr)
        from win32pipe import popen3
        i, o, e = popen3(cmd)
    else:
        # popen2.popen3 returns (stdout, stdin, stderr)
        from popen2 import popen3
        o, i, e = popen3(cmd)
    # feed the HTML to tidy and close stdin so it starts processing
    i.write(html)
    i.flush()
    i.close()
    cln = o.read()
    err = e.read()  # tidy's messages are read here but not returned
    o.close()
    e.close()
    if body_only:
        import re
        # drop everything up to and including the opening <body> tag
        body_start_m = re.search('<body.*?>', cln, re.I | re.S)
        if body_start_m:
            cln = cln[body_start_m.end():]
        # drop the closing </body> tag and everything after it
        body_end_m = re.search('</body.*?>', cln, re.I | re.S)
        if body_end_m:
            cln = cln[:body_end_m.start()]
    return cln

if __name__ == '__main__':
    html = '''<p>Hello world.</p> <ul><li>This is a great place</li></ul>'''
    clean = tidy(None, html, 1)
    print clean
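Once the External Method exists, a client can reach it like any other Zope object. Here is a minimal sketch of such a client, written for Python 3's standard xmlrpc.client module rather than the Python 2 used in the listing. The URL and the helper names make_proxy and tidy_remote are assumptions for illustration, as is the External Method having the id tidy and living in the Zope root folder. The sketch sends only the HTML source and leaves body_only at its default.

```python
import xmlrpc.client

# Assumed location of the Zope instance; adjust host, port, and path to
# wherever the External Method with the id "tidy" actually lives.
ZOPE_URL = "http://localhost:8080/"


def make_proxy(url=ZOPE_URL):
    """Create an XML-RPC proxy; no connection is opened until a call is made."""
    return xmlrpc.client.ServerProxy(url)


def tidy_remote(proxy, html):
    """Call the remote tidy() method and return the cleaned-up HTML.

    The `self` parameter of the External Method is not part of the XML-RPC
    call; this sketch assumes Zope fills it in on the server side.
    """
    return proxy.tidy(html)
```

With a running server, a call would look like tidy_remote(make_proxy(), "<p>Hello world.") and should return whatever the server-side script produces.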
You will need Zope 2.6 for tagged HTML to be packaged properly when it is accepted and returned over XML-RPC. If you don’t have 2.6, get the xmlrpc files from the 2.6 distribution and overwrite the corresponding files in your Zope.
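To see why the XML-RPC layer matters for tagged HTML: the markup has to be escaped inside the request and restored on arrival. The sketch below shows that round trip with Python 3's standard xmlrpc.client module. It does not use Zope's own xmlrpc files; it only illustrates what a correct implementation is expected to do with a string full of tags.

```python
import xmlrpc.client

html = "<p>Hello world.</p> <ul><li>This is a great place</li></ul>"

# Build the XML-RPC request body for a call to a method named "tidy".
payload = xmlrpc.client.dumps((html,), methodname="tidy")

# Markup characters are escaped so they cannot be mistaken for XML-RPC tags.
escaped_ok = "&lt;p&gt;Hello world.&lt;/p&gt;" in payload

# Decoding the request restores the method name and the original HTML string.
params, method = xmlrpc.client.loads(payload)
round_trip_ok = (method == "tidy" and params[0] == html)

print(escaped_ok, round_trip_ok)
```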
At the moment, my Tidy service is run using PHP. I’m thinking of using the script above instead.
hi,
if you want only the stuff between the body tags, just add the line "show-body-only: yes" to your tidy config file.
Thanks! This completely negates the need for a regex search.
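With that option the regex part of the script could go away entirely. A possible Python 3 rewrite might look like the sketch below. It assumes the tidy executable is on the PATH and that tidy.cfg sits in the working directory, as in the original script. The helper name build_tidy_cmd is made up here, and the option is passed on the command line in tidy's --option value form instead of being added to the config file.

```python
import subprocess


def build_tidy_cmd(body_only=False, config="tidy.cfg"):
    """Build the tidy command line as an argument list."""
    cmd = ["tidy", "-config", config]
    if body_only:
        # Same effect as a "show-body-only: yes" line in the config file.
        cmd += ["--show-body-only", "yes"]
    return cmd


def tidy(html, body_only=False):
    """Run tidy on html and return (cleaned_html, tidy_messages)."""
    proc = subprocess.run(
        build_tidy_cmd(body_only),
        input=html,
        capture_output=True,
        text=True,
    )
    # tidy exits with 1 for warnings and 2 for errors, so a non-zero
    # return code is not treated as a failure here.
    return proc.stdout, proc.stderr
```

Unlike the original, this version also hands back tidy's messages, so the caller can decide whether to show them.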