Joy of Python: Blogrolling 1.2

v1.2 of the blogrolling script introduces storing options in a configuration file.

Yesterday I wrote a simple script to check the blogs I’m interested in. One thing I don’t like in any program is data values embedded within the code, and in yesterday’s script the blogs were maintained in a list within the script itself. Remember, the goal of each iteration is to keep the changes as small as possible, make life a bit easier, and introduce something new for a reason. This iteration makes the script read the blogs from a configuration file. It is not perfect, since the configuration file’s name is hard coded in the script. Passing that as an argument is left for another day.
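
For readers who want to try that exercise, one possible approach is sketched below. It is not part of the v1.2 script: the helper name config_path and the fallback to blogroll.cfg are assumptions made for illustration, and the sketch is written for Python 3.

```python
import sys

DEFAULT_CONFIG = "blogroll.cfg"  # the file name hard coded in the v1.2 script


def config_path(argv, default=DEFAULT_CONFIG):
    """Return the first command-line argument, or the default file name.

    Hypothetical helper for illustration; not part of the original script.
    """
    # argv[0] is the script name, so a user-supplied path would be argv[1]
    if len(argv) > 1:
        return argv[1]
    return default


# Typical use at the top of the main section
path = config_path(sys.argv)
```

Keeping the default means existing scheduled jobs that pass no argument would keep working unchanged.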

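Python’s standard library has a ConfigParser module that is made for parsing configuration files formatted like Windows .ini files: named sections, each holding option = value pairs.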
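
To get a feel for the module on its own, here is a minimal sketch that parses two sections from a string. It assumes Python 3, where the module is named configparser and offers read_string() and the fallback= argument; the script in this post targets the older ConfigParser module, so treat this as an illustration rather than the script's own code.

```python
from configparser import ConfigParser  # Python 3 name of the ConfigParser module

# Values borrowed from the sample configuration in this post
SAMPLE = """\
[vsbabu.org]
url = http://vsbabu.org/mt
rdf = http://vsbabu.org/mt/index.xml
priority = 50

[Simon Brunning]
url = http://www.brunningonline.net/simon/blog
"""

config = ConfigParser()
config.read_string(SAMPLE)  # parse from a string instead of a file

for section in config.sections():
    url = config.get(section, "url")
    # fallback= returns the given value when the option is missing
    priority = config.getint(section, "priority", fallback=None)
    print(section, url, priority)
```

Each section name doubles as the blog title, which is the same convention the script relies on.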

"""
This is a little script to help you do what www.blogrolling.com
can do. Define the blogs you are interested in, put the script
as a scheduled job or as a cgi and include the output in the appropriate
section of your blog template. (I wouldn’t recommend making this run
for every page load, since that will slow down your page quite a bit.)

You can see this in action on the front page of my blog at
http://vsbabu.org/mt/

S Babu:
joys of python: a series of small and often silly scripts using
python that do something while explaining some nice features of
python. Absolutely newbie material.

http://vsbabu.org/mt/archives/categories/python/

All small values of joy are marked by comments starting with #joy:
"""
#joy: no more separate comment block. Triple quoted starting comment
# is python’s documentation string. You can access it as __doc__
import calendar
import urllib
import time

class Blog:
    """ Blog: a simple class that stores the attributes of a typical blog
    """
    #joy:documentation strings for a class or a function or the whole script
    #    can be added within triple quotes. And these are available within the
    #    predefined variable __doc__
    def __init__(self, url, title=None, description=None, rdf=None, priority=50):
        """ constructor to create a new instance"""
        self.url = url
        self.title = title
        self.description = description
        #this could potentially be enhanced by using Mark Pilgrim’s rss
        #autodiscovery tool
        self.rdf = rdf
        self.priority = priority
        #I’m using rdf to check for modified date since some servers
        #utilizing SSI or PHP do not return modified date
        self.modified_date = self.update_modified_date(self.rdf)

    def update_modified_date(self, url=None):
        """gets the modified time of the blog. One may optionally specify
        another url to check too"""
        #see time module’s documentation. I’m storing modified_date
        #as seconds since epoch
        if url is None:
            url = self.url
        try:
            s = urllib.urlopen(url)
            message = s.info()
            try:
                dt = message.getdate('Last-modified')
                if dt is None:
                    dt = message.getdate('Last-Modified')
                #HTTP dates are in GMT, so use calendar.timegm();
                #time.mktime() would treat the tuple as local time
                self.modified_date = calendar.timegm(dt)
            except:
                return None
        except:
            #this is an indication of a broken link.
            #Exercise: handle this nicely!
            return None
        return self.modified_date

    def show_html_string(self, modified_since=3600, current_time=None):
        """returns a string that can be used as an HTML link.

        If the blog has been modified within the specified
        number of seconds, print a * next to it too."""
        if current_time is None:
            current_time = time.mktime(time.localtime())
        s = """<a href="%s" title="%s">%s</a>""" % (self.url,
                                                    self.description,
                                                    self.title)
        #v1.1 -> v1.2
        #This makes it kind of useless. If I can’t figure out the modified
        #date, then this is not going to work
        if (self.modified_date is not None) and (current_time-self.modified_date < modified_since):
            s = s + "*"
        return s

#v1.1 -> v1.2 start
def get_blog_option(config, section, option):
    """simple helper function to get the options"""
    if config.has_option(section, option):
        return config.get(section, option)
    else:
        return None
#v1.1 -> v1.2 end
#joy:I can specify what is the main part of the script
# by simply comparing the predefined variable __name__ to
# '__main__'. Exercise. Try printing __name__ within a function
# Other scripts can reuse my class by just doing an
# from thisfile import Blog
#
# This feature can be used to define test routines for your
# module.
if __name__ == '__main__':

    #v1.1 -> v1.2 start
    #joy:in python, you can easily import only the required
    #    part of the module
    #ConfigParser can parse files formatted like Windows .ini files
    from ConfigParser import ConfigParser
    config = ConfigParser()
    #blogroll.cfg is my configuration file
    config.readfp(open("blogroll.cfg"))
    #to read multiple configuration files, use
    #config.read([file1, file2...])

    #define a list of blogs that I visit often
    blogroll = []
    #joy:python’s lists are really too good. you can append()
    #    to a list. Or insert(). Or remove(). Check the documentation
    #    for more functions available for lists.
    for blog in config.sections():
        url = get_blog_option(config, blog, "url")
        rdf = get_blog_option(config, blog, "rdf")
        description = get_blog_option(config, blog, "description")
        priority = get_blog_option(config, blog, "priority")
        if priority is not None:
            priority = int(priority)
        blogroll.append(Blog(url, blog, description, rdf, priority))

    #joy: sort a list very easily by providing our own comparison function
    #     lambda is a means to define inline functions. cmp() compares two
    #     strings. I’m sorting my list by title
    blogroll.sort(lambda x,y: cmp(x.title, y.title))

    current_time = time.mktime(time.localtime())

    for blog in blogroll:
        print blog.show_html_string(3600, current_time), "<br/>"

# blogroll.cfg sample
[vsbabu.org]
url = http://vsbabu.org/mt
rdf = http://vsbabu.org/mt/index.xml
description = Gluing passing thoughts to foregone conclusions
priority = 50

[Simon Brunning]
url = http://www.brunningonline.net/simon/blog
rdf = http://www.brunningonline.net/simon/blog/index.xml
description = Small values of cool
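
The script above is written for Python 2. As a rough sketch of how its two core ideas (deciding whether a blog changed recently, and sorting the blogroll by title) might look in Python 3, consider the block below. The names Entry, modified_epoch, is_recent and sort_by_title are hypothetical and chosen for illustration; the sketch works on header strings rather than fetching anything over the network.

```python
import time
from collections import namedtuple
from email.utils import parsedate_to_datetime
from operator import attrgetter

# Hypothetical stand-in for the Blog class, holding only what this sketch needs
Entry = namedtuple("Entry", "title modified_date")


def modified_epoch(last_modified):
    """Convert a Last-Modified header value to seconds since the epoch.

    Returns None when the header is missing or cannot be parsed.
    """
    if last_modified is None:
        return None
    try:
        return parsedate_to_datetime(last_modified).timestamp()
    except (TypeError, ValueError):
        # Unparseable dates raise TypeError or ValueError depending on version
        return None


def is_recent(modified_date, modified_since=3600, current_time=None):
    """True when modified_date falls within modified_since seconds of now."""
    if current_time is None:
        current_time = time.time()
    return modified_date is not None and current_time - modified_date < modified_since


def sort_by_title(entries):
    """Sort with a key function; list.sort() no longer accepts cmp-style functions."""
    return sorted(entries, key=attrgetter("title"))
```

A blog whose date cannot be determined simply never gets the star, which mirrors the behaviour of show_html_string in the script.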