Joy of Python: Blogrolling

Simple Python script to get blogrolling rolling.

This is a small script, aimed mainly at blog owners. From Fozbaca's site, I learned about Blogrolling.com, where you can create a web account and maintain a list of the blogs you want to watch. I promptly registered there, and by the time I had typed in my third link I was tired, especially at the thought that I would have to put some code on my site too.

Why bother? Python has a really good HTTP library (urllib) that can be used to build pretty much the same functionality.
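The core of the idea is reading a page's Last-Modified header. Here is a minimal sketch of that step, assuming Python 3 (urllib.request and email.utils). The helper names last_modified_epoch and fetch_last_modified are made up for this example and are not part of the script that follows, and using a HEAD request is my own assumption about how one might avoid downloading the whole page.

```python
from email.utils import parsedate_to_datetime
from urllib.request import Request, urlopen


def last_modified_epoch(headers):
    """Return the Last-Modified header as seconds since the epoch, or None."""
    value = headers.get("Last-Modified")
    if value is None:
        return None
    try:
        # HTTP dates are given in GMT; the parsed datetime carries that offset.
        return parsedate_to_datetime(value).timestamp()
    except (TypeError, ValueError):
        # Malformed date string.
        return None


def fetch_last_modified(url, timeout=10):
    """Hypothetical helper: HEAD the URL and read its Last-Modified header."""
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return last_modified_epoch(response.headers)
    except OSError:
        # URLError and socket timeouts are OSError subclasses:
        # treat them as a broken link.
        return None
```

Calling fetch_last_modified on a feed URL should give either a float (seconds since the epoch) or None when the server is unreachable or sends no usable date.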

"""
This is a little script to help you do what www.blogrolling.com
can do. Define the blogs you are interested in, run the script
as a scheduled job or as a CGI, and include the output in the appropriate
section of your blog template. (I wouldn't recommend running this
on every page load, since that will slow down your page quite a bit.)

You can see this in action on the front page of my blog at
http://vsbabu.org/mt/

S Babu:
joys of python is a series of small and often silly scripts in
python that do something while explaining some nice features of
python. Absolutely newbie material.

http://vsbabu.org/mt/archives/categories/python/

All small values of joy are marked by comments starting with #joy:
"""

#joy: no more separate comment block. Triple quoted starting comment
#     is python's documentation string. You can access it as __doc__


import urllib
import time

class Blog:
    """ Blog: a simple class that stores the attributes of a typical blog
    """

    #joy:documentation strings for a class or a function or the whole script
    #    can be added within triple quotes. And these are available within the
    #    predefined variable __doc__

    def __init__(self, url, title=None, description=None, rdf=None, priority=50):
        """ constructor to create a new instance"""
        self.url = url
        self.title = title
        self.description = description
        #this could potentially be enhanced by using Mark Pilgrim's rss
        #autodiscovery tool

        self.rdf = rdf
        self.priority = priority
        #I'm using rdf to check for modified date since some servers
        #utilizing SSI or PHP do not return a modified date

        self.modified_date = self.update_modified_date(self.rdf)

    def update_modified_date(self, url=None):
        """gets the modified time of the blog. One may optionally specify
        another url to check instead"""

        #see time module's documentation. I'm storing modified_date
        #as seconds since epoch

        if url is None:
            url = self.url
        try:
            s = urllib.urlopen(url)
            #header names are matched case-insensitively, so one lookup
            #covers both 'Last-modified' and 'Last-Modified'
            dt = s.info().getdate('Last-Modified')
            s.close()
        except Exception:
            #this is an indication of a broken link.
            #Exercise: handle this nicely!
            return None
        if dt is None:
            #the server did not send a usable Last-Modified header
            return None
        #Last-Modified is in GMT but mktime() assumes local time,
        #so subtract the local offset from UTC
        self.modified_date = time.mktime(dt[:8] + (0,)) - time.timezone
        return self.modified_date

    def show_html_string(self, modified_since=3600, current_time=None):
        """returns a string that can be used as an HTML link.

        If the blog has been modified within the specified
        number of seconds, append a * to it too."""

        if current_time is None:
            current_time = time.mktime(time.localtime())
        s = """<a href="%s" title="%s">%s</a>""" % (self.url, self.description, self.title)
        #modified_date is None when the date could not be fetched
        if (self.modified_date is not None
                and current_time - self.modified_date < modified_since):
            s = s + "*"
        return s

#joy:I can specify what is the main part of the script
#    by simply comparing the predefined variable __name__ to
#    '__main__'. Exercise: try printing __name__ within a function.
#    Other scripts can reuse my class by just doing a
#    from thisfile import Blog
#
#    This feature can be used to define test routines for your
#    module.

if __name__ == '__main__':

    #define a list of blogs that I visit often

    blogroll = []

    #joy:python's lists are really too good. you can append()
    #    to a list. Or insert(). Or remove(). Check the documentation
    #    for more functions available for lists.
    blogroll.append(Blog("http://www.fozbaca.org/",
                         "Fozbaca",
                         "This is fozbaca's blog",
                         "http://fozbaca.org/rss.xml"))
    blogroll.append(Blog("http://pythonowns.blogspot.com/",
                         "Jarno Virtanen",
                         "Python Owns Us",
                         "http://db.cs.helsinki.fi/%7Ejajvirta/pythonowns.rss"))
    blogroll.append(Blog("http://www.brunningonline.net/simon/blog/",
                         "Simon Brunning",
                         "Small Values of Cool",
                         "http://www.brunningonline.net/simon/blog/index.xml"))
    blogroll.append(Blog("http://www.diveintomark.org/",
                         "Mark Pilgrim",
                         "Dive into everything!",
                         "http://www.diveintomark.org/xml/rss.xml"))
    blogroll.append(Blog("http://www.windley.com/",
                         "Phil Windley",
                         "CIO of Utah",
                         "http://www.windley.com/rss.xml"))
    blogroll.append(Blog("http://weblog.xlle.com/",
                         "Nairomia",
                         "Sachin Nair",
                         "http://weblog.xlle.com/index.rdf"))
    blogroll.append(Blog("http://weblog.infoworld.com/udell/",
                         "Jon Udell",
                         "Radio",
                         "http://weblog.infoworld.com/udell/rss.xml"))
    blogroll.append(Blog("http://radio.weblogs.com/0106123/",
                         "Jeffrey Shell",
                         "Industrie Toulouse",
                         "http://radio.weblogs.com/0106123/rss.xml"))

    #joy: sort a list very easily by providing our own comparison function
    #     lambda is a means to define inline functions. cmp() compares two
    #     strings. I'm sorting my list by title

    blogroll.sort(lambda x,y: cmp(x.title, y.title))

    current_time = time.mktime(time.localtime())

    for blog in blogroll:
        print blog.show_html_string(3600, current_time), "<br/>"
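
The script above targets Python 2. In Python 3, cmp() and comparison-function arguments to sort() are gone, so sorting is done with a key function instead. Below is a minimal sketch of the same sort-and-print step, assuming Python 3. Entry, render_link and render_blogroll are names invented for this example, and the HTML escaping is an addition that the original script does not have.

```python
import html
import time
from collections import namedtuple
from operator import attrgetter

# Hypothetical stand-in for the Blog class: only the fields used here.
Entry = namedtuple("Entry", "url title description modified_date")


def render_link(entry, modified_since=3600, current_time=None):
    """Return an HTML link, with a trailing * if recently modified."""
    if current_time is None:
        current_time = time.time()
    link = '<a href="%s" title="%s">%s</a>' % (
        html.escape(entry.url, quote=True),
        html.escape(entry.description, quote=True),
        html.escape(entry.title),
    )
    # modified_date is None when no date could be fetched.
    if (entry.modified_date is not None
            and current_time - entry.modified_date < modified_since):
        link += "*"
    return link


def render_blogroll(entries, modified_since=3600, current_time=None):
    """Sort by title with a key function and join the links with <br/>."""
    ordered = sorted(entries, key=attrgetter("title"))
    return "\n".join(
        render_link(e, modified_since, current_time) + " <br/>" for e in ordered
    )
```

A key function is called once per item, which is usually simpler than writing a two-argument comparison when you only want to sort on one attribute.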