vsbabu.org:Joy of Python: Blogrolling 1.2


Joy of Python: Blogrolling 1.2

v1.2 of the blogrolling script introduces storing options in a configuration file.

Yesterday I wrote a simple script to check the blogs I’m interested in. One thing I don’t like in any program is data values embedded in the code, and in yesterday’s script the blogs were kept in a list inside the script itself. Remember that the goal of each iteration is to keep the changes as small as possible, make life a bit easier, and introduce something new for a reason. This iteration makes the script read the blogs from a configuration file. It is not perfect, since the configuration file’s name is hard-coded in the script; passing that as an argument is left for another day.
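One possible way to lift that restriction later, sketched here rather than taken from the script: take the file name from the command line and fall back to blogroll.cfg when none is given. The `config_path` helper and the `DEFAULT_CONFIG` name are hypothetical.

```python
import sys

# Hypothetical default; matches the file name hard-coded in the script.
DEFAULT_CONFIG = "blogroll.cfg"

def config_path(argv, default=DEFAULT_CONFIG):
    """Return the first command-line argument, or the default file name."""
    # argv[0] is the script name, so a user-supplied path would be argv[1].
    if len(argv) > 1:
        return argv[1]
    return default

# With no extra argument the default is used; otherwise the argument wins.
print(config_path(["blogroll.py"]))
print(config_path(["blogroll.py", "other.cfg"]))
```

In the real script the call would be `config_path(sys.argv)`, and the result would replace the literal file name passed to `open()`.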

Python has a ConfigParser module that is made for parsing configuration files.
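As a minimal sketch of what the module does, the snippet below parses a small INI-style string and reads the options back. It assumes Python 3, where the module is named `configparser` (the script in this post uses the Python 2 name `ConfigParser`). The section name, the values, and the `load_sections` helper are invented for illustration.

```python
import configparser

# Hypothetical sample data, not the real blogroll.cfg.
SAMPLE = """
[Example Blog]
url = http://example.com/blog
description = An example entry
"""

def load_sections(text):
    """Parse INI-style text and return {section: {option: value}}."""
    config = configparser.ConfigParser()
    config.read_string(text)
    # Each [section] header becomes a key; its options become a plain dict.
    return {name: dict(config.items(name)) for name in config.sections()}

sections = load_sections(SAMPLE)
print(sections["Example Blog"]["url"])
```

Each section header becomes one entry, which is why the script can use the section name as the blog title.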

                  
"""
This is a little script to help you do what www.blogrolling.com
can do. Define the blogs you are interested in, put the script
as a scheduled job or as a cgi and include the output in the appropriate
section of your blog template. (I wouldn’t recommend making this run
for every page load, since that will slow down your page quite a bit.)

You can see this in action on the front page of my blog at
http://vsbabu.org/mt/

S Babu:
joys of python: is a series of small and often silly scripts using
python that do something while explaining some nice features of
python. Absolutely newbie material.

http://vsbabu.org/mt/archives/categories/python/

All small values of joy are marked by comments starting with #joy:
"""

#joy: no more separate comment block. Triple quoted starting comment
#     is python’s documentation string. You can access it as __doc__


import urllib
import time

class Blog:
    """ Blog: a simple class that stores the attributes of a typical blog
    """

    #joy:documentation strings for a class or a function or the whole script
    #    can be added within triple quotes. And these are available within the
    #    predefined variable __doc__

    def __init__(self, url, title=None, description=None, rdf=None, priority=50):
        """ constructor to create a new instance"""
        self.url = url
        self.title = title
        self.description = description
        #this could potentially be enhanced by using Mark Pilgrim’s rss
        #autodiscovery tool
        self.rdf = rdf
        self.priority = priority
        #I’m using rdf to check for modified date since some servers
        #utilizing SSI or PHP do not return a modified date
        self.modified_date = self.update_modified_date(self.rdf)

    def update_modified_date(self, url=None):
        """gets the modified time of the blog. One may optionally specify
           another url to check too"""

        #see time module’s documentation. I’m storing modified_date
        #as seconds since epoch
        if url is None:
            url = self.url
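        try:
            s = urllib.urlopen(url)
            message = s.info()
            #header lookup is case-insensitive, so one getdate() call is enough
            dt = message.getdate('Last-Modified')
            if dt is None:
                #the server did not send a usable Last-Modified header
                return None
            #Last-Modified is given in GMT, while time.mktime() assumes local
            #time; subtracting time.timezone gives seconds since epoch
            self.modified_date = time.mktime(dt) - time.timezone
        except:
            #this is an indication of a broken link.
            #Exercise: handle this nicely!
            return None
        return self.modified_date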

    def show_html_string(self, modified_since=3600, current_time=None):
        """returns a string that can be used as an HTML link.
        
        If the blog has been modified within the specified
        number of seconds, print a * next to it too."""

        if current_time is None:
            current_time = time.mktime(time.localtime())
        s = """<a href="%s" title="%s">%s</a>""" % (self.url, 
                self.description,
                self.title)
        #v1.1 -> v1.2

        #This makes it kind of useless. If I can’t figure out the modified
        #date, then this is not going to work

        if (self.modified_date is not None and
                current_time - self.modified_date < modified_since):
            s = s + "*"

        return s

#v1.1 -> v1.2 start
def get_blog_option(config, section, option):
    """simple helper function to get the options"""
    if config.has_option(section, option):
        return config.get(section, option)
    else:
        return None
#v1.1 -> v1.2 end


#joy:I can mark the main part of the script by simply comparing
#    the predefined variable __name__ to '__main__'.
#    Exercise: try printing __name__ within a function.
#    Other scripts can reuse my class by just doing a
#    from thisfile import Blog
#
#    This feature can be used to define test routines for your
#    module.

if __name__ == '__main__':

    #v1.1 -> v1.2 start

    #joy:in python, you can easily import only the required
    #    part of the module
    #ConfigParser can parse files formatted like Windows .ini files
    from ConfigParser import ConfigParser
    config = ConfigParser()
    #blogroll.cfg is my configuration file

    config.readfp(open("blogroll.cfg"))
    #to read multiple configuration files, use
    #config.read([file1, file2...])
    #define a list of blogs that I visit often

    blogroll = []

    #joy:python’s lists are really too good. you can append()
    #    to a list. Or insert(). Or remove(). Check the documentation
    #    for more functions available for lists.

    for blog in config.sections():
        url = get_blog_option(config, blog, "url")
        rdf = get_blog_option(config, blog, "rdf")
        description = get_blog_option(config, blog, "description")
        priority = get_blog_option(config, blog, "priority")
        if priority is not None:
            priority = int(priority)
        else:
            #fall back to the constructor’s default when no priority is set
            priority = 50
        blogroll.append(Blog(url, blog, description, rdf, priority))

    #joy: sort a list very easily by providing our own comparison function
    # lambda is a means to define inline functions. cmp() compares two
    # values (here, two title strings). I’m sorting my list by title


    blogroll.sort(lambda x,y: cmp(x.title, y.title))

    current_time = time.mktime(time.localtime())

    for blog in blogroll:
        print blog.show_html_string(3600, current_time), "<br/>"

# blogroll.cfg sample
[vsbabu.org]
url = http://vsbabu.org/mt
rdf = http://vsbabu.org/mt/index.xml
description = Gluing passing thoughts to foregone conclusions
priority = 50

[Simon Brunning]
url = http://www.brunningonline.net/simon/blog
rdf = http://www.brunningonline.net/simon/blog/index.xml
description = Small values of cool
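
A rough sketch of how this sample parses, assuming Python 3’s `configparser` rather than the Python 2 module the script imports. The `load_blogroll` helper and the dictionary layout are illustrative and not part of the script. The second section has no `priority`, so a fallback is needed; 50 is used here to mirror the default in `Blog.__init__`. `sorted()` with a `key` stands in for the `cmp`-based sort, since `cmp` no longer exists in Python 3.

```python
import configparser

SAMPLE_CFG = """
[vsbabu.org]
url = http://vsbabu.org/mt
rdf = http://vsbabu.org/mt/index.xml
description = Gluing passing thoughts to foregone conclusions
priority = 50

[Simon Brunning]
url = http://www.brunningonline.net/simon/blog
rdf = http://www.brunningonline.net/simon/blog/index.xml
description = Small values of cool
"""

def load_blogroll(text, default_priority=50):
    """Return one dict per section, sorted by title (the section name)."""
    config = configparser.ConfigParser()
    config.read_string(text)
    blogs = []
    for title in config.sections():
        blogs.append({
            "title": title,
            # get() with fallback=None returns None when the option is absent
            "url": config.get(title, "url", fallback=None),
            "rdf": config.get(title, "rdf", fallback=None),
            "description": config.get(title, "description", fallback=None),
            # getint() converts the text value; fallback covers a missing option
            "priority": config.getint(title, "priority",
                                      fallback=default_priority),
        })
    return sorted(blogs, key=lambda blog: blog["title"])

for blog in load_blogroll(SAMPLE_CFG):
    print(blog["title"], blog["priority"])
```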

Posted: August 27, 2002 07:20 AM
python
