PyBlagg gets timezones

Added Timezone support and removed dependency on ls -t.

I’m half asleep, so keep that in mind before using this code.

My home-grown aggregator, PyBlagg (CVS), now parses time zone information too. As far as possible, it converts each news item’s time to your machine’s local time.
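To make the conversion concrete, here is a minimal sketch, not the actual PyBlagg code. It assumes the item carries an RFC 822 style date with a zone offset (the form RSS 2.0 uses) and leans on the standard library to do the arithmetic. The names item_time_to_epoch and format_local are made up for this example.

```python
import time
from email.utils import mktime_tz, parsedate_tz


def item_time_to_epoch(date_string):
    """Return seconds since the epoch (UTC) for an RFC 822 date, or None."""
    parsed = parsedate_tz(date_string)
    if parsed is None:
        # The string did not look like a date at all.
        return None
    # mktime_tz applies the zone offset found in the string.
    return mktime_tz(parsed)


def format_local(epoch_seconds):
    """Render an epoch timestamp in this machine's local time."""
    return time.strftime("%Y-%m-%d %H:%M", time.localtime(epoch_seconds))
```

Two strings that name the same instant in different zones come out as the same number, which is what makes sorting items from different feeds possible.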

Other changes include:

Instead of shelling out to ls -t, it now uses Python’s glob module and sorts the results itself. It should now work wherever Python runs. I’m using this on Windows 98 now.
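A rough equivalent of ls -t in pure Python looks like the sketch below. It assumes "newest first by modification time" is the ordering wanted; the function name files_newest_first is hypothetical and the real code in CVS may sort differently.

```python
import glob
import os


def files_newest_first(pattern):
    """Return paths matching pattern, most recently modified first (like ls -t)."""
    return sorted(glob.glob(pattern), key=os.path.getmtime, reverse=True)
```

Because nothing here calls an external program, the same lines behave the same way on Windows and Unix.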

I’ve also added timeoutsocket.py into CVS. Drop it in the same folder. If you run the aggregator from a scheduler, I strongly suggest using it. With it in place, your scheduled jobs won’t hang when the next web server worm comes out.
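For readers on a later Python than the one this post was written for, the standard library can give a similar effect without the extra file. This is a swapped-in technique, not how timeoutsocket.py itself is used, and the 30 second value is only an assumption.

```python
import socket

# Assumed value; pick whatever suits your scheduler interval.
FETCH_TIMEOUT_SECONDS = 30

# Every socket created after this call gives up instead of blocking forever.
socket.setdefaulttimeout(FETCH_TIMEOUT_SECONDS)
```

Set it once near the top of the script, before any feed is fetched.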

I cleared my data directory before installing the new version. Naturally, this causes problems for a few days, because date-less RSS feeds (eg: Sanjay’s) suddenly start coming up at the top. When a feed doesn’t have dates, PyBlagg assumes the saved copy’s date, or the current date if there is no saved copy. So, strictly speaking, you shouldn’t clear the data for date-less feeds.
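The fallback order can be sketched as follows. This is an illustration under assumptions, not the PyBlagg source: it treats the saved copy’s date as the file’s modification time, and the name effective_item_date and its parameters are invented for the example.

```python
import os
import time


def effective_item_date(feed_date, saved_path, now=None):
    """Pick a timestamp for an item: feed date, else saved copy, else now."""
    if feed_date is not None:
        # The feed supplied a date, so trust it.
        return feed_date
    if os.path.exists(saved_path):
        # Assumption: the saved copy's date is its modification time.
        return os.path.getmtime(saved_path)
    # No date and no saved copy: the item looks brand new.
    return time.time() if now is None else now
```

The last branch is why clearing the data directory pushes date-less feeds to the top: with no saved copy left, every item is stamped with the current time.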

I took Mark Paschal out of the subscription list, though I check his blog daily. He has cleverly managed to put in a "latest news item" asking people to implement conditional HTTP GETs. I’m too sleepy to implement it now. Also, PyBlagg doesn’t store RSS files locally at the moment, and a conditional GET needs a stored copy to fall back on, since a "not modified" response carries no content. Once that is implemented, thus honoring Mark’s request, I plan to add his feed back to my subscription list.
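For reference, a conditional GET boils down to sending back the validators the server gave you last time and treating a 304 as "nothing new". The sketch below uses the modern standard library rather than anything in PyBlagg; the function names and the idea of passing in a stored ETag and Last-Modified value are assumptions for illustration.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, etag=None, last_modified=None):
    """Build a GET request that carries whatever validators we remembered."""
    request = urllib.request.Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    return request


def fetch_if_changed(url, etag=None, last_modified=None, timeout=30):
    """Return (body, etag, last_modified); body is None if nothing changed."""
    request = build_conditional_request(url, etag, last_modified)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            headers = response.headers
            return response.read(), headers.get("ETag"), headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Not modified: keep using the stored copy and old validators.
            return None, etag, last_modified
        raise
```

The caller has to save the returned validators and the feed body between runs, which is exactly the local storage that is missing today.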

I have some updates planned for this aggregator, now that at least two others are using it. Remember, this started out dirty and then became a bit less dirty.

  1. Yay, you rock!

    It may still be a little dirty, but it's incredibly useful :)

    Posted by: Richard Jones on February 2, 2003 12:24 AM
  2. Hmmmm... what would be the best way for me to update mine since you don't have HTML::Template support yet? I like how my output looks at the moment. :)

    Looking forward to the updates!

    Posted by: eliot on February 2, 2003 03:20 PM
  3. The code has changed in only a few places. To upgrade, make the following changes to pyblagg.py from CVS:
    (a) Add the function current_tz_offset_from_utc().
    (b) For the function parsetime(), replace the old code with the new code.
    (c) In main, add the line "offset_from_utc = current_tz_offset_from_utc()" at the top, and an else: condition that refers to offset_from_utc. See CVS; it appears in only two places.
    (d) The above takes care of timezone support. Now delete all the data files that correspond to RSS2 or RDF (those that have time info).
    (e) If you want to use glob support, in main, replace the "aggritems=" line right after "getting channel data" with the two corresponding lines from CVS.

    I didn't realize anyone would actually use this script. Now that people do, I need to package it a bit more nicely :-)

    Posted by: Babu on February 2, 2003 03:53 PM
  4. yay.. all upgraded. i made those changes you mentioned and forgot about the new IMPORT lines at the top. that's a no brainer that I, of course, forgot. :)

    thanks for the updates! i love it!

    Posted by: eliot on February 2, 2003 06:43 PM
  5. The new code doesn't cope with YY-MM-DDTHH:MM:SS without a timezone offset (which is sorta common). Simple to fix.

    Also, I still have to hack in the HTML-stripping and truncation code (which is also simple).

    Hey, what about getting this set up on sf.net, then we can all hack the code :)

    Posted by: Richard Jones on February 2, 2003 07:31 PM
  6. I've just modified the rfc822 date parsing (the form used in RSS2) to use a regular expression - it barfed when given a single-digit day.

    Posted by: Richard Jones on February 2, 2003 08:23 PM
  7. pyblagg gets crazy on zopezen.org dc:dates

    the code doesn't cope with date formats like
    YYYY-MM-DDThh:mmTZD
    (eg 1997-07-16T19:20+01:00)

    Posted by: michael on June 29, 2003 04:52 PM
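The ISO-style formats raised in the comments (a date and time with no zone, and a minutes-only time followed by a zone designator) can be covered by one regular expression. The following is a sketch only, not the fix that went into PyBlagg: the name parse_w3c_date is hypothetical, it assumes a four-digit year, and it does not handle fractional seconds or date-only strings.

```python
import re
from datetime import datetime, timedelta, timezone

# Date, "T", hours and minutes, optional seconds, optional zone designator.
W3C_DATE = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})"
    r"T(\d{2}):(\d{2})(?::(\d{2}))?"
    r"(Z|[+-]\d{2}:\d{2})?$"
)


def parse_w3c_date(text):
    """Return a datetime (naive if no zone was given), or None if unparseable."""
    match = W3C_DATE.match(text.strip())
    if match is None:
        return None
    year, month, day, hour, minute = (int(g) for g in match.group(1, 2, 3, 4, 5))
    second = int(match.group(6) or 0)
    zone = match.group(7)
    try:
        if zone is None:
            tzinfo = None  # No offset supplied; the caller decides what to assume.
        elif zone == "Z":
            tzinfo = timezone.utc
        else:
            sign = -1 if zone[0] == "-" else 1
            offset = timedelta(hours=int(zone[1:3]), minutes=int(zone[4:6]))
            tzinfo = timezone(sign * offset)
        return datetime(year, month, day, hour, minute, second, tzinfo=tzinfo)
    except ValueError:
        # Out-of-range fields such as month 13 or an offset of 99:99.
        return None
```

Returning a naive datetime for the zone-less form leaves the choice (treat as UTC, or as local time) to the calling code rather than guessing here.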