Every other week, I find myself facing a job that requires regular expressions. And every other week, I need to refer to the Python re module documentation and the Regular Expression HOWTO.
To cut down on that lookup time, here is working code that, at the moment, parses the National Geographic News and IBM developerWorks (dW) home pages. It might be useful to newbies too. The functions return a list of tuples of the form (title, url, description, date, category).
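The original code is not reproduced here, and the real markup of those pages is not known, so what follows is only a minimal sketch of the same idea against a made-up HTML snippet. `SAMPLE_HTML`, the `story`/`cat`/`date` class names and `parse_stories` are all hypothetical. The sketch compiles one pattern with named groups and uses `re.finditer` to build the (title, url, description, date, category) tuples.

```python
import re

# Hypothetical markup standing in for a news home page; the real
# National Geographic News / IBM dW HTML is NOT reproduced here.
SAMPLE_HTML = """
<div class="story">
  <span class="cat">Animals</span>
  <a href="http://example.com/news/1.html">Rare frog found</a>
  <p>Scientists report a new species.</p>
  <span class="date">2005-09-12</span>
</div>
<div class="story">
  <span class="cat">Space</span>
  <a href="http://example.com/news/2.html">Comet images released</a>
  <p>New pictures from the probe.</p>
  <span class="date">2005-09-13</span>
</div>
"""

# One pattern per story block. Non-greedy .*? stops each group at the
# first closing tag; re.DOTALL lets it cross newlines if needed.
STORY_RE = re.compile(
    r'<div class="story">\s*'
    r'<span class="cat">(?P<category>.*?)</span>\s*'
    r'<a href="(?P<url>[^"]+)">(?P<title>.*?)</a>\s*'
    r'<p>(?P<description>.*?)</p>\s*'
    r'<span class="date">(?P<date>.*?)</span>',
    re.DOTALL,
)


def parse_stories(html):
    """Return a list of (title, url, description, date, category) tuples."""
    return [
        (m.group("title"), m.group("url"), m.group("description"),
         m.group("date"), m.group("category"))
        for m in STORY_RE.finditer(html)
    ]
```

Calling `parse_stories(SAMPLE_HTML)` returns two tuples, the first being `('Rare frog found', 'http://example.com/news/1.html', 'Scientists report a new species.', '2005-09-12', 'Animals')`. A pattern this tightly tied to the markup breaks as soon as the site changes its HTML, which is the usual price of regex scraping.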
If you would like to generate RSS from this, check out the PyRSS2Gen module.
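PyRSS2Gen is a third-party package. If it is not installed, the standard library's `xml.etree.ElementTree` can build a bare RSS 2.0 feed from the same tuples. This is a sketch using a hypothetical `items_to_rss` helper, not PyRSS2Gen's API; the channel title, link and description are whatever you pass in.

```python
import xml.etree.ElementTree as ET


def items_to_rss(items, channel_title, channel_link, channel_description):
    """Build an RSS 2.0 document (as a str) from
    (title, url, description, date, category) tuples.

    Dates are copied through unchanged; RSS 2.0 expects RFC 822 dates
    in <pubDate>, so convert them beforehand if the scraped format differs.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_description

    for title, url, description, date, category in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = url
        ET.SubElement(item, "description").text = description
        ET.SubElement(item, "pubDate").text = date
        ET.SubElement(item, "category").text = category

    # ElementTree escapes &, < and > in element text automatically.
    return ET.tostring(rss, encoding="unicode")
```

Writing the returned string to a `.xml` file is enough for most feed readers; PyRSS2Gen adds conveniences such as date formatting on top of the same structure.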
It is getting harder to remember what I read in the documentation, especially since I read documentation on different technologies all the time. I think writing code snippets and templates for ready reference is a better way to keep things in memory a little longer.
I wrote similar code a while ago to gather URLs from various (computer-related) news sites. The resulting program, called Mygale, can be found here: http://www.awaretek.com/nowak/mygale.html
Most of it was written in late 2001. I cannot guarantee that the code still works on recent versions of the sites.
Well, I was wondering if you guys could show a bit of Python power. I need to extract HTML tables from about 200 web pages. I don't know Python, but wanted to use it for this task so I could learn. Any tips or pointers will be very helpful.
Thanks
Geoff, if the HTML pages are well formed, you can use the sgmllib module for easy parsing. See diveintopython.org, the section on HTML processing. If they are not, you might want to pass them through HTML Tidy to make them well formed. Otherwise, as in the example code above, you could use regular expressions.
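sgmllib was removed in Python 3; `html.parser.HTMLParser` in the standard library offers a similar event-driven interface. The following is a minimal sketch, assuming flat (non-nested) tables; `TableExtractor` and `extract_tables` are hypothetical names. It collects each table as a list of rows, each row a list of cell strings, and tolerates the common omission of `</td>` and `</tr>`.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect each <table> as a list of rows of cell text.

    Nested tables are NOT supported: a <table> inside another
    replaces the outer one in this simple sketch.
    """

    def __init__(self):
        super().__init__()
        self.tables = []     # finished tables
        self._table = None   # rows of the table being read
        self._row = None     # cells of the row being read
        self._cell = None    # text fragments of the cell being read

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            # Collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None and self._table is not None:
            self._table.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._table = []
        elif tag == "tr" and self._table is not None:
            self._close_row()    # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()   # tolerate a missing </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr":
            self._close_row()
        elif tag == "table" and self._table is not None:
            self._close_row()
            self.tables.append(self._table)
            self._table = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_tables(html):
    """Return every table in html as a list of rows of cell strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables
```

Because the parser reacts to tags rather than matching fixed text, it copes with attribute and whitespace changes that would break a regular expression.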
Once you can parse one file properly, just make it into a function and call it from a loop like 'for file in [file1, file2...]'; see the python.org tutorial. Python is very easy to pick up and to progress with.
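The function-plus-loop pattern can be sketched as below. `parse_file` is a hypothetical stand-in that simply pulls out the raw `<table>...</table>` blocks with a regular expression; swap in whatever parser works on a single page. The `pages` directory in the usage note is also an assumption.

```python
import re
from pathlib import Path

# Hypothetical per-file parser: grabs each raw <table>...</table> block.
# Non-greedy .*? stops at the first closing tag (so no nested tables).
TABLE_RE = re.compile(r"<table\b.*?</table>", re.IGNORECASE | re.DOTALL)


def parse_file(path):
    """Return the raw HTML of every table in one file."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return TABLE_RE.findall(text)


def parse_all(paths):
    """Apply parse_file to each path; map path (as str) -> list of tables."""
    results = {}
    for path in paths:
        results[str(path)] = parse_file(path)
    return results
```

For about 200 saved pages, something like `parse_all(sorted(Path("pages").glob("*.html")))` processes them all in one call.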
I love your site!!!
The link to the working code is broken :-( (sniff) I'd like a peek at it.