urllib2 - setting HTTP headers

A small sample script showing how to set HTTP request headers with urllib2.

I came across this script, which I used a while ago to test something, and thought it might be a useful snippet, so here it is. I think I used it to extract data from a packaged application that had two different interfaces for IE and Mozilla. I had to get the data out, parse it and massage it. Not a nice way to do stuff :-(

"""
Spoof browser agent - also illustrates how to set other headers
Note that the browser agent string will have Python-urllib prefixed
to whatever you specify.
"""
import urllib2

url = 'http://vsbabu.org/'

txdata = None  # no request body, so urllib2 issues a GET
txheaders = {
    'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)',
    'Accept-Language': 'en-us',
    'Accept-Encoding': 'gzip, deflate, compress;q=0.9',
    'Keep-Alive': '300',
    'Connection': 'keep-alive',
    'Cache-Control': 'max-age=0',
}
# Build the request with the custom headers, then fetch it
req = urllib2.Request(url, txdata, txheaders)
u = urllib2.urlopen(req)
headers = u.info()  # response headers
data = u.read()     # response body (may arrive compressed, given Accept-Encoding)
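
For readers on Python 3, where urllib2 became urllib.request, the same headers can be attached and inspected without sending anything. This is only a sketch under that assumption, not the original script; it builds the request and shows how Request stores the header names.

```python
import urllib.request

url = "http://vsbabu.org/"
txheaders = {
    "User-Agent": "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)",
    "Accept-Language": "en-us",
}

# Build the request only; nothing goes over the network here.
req = urllib.request.Request(url, data=None, headers=txheaders)

# Request.add_header() stores each name via str.capitalize(),
# so "User-Agent" is kept under the key "User-agent".
for name, value in req.header_items():
    print(name, "=", value)

print(req.has_header("User-agent"))  # lookup uses the stored spelling
print(req.get_method())              # no data was given, so the method is GET
```

Because of that normalisation, Python 3 should treat 'User-Agent' and 'User-agent' as one and the same header, so behaviour that depends on the exact capitalisation in old urllib2 versions may not reproduce there.
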
  1. The problem with the prefixed urllib string is this:

    Actually, two User-Agent headers are sent. Header keys are unique, but case-sensitive. If you specify 'User-agent' (lower-case a!!), only one header will be sent (the spoofed one).

    Nice weblog!

    Posted by: jeewee on November 12, 2003 06:14 AM
  2. As stated above, two agents are actually sent. Apache and PHP will read and pass along both of them, but Zope will take only the first one, no matter what (Python 2.1.3).

    So if you really want to spoof the agent and send only one, follow these instructions:

    1. Set txheaders as you wish, but don't set the User-Agent (nor 'User-agent') at all.

    2. Create the request, as already explained:
    req = urllib2.Request(url, txdata, txheaders)

    3. Now don't use urlopen. Instead, create an opener:
    opener = urllib2.build_opener()

    4. Add the user agent to the opener:
    opener.addheaders = [('User-agent', 'My Spoofed Agent')]

    5. Retrieve the data:
    data = opener.open(req).read()

    That's it. 'opener.addheaders' is not meant to be used like that, so this is a hack, but it works real nice.


    Posted by: ZopeUser on May 13, 2004 01:57 PM
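
The opener recipe from the comment above can be tried without touching the network. The sketch below assumes Python 3's urllib.request in place of urllib2, and CaptureHandler is a hypothetical helper (not part of urllib) that answers the request locally and records the headers that would have gone out.

```python
import io
import urllib.request
import urllib.response
from email.message import Message


class CaptureHandler(urllib.request.BaseHandler):
    """Hypothetical stand-in for a server: records outgoing headers."""

    handler_order = 50  # run before the real HTTPHandler (default order 500)

    def __init__(self):
        self.sent = {}

    def http_open(self, req):
        # header_items() merges the request's own headers with the ones
        # the opener added from opener.addheaders.
        self.sent = dict(req.header_items())
        resp = urllib.response.addinfourl(
            io.BytesIO(b"ok"), Message(), req.full_url, 200
        )
        resp.msg = "OK"  # the opener's error processor reads this attribute
        return resp


capture = CaptureHandler()

# Steps 1-2: headers without any User-Agent entry.
txheaders = {"Accept-Language": "en-us"}
req = urllib.request.Request("http://vsbabu.org/", None, txheaders)

# Steps 3-4: build an opener and replace its default agent.
opener = urllib.request.build_opener(capture)
opener.addheaders = [("User-agent", "My Spoofed Agent")]

# Retrieve the (canned) data and list every user-agent header recorded.
data = opener.open(req).read()
agents = [v for k, v in capture.sent.items() if k.lower() == "user-agent"]
print(agents)
```

Answering the request locally keeps the example runnable offline; without the capture handler, the same opener would send the request to the real server.
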