IE automation

This cruel world can make you use IE at times. Here is a short function that uses IE to download a page.

The fact that I had to write this shows how bad the state of software UI design is :-) Anyway, if anybody else is in the screwed-up situation of having to automate IE, this might help you.

def download_url_with_ie(url):
    """
    Given a URL, start IE, load the page and return its HTML.

    Works only on Win32 with the Python Win32 extensions (pywin32,
    which provides win32com) installed. Needs IE.

    Why? You may be forced to work with brain-dead, closed-source
    applications that go to tremendous lengths to deliver
    browser-specific output and have no interface other than a
    browser, while you want the data in CSV or XML for further
    analysis.

    Note: IE internally formats all HTML to stupid mixed-case,
    no-quotes-around-attributes syntax. So if you are planning to
    parse the data, study the output of this function rather than
    looking at View Source alone.
    """
    # If you are calling this function in a loop, it is more
    # efficient to start IE once, outside this function, and then
    # reuse the same instance to visit each URL.
    from time import sleep
    from win32com.client import Dispatch

    READYSTATE_COMPLETE = 4  # IE's ReadyState once the document is fully loaded

    ie = Dispatch("InternetExplorer.Application")
    ie.Visible = 1  # set to 0 if you want to hide the IE window
    try:
        ie.Navigate(url)

        # It takes a little while for the page to load, so poll
        # until IE is idle and the document is complete.
        while ie.Busy or ie.ReadyState != READYSTATE_COMPLETE:
            sleep(0.1)

        # Now the page is loaded and the DOM is filled in,
        # so grab the HTML of the body.
        text = ie.Document.body.innerHTML

        # The COM call returns a Unicode string; strip any
        # non-ASCII characters so the result is plain ASCII text.
        text = text.encode('ascii', 'ignore').decode('ascii')
    finally:
        # Save some memory by quitting IE, even if something above
        # failed! **very important** :-)
        ie.Quit()
    return text
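The comment at the top of the function suggests starting IE once when fetching many pages. A minimal sketch of that idea follows, assuming pywin32 on Windows. The names `download_urls_with_ie` and `wait_for_page`, the optional `ie` parameter (so you can pass in an instance you already started, or a stand-in object for testing) and the 30-second timeout are hypothetical choices for illustration, not part of any library. It also bounds the load wait with a timeout, so one page that never finishes cannot hang the whole loop.

```python
import time

READYSTATE_COMPLETE = 4  # IE's ReadyState value once a page has fully loaded


def wait_for_page(ie, timeout=30.0, poll=0.1):
    """Poll until IE is idle and the document is complete, or time out."""
    deadline = time.monotonic() + timeout
    while ie.Busy or ie.ReadyState != READYSTATE_COMPLETE:
        if time.monotonic() > deadline:
            raise TimeoutError("page did not finish loading in %.1f s" % timeout)
        time.sleep(poll)


def download_urls_with_ie(urls, ie=None, timeout=30.0):
    """Return {url: body HTML}, reusing a single IE instance for every URL.

    Hypothetical helper: if ie is None, an instance is started here and
    quit at the end; an instance passed in by the caller is left running.
    """
    owns_ie = ie is None
    if owns_ie:
        from win32com.client import Dispatch  # pywin32, Windows only
        ie = Dispatch("InternetExplorer.Application")
        ie.Visible = 0  # hidden window; set to 1 to watch it work
    try:
        pages = {}
        for url in urls:
            ie.Navigate(url)
            wait_for_page(ie, timeout)
            pages[url] = ie.Document.body.innerHTML
        return pages
    finally:
        if owns_ie:
            ie.Quit()  # only quit an instance this function started
```

Keeping ownership explicit means a caller who opened IE for a longer session can pass it in without having it closed underneath them.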
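Because IE rewrites the markup (mixed-case tags, unquoted attributes), a forgiving parser makes the CSV step mentioned in the docstring easier. The sketch below uses Python 3's standard `html.parser`, which reports tag names in lower case and accepts unquoted attribute values, to pull table cells out of the returned HTML. It assumes the data sits in plain table rows and cells; `TableExtractor` and `html_table_to_csv` are hypothetical names, and nested tables or `colspan` are not handled.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of <td>/<th> cells, grouped into <tr> rows.

    HTMLParser lower-cases tag names and accepts unquoted attribute
    values, so IE-style markup such as <TD class=x> parses like clean
    HTML. Missing </td> and </tr> tags are tolerated.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being filled, if any
        self._cell = None  # text fragments of the open cell, if any

    def _close_cell(self):
        if self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()  # an unclosed previous row ends here
            self._row = []
        elif tag in ("td", "th"):
            self._close_cell()  # an unclosed previous cell ends here
            if self._row is None:
                self._row = []
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def close(self):
        super().close()
        self._close_row()  # flush a row left open at end of input


def html_table_to_csv(html):
    """Return the table rows found in html as CSV text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)
    return out.getvalue()
```

Passing the string returned by `download_url_with_ie` to `html_table_to_csv` gives CSV text you can write straight to a file.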
