IE automation

This cruel world can make you use IE at times. Here is a short function that uses IE to download a page.

The fact that I had to write this shows how bad the state of software UI design is :-) Anyway, if anybody else is in the screwed-up situation of having to automate IE, this might help.


def download_url_with_ie(url):
    """
    Given a URL, start IE, load the page, and return its HTML.

    Works only on Win32 with the Python win32com extensions installed.
    Needs IE.

    Why? If you're forced to work with brain-dead closed source
    applications that go to tremendous lengths to deliver output
    specific to browsers; and the application has no interface
    other than a browser; and you want to get data into CSV or
    XML for further analysis, this is one way to get at it.

    Note: IE internally reformats all HTML into a stupid mixed-case,
    no-quotes-around-attributes syntax. So if you are planning to parse
    the data, make sure you study the output of this function rather
    than looking at View Source alone.
    """
    
    # If you are calling this function in a loop, it is more
    # efficient to open IE once at the beginning, outside this
    # function, and then reuse the same instance for every URL.
    from win32com.client import Dispatch
    from time import sleep

    # Start IE
    ie = Dispatch("InternetExplorer.Application")
    ie.Visible = 1  # make this 0 if you want to hide the IE window
    
    ie.Navigate(url)

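    # It takes a little while for the page to load, so poll until IE
    # is no longer busy and the document is complete (ReadyState 4).
    while ie.Busy or ie.ReadyState != 4:
        sleep(0.5)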

    # Now the page is loaded and the DOM is filled in, so get the HTML.
    text = ie.Document.body.innerHTML

    # innerHTML is a Unicode string; encode it to plain ASCII,
    # silently dropping any characters that do not fit.
    text = text.encode('ascii', 'ignore')

    #save some memory by quitting IE! **very important** :-)
    ie.Quit()
    return text
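
The comment at the top of the function suggests opening IE once and reusing it when fetching in a loop. A minimal sketch of that idea follows. The names `wait_until_loaded` and `download_urls_with_ie`, and the `timeout` and `poll` parameters, are my own assumptions for illustration and not part of any library; only `Dispatch`, `Navigate`, `Busy`, `ReadyState`, `Document` and `Quit` come from win32com and IE.

```python
from time import monotonic, sleep

READYSTATE_COMPLETE = 4  # value IE reports once the document has finished loading


def wait_until_loaded(ie, timeout=30.0, poll=0.25):
    """Poll until IE is idle and the document is complete, or give up."""
    deadline = monotonic() + timeout
    while ie.Busy or ie.ReadyState != READYSTATE_COMPLETE:
        if monotonic() > deadline:
            raise RuntimeError("page did not finish loading in time")
        sleep(poll)


def download_urls_with_ie(urls, ie=None):
    """Fetch several pages through a single IE instance; returns {url: html}."""
    own_ie = ie is None
    if own_ie:
        # Imported lazily so this module can be loaded without pywin32.
        from win32com.client import Dispatch
        ie = Dispatch("InternetExplorer.Application")
        ie.Visible = 0
    pages = {}
    try:
        for url in urls:
            ie.Navigate(url)
            wait_until_loaded(ie)
            pages[url] = ie.Document.body.innerHTML
    finally:
        # Only quit an instance we started ourselves.
        if own_ie:
            ie.Quit()
    return pages
```

Passing an existing `ie` object lets the caller keep one browser alive across many calls; leaving it out makes the function start and quit IE on its own, even if a page fails to load.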
  1. Is there any way to do this with urllib? Or does it have to be by automating IE to get the IE-specific formatting for the page?

    Posted by: Tom on June 13, 2003 01:46 PM
  2. I couldn't figure out how to do that with urllib or urllib2. Even if the header is set to mimic IE, urllib adds a prefix of "Python..." - I guess the logic in the app is to see if browser agent string matches IE's exactly.

    Posted by: Babu on June 13, 2003 02:43 PM
  3. Do you have any idea how to post form data using IE?

    Posted by: chris on August 20, 2003 09:12 PM
  4. This is the error I get when trying your recipe, any workaround?

    File "D:\doc\getsites\g2.py", line 115, in download_url_with_ie
    ie.Navigate(url)
    File "", line 2, in Navigate
    pywintypes.com_error: (-2147352567, 'Exception occurred.', (0, None, None, None,
    0, -2147467259), None)

    Posted by: Will Stuyvesant on August 22, 2003 04:12 PM
  5. Do you know of a good reference on the COM capabilities of IE?

    Posted by: Andrew Louder on November 11, 2003 04:26 PM
  6. WebBrowser control is documented on MSDN at http://msdn.microsoft.com/workshop/browser/prog_browser_node_entry.asp

    Internet Explorer is at http://msdn.microsoft.com/workshop/browser/webbrowser/reference/objects/internetexplorer.asp

    Posted by: Babu on November 11, 2003 05:33 PM
  7. hi Babu,

    I am new to Jython. I need some help automating IE through Jython. I was going through your website, which is pretty cool, and I found this piece of code.
    My requirement is just to close the IE window through Jython. I can see the statement "ie.Quit()" doing that, but what is win32com.client? I would really appreciate a prompt response.


    Posted by: Haroon on May 27, 2004 12:13 PM
  8. #7 - Native Python has a wrapper around the Win32 libraries. With it we can use ActiveX components much as we would from a Microsoft scripting environment. However, Jython is Python implemented on top of Java. If you can figure out how to access ActiveX components from Java (that would be JNI, right?), you may be able to access them through Jython. I am not very sure though.

    Posted by: Babu on May 27, 2004 05:30 PM
  9. Hi Babu,
    When I try to print the HTML file using Acrobat Distiller to convert it to PostScript, it prompts with a filename dialog.
    Is there a way to suppress the dialog and set the output PostScript file programmatically? Also, is there a way to select Acrobat Distiller as the printer if it is not the default printer on the machine where I run my script?
    I am new to IE objects.
    Right now I tried using
    .ExecWB 6, -1
    Please help
    Thanks
    Abhi.

    Posted by: Abhijeet on July 17, 2004 01:44 AM
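
On the urllib question in comments 1 and 2: with the `urllib.request` module in current Python, a User-Agent set on the `Request` object replaces the default "Python-urllib" value rather than being added alongside it. The sketch below assumes the application only inspects the User-Agent header, which may not hold for the app discussed above. The helper names and the example agent string are illustrative assumptions.

```python
from urllib.request import Request, urlopen

# Example IE 6 User-Agent string; substitute whatever the target app expects.
IE_USER_AGENT = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"


def build_ie_request(url, user_agent=IE_USER_AGENT):
    """Return a Request whose User-Agent header is exactly `user_agent`."""
    # A header set on the Request takes precedence over urllib's
    # default "Python-urllib/x.y" User-Agent.
    return Request(url, headers={"User-Agent": user_agent})


def download_url_as_ie(url, timeout=30):
    """Fetch `url` with the IE User-Agent and return the raw response bytes."""
    with urlopen(build_ie_request(url), timeout=timeout) as response:
        return response.read()
```

This returns the HTML as the server sent it, not the reformatted markup that IE's `innerHTML` produces, so any parser written against one will need checking against the other.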