Scraping JavaScript with Python and Selenium WebDriver

Question

I'm trying to scrape the ads from Ask, which are generated inside an iframe by a JavaScript file hosted by Google.

When I navigate to the page manually and view the source, the ads are there (I'm specifically looking for a div with the id "adBlock", which is in an iframe).

But when I try using Firefox, ChromeDriver, or FirefoxPortable, the source returned to me is missing all of the elements I'm looking for.

I tried scraping with urllib2 and got the same results, even after adding the necessary headers. I thought for sure that a real browser instance, like the one WebDriver launches, would fix that problem.

Here's the code I'm working off of, which had to be cobbled together from a few different sources:

    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    import pprint

    # Create a new instance of the Firefox driver
    driver = webdriver.Chrome('C:\Python27\Chromedriver\chromedriver.exe')
    driver.get("http://www.ask.com")
    print driver.title
    inputElement = driver.find_element_by_name("q")

    # type in the search
    inputElement.send_keys("baseball hats")

    # submit the form (although google automatically searches now without submitting)
    inputElement.submit()

    try:
        WebDriverWait(driver, 10).until(EC.title_contains("baseball"))
        print driver.title
        output = driver.page_source
        print(output)
    finally:
        driver.quit()

I know I cycle through a few different attempts at viewing the source; that's not what I'm concerned about.

Any thoughts as to why I'm getting one result from this script (ads omitted) and a totally different result (ads present) from the browser it opened? I've tried Scrapy, Selenium, urllib2, etc. No joy.

Richard · Accepted Answer · 2014-01-30 02:38:11Z

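Selenium's page_source only returns the contents of the frame or iframe the driver is currently focused on, which is the top-level page by default. Anything rendered inside an iframe is therefore missing from it. You'll have to switch into the iframes using something along these lines: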

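    # Collect every iframe element on the top-level page
    iframes = driver.find_elements_by_tag_name("iframe")

    for iframe in iframes:
        # Return to the top-level page before entering the next iframe
        driver.switch_to_default_content()
        driver.switch_to_frame(iframe)

        # page_source now reflects the iframe's document
        output = driver.page_source
        print(output)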
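For readers on a recent Selenium release, here is a minimal sketch of the same loop written against the newer `find_elements(by, value)` and `driver.switch_to` API, on the assumption that the older `find_elements_by_tag_name` and `switch_to_frame` helpers are no longer available there. The function name `collect_frame_sources` is hypothetical, and the sketch only visits top-level iframes; an ad nested inside a second iframe would need the same step repeated one level down.

```python
def collect_frame_sources(driver):
    """Return the page source of each top-level iframe, in document order.

    `driver` is assumed to be an already-started Selenium WebDriver
    that has finished loading the page of interest.
    """
    sources = []

    # Make sure we start from the top-level document.
    driver.switch_to.default_content()

    # "tag name" is the string value of Selenium's By.TAG_NAME strategy.
    for frame in driver.find_elements("tag name", "iframe"):
        # Focus the iframe so page_source reflects its document.
        driver.switch_to.frame(frame)
        sources.append(driver.page_source)

        # Go back to the top-level page before entering the next iframe.
        driver.switch_to.default_content()

    return sources
```

In the question's script this could be called just before `driver.quit()`, for example by looping over `collect_frame_sources(driver)` and checking each returned string for `id="adBlock"`. Because the ad iframe is filled in by JavaScript, it may also be necessary to wait for it to load before reading its source.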

You're a mad scientist. Worked like a charm, thank you.
– Rob M, Jan 30, 2014 at 16:41
Indeed, you are maaad! Works perfectly!
– tmthyjames, Feb 20, 2015 at 20:21
