
I am scraping web pages with Scrapy (Python), which works well for static content. I am trying to scrape a URL from this page, but it turns out the URL is returned by a JavaScript call. I am using Selenium for this but cannot figure out how to do it.

If you click "size chart" on the linked page, a pop-up opens showing the size guide. How can I get the URL of this guide in my program?

I am facing a similar problem getting the size guide on Koovs. If anyone could offer guidance on either of the links, I'd be really grateful.

    1 Answer

    Locate the "size chart" link by its link text, click it, and extract the data. For example:

        from selenium import webdriver
        from selenium.webdriver.common.by import By
        from selenium.webdriver.support import expected_conditions as EC
        from selenium.webdriver.support.wait import WebDriverWait

        driver = webdriver.Firefox()
        driver.get('http://www.jabong.com/athena-Red-Black-Top-476472.html?pos=3')

        # Wait up to 10 seconds for the "size chart" link to appear, then click it
        wait = WebDriverWait(driver, 10)
        chart = wait.until(EC.presence_of_element_located((By.LINK_TEXT, "size chart")))
        chart.click()

        # Print the header cells of the size chart table
        for title in driver.find_elements(By.CSS_SELECTOR, "div.size-chart-body div.size-chart table th"):
            print(title.text)

        driver.close()

    Prints (table header row, for the sake of an example):

        Indian Size
        Euro Size
        Garment Bust (In.)
        Garment Waist (in.)
        Garment Hip (in.)

    Note that you don't need Selenium to get the size chart data: it is already in the DOM, just invisible until you click "size chart". You can reach the same size chart table with Scrapy. Demo from the Scrapy shell:

        $ scrapy shell 'http://www.jabong.com/athena-Red-Black-Top-476472.html?pos=3'
        In [1]: for title in response.css("div.size-chart-body div.size-chart table th")[1:]:
           ...:     print(title.xpath("text()").extract()[0])
           ...:
        Indian Size
        Euro Size
        Garment Bust (In.)
        Garment Waist (in.)
        Garment Hip (in.)
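    The point that hidden content is still part of the markup can be checked offline. The sketch below is only an illustration: the HTML snippet is a hypothetical stand-in for the product page (not the real Jabong markup), `HeaderCollector` and `hidden_headers` are made-up names, and it uses the standard library's `html.parser` instead of Scrapy so that it runs without any installation.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for the product page: the chart is hidden
# with CSS, but it is still part of the HTML the server sends.
SAMPLE_HTML = """
<div class="size-chart-body" style="display: none">
  <div class="size-chart">
    <table>
      <tr><th></th><th>Indian Size</th><th>Euro Size</th></tr>
      <tr><td>S</td><td>...</td><td>...</td></tr>
    </table>
  </div>
</div>
"""


class HeaderCollector(HTMLParser):
    """Collect the text of every <th> cell, whether or not it is visible."""

    def __init__(self):
        super().__init__()
        self.in_th = False
        self.headers = []

    def handle_starttag(self, tag, attrs):
        if tag == "th":
            self.in_th = True
            self.headers.append("")

    def handle_endtag(self, tag):
        if tag == "th":
            self.in_th = False

    def handle_data(self, data):
        if self.in_th:
            self.headers[-1] += data


def hidden_headers(html):
    parser = HeaderCollector()
    parser.feed(html)
    # Drop empty cells such as the blank corner header.
    return [h.strip() for h in parser.headers if h.strip()]


print(hidden_headers(SAMPLE_HTML))
```

    A parser never applies the `display: none` style, which is why a plain HTTP fetch is enough whenever the data is present in the page source.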

    In the case of Koovs, you can still avoid Selenium and construct the size chart URL manually by extracting the category and deal ID from the page, e.g.:

        $ scrapy shell 'http://www.koovs.com/only-onlall-stripe-ls-shirt-59554.html?from=category-651'
        In [1]: category = response.xpath("//input[@id='master_category_name_id_ref']/@value").extract()[0]

        In [2]: deal = response.xpath("//input[@id='deal_id']/@value").extract()[0]

        In [3]: "http://www.koovs.com/koovs/sizechart/women/{category}/{deal}".format(category=category, deal=deal)
        Out[3]: 'http://www.koovs.com/koovs/sizechart/women/Shirts--651--799--896/59554'
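    If the URL is needed in more than one place, the same formatting step could be wrapped in a small helper. This is a minimal sketch: the function name `size_chart_url` and the `section` parameter are assumptions for illustration (the session above hard-codes "women"), and the URL pattern is simply the one observed for this product, not a documented Koovs API.

```python
def size_chart_url(category, deal, section="women"):
    """Build a size chart URL from values scraped off the product page.

    `category` and `deal` are the strings read from the hidden inputs
    `master_category_name_id_ref` and `deal_id` in the session above.
    `section` is a hypothetical parameter; only "women" was observed.
    """
    base = "http://www.koovs.com/koovs/sizechart"
    return "{base}/{section}/{category}/{deal}".format(
        base=base, section=section, category=category, deal=deal
    )


print(size_chart_url("Shirts--651--799--896", "59554"))
# http://www.koovs.com/koovs/sizechart/women/Shirts--651--799--896/59554
```

    Inside a Scrapy callback the two arguments would come from the same `response.xpath(...)` expressions used in the shell session.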

    And if you still want to go with Selenium, here you are:

        from selenium import webdriver
        from selenium.webdriver.common.by import By
        from selenium.webdriver.support import expected_conditions as EC
        from selenium.webdriver.support.wait import WebDriverWait

        driver = webdriver.Firefox()
        driver.get('http://www.koovs.com/only-onlall-stripe-ls-shirt-59554.html?from=category-651&skuid=236376')

        # Wait up to 10 seconds for the link carrying a size_chart attribute, then click it
        wait = WebDriverWait(driver, 10)
        chart = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "a[size_chart]")))
        chart.click()

        # The size chart opens in a new window; switch to it and read its URL
        driver.switch_to.window(driver.window_handles[-1])
        print(driver.current_url)

        # quit() closes every window; close() would only close the pop-up
        driver.quit()

    Prints:

    http://www.koovs.com/koovs/sizechart/women/Shirts--651--799--896/59554 
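    Taking `driver.window_handles[-1]` assumes the pop-up is the last handle in the list, and the WebDriver specification does not guarantee any particular order. A slightly more defensive sketch compares the handles before and after the click. The helper name `newly_opened_handle` is hypothetical, and plain strings stand in for Selenium's opaque handle values so the example runs without a browser.

```python
def newly_opened_handle(handles_before, handles_after):
    """Return the single handle that appears after the click but not before."""
    new_handles = [h for h in handles_after if h not in handles_before]
    if len(new_handles) != 1:
        raise ValueError(
            "expected exactly one new window, found %d" % len(new_handles)
        )
    return new_handles[0]


# Plain strings standing in for real window handles.
before = ["main-window"]
after = ["popup-window", "main-window"]
print(newly_opened_handle(before, after))  # popup-window
```

    With a real driver, `before = driver.window_handles` would be captured just before `chart.click()`, and the result passed to `driver.switch_to.window(...)` afterwards.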
    • Both ways work, and thanks for pointing out the unnecessary use of Selenium. I just want to get the URL of the size chart rather than the text on it. Is there a way to do that?
      Commented May 27, 2015 at 13:44
    • @PraveshJain from what I see, it is just embedded into the page. There is no URL to the size chart.
      – alecxe
      Commented May 27, 2015 at 14:17
    • OK, that seems true for the Jabong link, but for the Koovs page there is a link to it. Any ideas on how to get it programmatically?
      Commented May 27, 2015 at 14:23
    • @PraveshJain please see the update, and let me know if you are still interested in how to solve it with Selenium.
      – alecxe
      Commented May 27, 2015 at 14:33
    • Thanks for the update. It works fine and leaves no need to use Selenium. I am still curious what we would do if the URL couldn't be formed from the category and deal. So if you have figured out a way to do it generally too, knowing it would be really helpful.
      Commented May 27, 2015 at 14:48
