
I am learning web scraping since I need it for my work. I wrote the following code:

import pandas as pd
from selenium import webdriver

chromedriver = '/home/es/drivers/chromedriver'
driver = webdriver.Chrome(chromedriver)
driver.implicitly_wait(30)
driver.get('http://crdd.osdd.net/raghava/hemolytik/submitkey_browse.php?ran=1955')
df = pd.read_html(driver.find_element_by_id("table.example.display.datatable").get_attribute('example'))[0]

However, it is showing the following error:

selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"[id="table.example.display.datatable"]"} (Session info: chrome=103.0.5060.134) 

Then I inspected the table that I want to scrape on this page [screenshot of the inspected table].

What attribute needs to be passed to the get_attribute() function in the following line?

df = pd.read_html(driver.find_element_by_id("table.example.display.datatable").get_attribute('example'))[0] 

And what should I put in driver.find_element_by_id?

EDITED: Some tables have many records spread over multiple pages. For example, this page has 2,246 entries, shown 100 entries per page. When I tried to web-scrape it, df contained only 320 entries with record IDs from 1232 to 1713, meaning it picked up entries from a few of the middle pages instead of going from the first page through to the last.

What can we do in such cases?
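
For reference, this is the rough paging loop I have been experimenting with. It is only a sketch: I am assuming the table is a DataTables widget with id example and that its "Next" button keeps the default id example_next, neither of which I have verified, and I am reusing the URL from my snippet above as a placeholder.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
import pandas as pd

chromedriver = '/home/es/drivers/chromedriver'
driver = webdriver.Chrome(chromedriver)
driver.get('http://crdd.osdd.net/raghava/hemolytik/submitkey_browse.php?ran=1955')
wait = WebDriverWait(driver, 10)

frames = []
while True:
    # re-locate the table on every page and read the currently visible rows
    table = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table#example")))
    frames.append(pd.read_html(table.get_attribute("outerHTML"))[0])

    # assumed id of the DataTables "Next" control; stop once it is disabled
    next_btn = driver.find_element(By.ID, "example_next")
    if "disabled" in next_btn.get_attribute("class"):
        break
    next_btn.click()
    # an extra wait on the new rows may be needed here so the next page
    # has finished rendering before it is re-read

df = pd.concat(frames, ignore_index=True)
print(len(df))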

    2 Answers


    You need to get the outerHTML property of the table first, then pass that HTML to pandas.

    You need to wait for the element to be visible. Use an explicit wait like WebDriverWait().

    driver.get('http://crdd.osdd.net/raghava/hemolytik/submitkey_browse.php?ran=1955')
    table = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table#example")))
    tableRows = table.get_attribute("outerHTML")
    df = pd.read_html(tableRows)[0]
    print(df)

    Import the libraries below.

    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    import pandas as pd

    Output:

          ID      PMID  YEAR  ...                                 DSSP Natural Structure Final Structure
    0   1643  16137634  2005  ...                     CCCCCCCCCCCSCCCC               NaN             NaN
    1   1644  16137634  2005  ...                        CCTTSCCSSCCCC               NaN             NaN
    2   1645  16137634  2005  ...                   CTTTCGGGHHHHHHHHCC               NaN             NaN
    3   1646  16137634  2005  ...                   CGGGTTTHHHHHHHGGGC               NaN             NaN
    4   1647  16137634  2005  ...                CCSCCCSSCHHHHHHHHHTTC               NaN             NaN
    5   1910  16730859  2006  ...  CCCCCCCSSCCSHHHHHHHHTTHHHHHHHHSSCCC               NaN             NaN
    6   1911  16730859  2006  ...                                CCSCC               NaN             NaN
    7   1912  16730859  2006  ...                            CCSSSCSCC               NaN             NaN
    8   1913  16730859  2006  ...       CCCSSCCSSCCSHHHHHTTHHHHTTTCSCC               NaN             NaN
    9   1914  16730859  2006  ...                 CCSHHHHHHHHHHHHHCCCC               NaN             NaN
    10  2110  11226440  2001  ...              CCCSSCCCBTTBTSSSSSSCSCC               NaN             NaN
    11  3799   9204560  1997  ...                               CCSSCC               NaN             NaN
    12  4149  16137634  2005  ...                       CCHHHHHHHHHHHC               NaN             NaN

    [13 rows x 17 columns]
    • Thank you very much, the shortest and happiest way of doing the work :)
      – S.EB
      Commented Sep 21, 2022 at 9:11
    • What can we do if there is no id attribute on a specific page? For example, when inspecting, we have this: <table summary="The result of Se", class="datatable"><tbody>...
      – S.EB
      Commented Sep 21, 2022 at 9:45
    • Use the class attribute; your CSS selector would be table.datatable (see the sketch after this thread).
      – KunduK
      Commented Sep 21, 2022 at 9:57
    • What if the table is long and spans multiple pages? What should I look for when inspecting, and what changes need to be made? I edited the question and added the sample link. Thanks
      – S.EB
      Commented Sep 22, 2022 at 1:32
    • SO is used for research purposes. I would appreciate it if you posted a new question with your new requirements. Thanks.
      – KunduK
      Commented Sep 22, 2022 at 8:34
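
    For the class-only case raised in the comments above, here is a minimal sketch of the same wait-and-read pattern, assuming the table's class is datatable as in the snippet quoted in the comment and that driver has already been created as in the answer:

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait
    import pandas as pd

    # locate the table by its class instead of an id
    table = WebDriverWait(driver, 10).until(
        EC.visibility_of_element_located((By.CSS_SELECTOR, "table.datatable"))
    )
    # pass the table's HTML to pandas
    df = pd.read_html(table.get_attribute("outerHTML"))[0]
    print(df.head())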

    If you want to select the table by @id you need

    driver.find_element_by_id("example") 

    By.CSS:

    driver.find_element_by_css_selector("table#example") 

    By.XPATH:

    driver.find_element_by_xpath("//table[@id='example']")

    If you want to extract the @id value you need

    .get_attribute('id') 

    Since there is not much sense in searching by @id just to extract that same @id, you might use another attribute of the table node:

    driver.find_element_by_xpath("//table[@aria-describedby='example_info']").get_attribute('id') 
    • Thanks for your answer. But when I try to run df = pd.read_html(driver.find_element_by_xpath("//table[@aria-describedby='example_info']").get_attribute('id'))[0], it raises an error: ValueError: No tables found.
      – S.EB
      Commented Sep 20, 2022 at 9:44
    • @S.EB oops, sorry. I missed closing double-quotes. Try now
      – JaSON
      Commented Sep 20, 2022 at 9:50
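
    As a note on the ValueError in the comments above: pd.read_html() parses HTML markup, so it needs the table's outerHTML rather than the value of @id. A minimal sketch combining the XPath locator from this answer with pandas, assuming driver is already on the page from the question:

    import pandas as pd
    from selenium.webdriver.common.by import By

    # read_html() expects markup, not an attribute value such as the id
    table = driver.find_element(By.XPATH, "//table[@aria-describedby='example_info']")
    df = pd.read_html(table.get_attribute("outerHTML"))[0]
    print(df.shape)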
