I am new to web crawling and I am trying to write a simple script to get course names from a University course catalog table:

from selenium import webdriver
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
binary = FirefoxBinary(r'C:\Program Files\Mozilla Firefox\firefox.exe')
driver = webdriver.Firefox(firefox_binary=binary)
url = 'https://courses.illinois.edu/schedule/2018/fall/CS'
driver.get(url)
course_names = []
for i in range(1, 69):
    if(float(i)%2 != 0): #odd row number
        curr_name = driver.find_element_by_css_selector('tr.odd:nth-child(i) > td:nth-child(2) > a:nth-child(1)').text
    else:
        curr_name = driver.find_element_by_css_selector('tr.even:nth-child(i) > td:nth-child(2) > a:nth-child(1)').text
    course_names.append(curr_name)
print(course_names)
driver.quit()

When I run this I get the following error:

InvalidSelectorException: Message: Given css selector expression "tr.odd:nth-child(str(i)) > td:nth-child(2) > a:nth-child(1)" is invalid: InvalidSelectorError: 'tr.odd:nth-child(str(i)) > td:nth-child(2) > a:nth-child(1)' is not a valid selector: "tr.odd:nth-child(str(i)) > td:nth-child(2) > a:nth-child(1)" 

I am completely lost on how to get around this. I am just trying to get it to go through the table. It just does not seem to like i. I know this works:

tr.odd:nth-child(1) > td:nth-child(2) > a:nth-child(1)
tr.even:nth-child(2) > td:nth-child(2) > a:nth-child(1)
tr.odd:nth-child(3) > td:nth-child(2) > a:nth-child(1)

Any suggestions?

  • Not experienced with Selenium, but to me the i inside the selector string is just a literal character, not the variable i defined outside it, which is wrong. I think you should have something like 'nth-child('+i+')'. Commented Mar 25, 2018 at 15:12
  • It seems your CSS selectors are incorrect. Did you evaluate those? Commented Mar 25, 2018 at 15:17
  • I tried both suggested replacements for i but I am still getting the same error. Any other tips?
    – Lily
    Commented Mar 25, 2018 at 16:58
  • Never mind, 'nth-child('+str(i)+')' eventually worked :)
    – Lily
    Commented Mar 25, 2018 at 17:47

1 Answer

There are multiple issues with your code:

  • i is used as a literal character in your selector, so the loop variable is never substituted into the string. Replace it with nth-child(" + str(i) + ")

  • You are filtering the odd and even rows both in your script and in the selector. Choose one, not both.

  • Locating elements and reading their text one by one in a loop is expensive. Scraping the text directly with some JavaScript would be a better approach.

rows = driver.execute_script("""
  return [].map.call(document.querySelectorAll('#default-dt tbody tr'), row => [
    row.cells[0].innerText,            /* Course number */
    row.cells[1].innerText,            /* Course title */
    row.querySelector('[href]').href   /* Course link */
  ]);
""")
for code, title, href in rows:
    print(code, title, href)
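
If you would rather keep the original loop, the first two points can be sketched roughly as follows. This is only an illustration: build_row_selectors and template are hypothetical names, and it assumes the row index alone is enough to identify each row, so the odd/even class is dropped from the selector.

```python
def build_row_selectors(first_row, last_row):
    """Return one CSS selector per table row, with the row index substituted in.

    Hypothetical helper: assumes nth-child(i) alone picks the row, so the
    tr.odd / tr.even class filter from the question is not needed.
    """
    template = "tr:nth-child({}) > td:nth-child(2) > a:nth-child(1)"
    # str.format converts the integer index to text, so no explicit str(i) is needed
    return [template.format(i) for i in range(first_row, last_row + 1)]


# Same rows as range(1, 69) in the question
selectors = build_row_selectors(1, 68)
print(selectors[0])
print(selectors[-1])
```

Each selector could then be passed to driver.find_element_by_css_selector(selector).text exactly as in the question. Printing the selectors first also makes it easy to check them in the browser console before running the whole loop.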
  • I tried making the replacement you suggested, but it is still giving the same error.
    – Lily
    Commented Mar 25, 2018 at 16:42
  • Try the script in the console of your browser (F12) to figure out the issue. Commented Mar 25, 2018 at 16:49
  • How is that going to show anything different? I am running the script in a Jupyter notebook. Can the browser console run Python scripts?
    – Lily
    Commented Mar 25, 2018 at 16:56
  • Print all the selectors from Python and try them in the browser's console with document.querySelector('tr.odd:nth-child(1) > td:nth-child(2) > a:nth-child(1)'). Commented Mar 25, 2018 at 17:55