I'm trying to scrape LinkedIn using Selenium. Here's an example page: https://www.linkedin.com/vsearch/p?firstName=mark

I can see in the HTML that the search results are inside:

<div id='results-col'> ... </div>

but when I try to access this tag using BeautifulSoup:

    browser = webdriver.PhantomJS(executable_path=PATH)
    browser.get(url)
    bs_obj = BeautifulSoup(browser.page_source, "html.parser")
    results_col = bs_obj.find("div", {"id": "results-col"})

I get nothing (results_col is None). What am I doing wrong?

  • Add a sleep after the browser.get for the js to load
    – Tobey
    Commented Dec 14, 2016 at 19:32

1 Answer

Wait for the desired element to be present and only then get the page source:

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # ...

    browser.get(url)
    wait = WebDriverWait(browser, 10)
    wait.until(EC.presence_of_element_located((By.ID, "results-col")))
    bs_obj = BeautifulSoup(browser.page_source, "html.parser")
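
For intuition about what the explicit wait does, here is a rough, Selenium-free sketch of the polling idea behind it: call a condition repeatedly until it returns something truthy, and give up with an exception once the timeout passes. The names `wait_until`, `WaitTimeout`, and `results_present` are made up for this illustration and are not part of Selenium; the real `WebDriverWait` also ignores certain exceptions between polls, which this sketch leaves out.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (hypothetical name)."""


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)


# Stand-in for a page whose results appear only after some script has run.
page_ready_at = time.monotonic() + 0.2


def results_present():
    # Plays the role of an expected condition: falsy until the "page" is ready.
    return "results-col" if time.monotonic() >= page_ready_at else None


found = wait_until(results_present, timeout=2.0, poll_interval=0.05)
```

Seen this way, a timeout from the wait means the condition stayed falsy for the whole period, so the element never showed up in the page that the browser actually loaded. A fixed sleep, by contrast, always waits the full time and still gives no signal when the element is missing.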
  • I tried your code but I get:
    Traceback (most recent call last):
      File X, line 142, in <module>
        print(get_link_to_profile(search_url))
      File X, line 121, in get_link_to_profile
        wait.until(EC.presence_of_element_located((By.ID, "results-col")))
      File "C:\Users\sergeyy\AppData\Roaming\Python\Python35\site-packages\selenium\webdriver\support\wait.py", line 80, in until
        raise TimeoutException(message, screen, stacktrace)
    selenium.common.exceptions.TimeoutException: Message: Screenshot: available via screen
    Commented Dec 14, 2016 at 20:15
  • @BobSacamano that could mean different things, but most likely this element is not on the page that PhantomJS opened. Take a screenshot with the save_screenshot() method after loading the page and see what was actually rendered. You might need to start PhantomJS with some arguments to make it work: stackoverflow.com/questions/29463603/….
    – alecxe
    Commented Dec 14, 2016 at 21:12
  • @BobSacamano or, you may need to tweak the user agent to pretend to be a different browser: coderwall.com/p/9jgaeq/set-phantomjs-user-agent-string.
    – alecxe
    Commented Dec 14, 2016 at 21:14
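
The user agent suggestion in the last comment could look roughly like the sketch below. It assumes a Selenium release that still ships the PhantomJS driver (it was removed in Selenium 4) and relies on the `phantomjs.page.settings.userAgent` capability key that GhostDriver reads. The helper name `phantomjs_capabilities` and the user agent string are made up for illustration, and the dict is a simplified stand-in for a copy of `DesiredCapabilities.PHANTOMJS`. The driver calls are left as comments so the snippet runs without a browser installed.

```python
def phantomjs_capabilities(user_agent):
    """Build a capabilities dict asking PhantomJS to send `user_agent`."""
    return {
        "browserName": "phantomjs",
        # GhostDriver maps keys with this prefix onto PhantomJS page settings.
        "phantomjs.page.settings.userAgent": user_agent,
    }


# Hypothetical desktop browser string, for illustration only.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/55.0 Safari/537.36"
)

caps = phantomjs_capabilities(USER_AGENT)

# With a Selenium version that still includes the PhantomJS driver:
# browser = webdriver.PhantomJS(executable_path=PATH, desired_capabilities=caps)
# browser.get(url)
# browser.save_screenshot("page.png")  # inspect what was actually rendered
```

If the screenshot shows a login wall or an error page rather than search results, the missing `results-col` element is explained by what the site served, not by the parsing code.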
