Get the page source using Selenium WebDriver in Python

Question

I'm scraping a website that loads its content dynamically. I step through all the pages of the site, and along the way I want to collect the page source of every page in a list. The code below moves through the pages and gets their page source, but nothing is printed or returned at the end of the function. The same approach worked for other websites, but not here. Please help me out. Thank you.

    import time

    from selenium.common.exceptions import NoSuchElementException, TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC


    def get_html(driver):
        # driver.wait is assumed to be a WebDriverWait attached to the driver elsewhere
        output = []
        keep_going = True
        while keep_going:
            # Pull page HTML
            try:
                output.append(driver.page_source)
            except TimeoutException:
                pass
            try:
                # Check to see if a "next page" link exists
                keep_going = driver.find_element_by_class_name('next ').is_displayed()
            except NoSuchElementException:
                keep_going = False
            if keep_going:
                try:
                    driver.wait.until(
                        EC.element_to_be_clickable((By.CLASS_NAME, 'next '))).click()
                    time.sleep(3)
                except TimeoutException:
                    keep_going = False
        print(str(len(output)))
        return output


    raw_data = get_html(driver)
    print(str(len(raw_data)) + " listing found")

This is the error output I'm getting:

> Entering search term number 1 out of 1
> Traceback (most recent call last):
>   File "E:/Harshitha/python learning/python/New/rough1.py", line 114, in <module>
>     raw_data = get_html(driver)
>   File "E:/Harshitha/python learning/python/New/rough1.py", line 65, in get_html
>     output = (driver.page_source).encode('utf-8')
>   File "C:\Users\Harshitha\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 670, in page_source
>     return self.execute(Command.GET_PAGE_SOURCE)['value']
>   File "C:\Users\Harshitha\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 312, in execute
>     self.error_handler.check_response(response)
>   File "C:\Users\Harshitha\Anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 237, in check_response
>     raise exception_class(message, screen, stacktrace)
> selenium.common.exceptions.WebDriverException: Message: chrome not reachable
>   (Session info: chrome=63.0.3239.132)
>   (Driver info: chromedriver=2.34.522940 (1a76f96f66e3ca7b8e57d503b4dd3bccfba87af1),platform=Windows NT 10.0.16299 x86_64)

Possible duplicate of this
– abybaddi009
Commented Feb 5, 2018 at 11:42

j.barrio · Accepted Answer · 2018-02-06 10:16:52Z

1

I use the page_source property:

    driver.page_source
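
The traceback in the question shows a WebDriverException ("chrome not reachable") raised while page_source is being read. The question's function only catches TimeoutException and NoSuchElementException, so that error escapes before the function can return anything. Below is a minimal sketch of one way to keep the pages collected so far. It is not the asker's code: collect_page_sources, go_to_next and max_pages are hypothetical names, and the import fallback exists only so the sketch can run without Selenium installed.

```python
try:
    from selenium.common.exceptions import WebDriverException
except ImportError:
    # Assumption: fallback so the sketch can run without Selenium installed.
    WebDriverException = Exception


def collect_page_sources(driver, go_to_next, max_pages=100):
    """Return the HTML of each visited page as a list of strings.

    go_to_next is a hypothetical callable: it receives the driver, moves to
    the next page, and returns False when there is no next page.
    """
    pages = []
    for _ in range(max_pages):
        try:
            # page_source is a property, so it is read without parentheses.
            pages.append(driver.page_source)
            if not go_to_next(driver):
                break
        except WebDriverException as exc:
            # For example "chrome not reachable": stop, but keep what we have.
            print("Stopping after {} page(s): {}".format(len(pages), exc))
            break
    return pages
```

With a real driver, go_to_next would wrap the "next" link click from the question. The message "chrome not reachable" suggests the browser session itself went away, so catching the exception only preserves partial results; it does not bring the session back.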

edited Feb 6, 2018 at 10:16

answered Feb 6, 2018 at 8:14

It's Python code, and it throws the error "'WebDriver' object has no attribute 'get_source'".
– user9270684
Commented Feb 6, 2018 at 8:52
Try driver.get_source without ().
– j.barrio
Commented Feb 6, 2018 at 8:53
I changed the answer: in Python the attribute is page_source.
– j.barrio
Commented Feb 6, 2018 at 10:17
I was already using page_source, but the function still doesn't return a result. I used this for 5-6 websites; it worked for about 3 of them but not the others, and I don't know why.
– user9270684
Commented Feb 7, 2018 at 4:13
Maybe the encoding to UTF-8 is the problem: output = (driver.page_source).encode('utf-8'). Could you try encoding with unicode?
– j.barrio
Commented Feb 7, 2018 at 8:40
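
On the encoding suggestion: the traceback in the question shows the exception being raised inside page_source, before .encode('utf-8') runs, so the encoding step is probably not what fails there. The traceback line does differ from the posted code in another way, though: it assigns to output instead of appending. A small illustration of that difference, using a made-up HTML string in place of driver.page_source:

```python
# Hypothetical stand-in for driver.page_source (a str in Python 3).
html = "<html><body>caf\u00e9</body></html>"

# Appending keeps `output` a list with one entry per page.
output = []
output.append(html)

# Assigning the encoded value, as in the traceback line, yields a single
# bytes object rather than a list of pages.
replaced = html.encode("utf-8")

print(type(output).__name__, len(output))
print(type(replaced).__name__)
```

If the function should return one entry per page, appending the str, or the encoded bytes if bytes are really needed, keeps the result a list.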
