
I have a large number of HTML files that I want to process with BeautifulSoup to generate some statistics. However, the files contain scripts that may generate more HTML, and that generated HTML is not being processed. I therefore need to render all the JavaScript into static HTML before proceeding.

I have seen options such as Selenium, but it doesn't seem to fit because I don't want to launch a browser window (the rendering should happen in the background).

Can someone please suggest an appropriate approach to this?

Thanks in advance!

    1 Answer


    Since you need a JavaScript engine, a headless browser is the way to go. Selenium WebDriver with the headless PhantomJS browser is probably your best option:

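    from bs4 import BeautifulSoup
    from selenium import webdriver

    driver = webdriver.PhantomJS()  # headless: no browser window is opened
    driver.get("...")  # load the page so that its scripts run
    bs = BeautifulSoup(driver.page_source, "html.parser")  # parse the rendered HTML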
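Since the question is about many local files rather than a single URL, here is a rough sketch of how the same idea might be applied in a loop. The helper names `render_files` and `render_with_headless_chrome` and the `html_files` directory are made up for illustration. Recent Selenium releases no longer include the PhantomJS driver, so the second helper assumes headless Chrome as a stand-in; it is not the setup described above.

```python
from pathlib import Path


def render_files(driver, paths):
    """Return {path: HTML as reported by the driver} for each local file.

    `driver` is any object exposing Selenium's WebDriver interface:
    a `get(url)` method and a `page_source` attribute.
    """
    rendered = {}
    for path in paths:
        # A browser needs a URL, so turn the local path into a file:// URI.
        url = Path(path).resolve().as_uri()
        driver.get(url)
        # page_source reflects the DOM at the moment it is read; scripts that
        # finish later (timers, network requests) may need an explicit wait.
        rendered[str(path)] = driver.page_source
    return rendered


def render_with_headless_chrome(paths):
    """Hypothetical wrapper; assumes selenium and Chrome are installed."""
    from selenium import webdriver  # imported lazily: third-party dependency

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # run without opening a window
    driver = webdriver.Chrome(options=options)
    try:
        return render_files(driver, paths)
    finally:
        driver.quit()  # always shut the browser process down
```

Each returned HTML string can then be passed to `BeautifulSoup(html, "html.parser")` as in the snippet above, for example after calling `render_with_headless_chrome(Path("html_files").glob("*.html"))`. Reusing one driver for all files avoids starting a new browser process per file.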
