How to interact with a JavaScript webpage using Python?

Question

I am a complete newbie to web scraping. I have a small project to scrape some data from COCA, but I don't even know where to start. The webpage seems to be built with JavaScript, and I wonder whether there is a package that lets me interact with it.

Here are some tasks that I want my program to do:

Log in using one's account;
Choose a tab (e.g. search, chart, etc.; please see COCA);
Type the word you want to search for into the text box;
Scrape the search results.

Any suggestions would be greatly appreciated.

PS: Ideally everything should run in the background (without opening a browser window).

There is also Selenium, which you can use to execute JS on websites. — Marcin, Commented Nov 10, 2016 at 0:56
@Marcin Thanks for the reply. Yes, I looked into Selenium, but I don't want my program to open a browser. Ideally everything runs in the background. Any suggestions? — Bayesric, Commented Nov 10, 2016 at 1:05
Selenium can use PhantomJS as a headless browser (that is, one that displays no window). It can also run Firefox/Chrome as a headless browser, but that may need some work. — furas, Commented Nov 10, 2016 at 1:56
Or you can analyze the data sent between the browser and the server (using DevTools in Chrome/Firefox) and then use this information to skip page rendering and running JavaScript, but this needs more work and knowledge of HTTP. — furas, Commented Nov 10, 2016 at 2:01

Igor Savinkin · Accepted Answer · 2024-04-14 15:28:36Z

    from pyvirtualdisplay import Display
    from selenium import webdriver

    # Start an invisible virtual display (requires Xvfb)
    display = Display(visible=0, size=(800, 600))
    display.start()

    # Firefox renders into the virtual display, so no window appears
    browser = webdriver.Firefox()
    browser.get('http://www.google.com')
    print(browser.title)

    # Clean up the browser and the virtual display
    browser.quit()
    display.stop()

pyvirtualdisplay in headless mode, Display(visible=0), requires Xvfb, which is available on Linux. Read more here on Xvfb usage.

Note that using pyvirtualdisplay with visible=False requires Xvfb, and therefore this cannot be used on a Windows machine. — Niko Fohr, Commented Oct 17, 2017 at 6:19
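
Where Xvfb is not available, the browser's own headless mode mentioned in the comments may be an option. The following is only a rough sketch, assuming Selenium 4 with Firefox and geckodriver installed; the URL and the text box name "q" are hypothetical placeholders, not values taken from COCA, and a page that loads results with JavaScript may also need an explicit wait before reading the page source.

```python
def make_headless_firefox():
    """Create a Firefox driver that renders pages without opening a window."""
    # Imported lazily so search_word() below can be tried without Selenium.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")  # Firefox's own headless switch
    return webdriver.Firefox(options=options)


def search_word(driver, url, word, box_name="q"):
    """Open url, type word into the text box, submit, and return the HTML.

    box_name is a hypothetical value of the input's name attribute; inspect
    the real page to find the right locator.
    """
    driver.get(url)
    box = driver.find_element("name", box_name)  # "name" is the value of By.NAME
    box.send_keys(word)
    box.submit()
    return driver.page_source


def main():
    # Hypothetical URL; replace it with the page you actually want to search.
    driver = make_headless_firefox()
    try:
        html = search_word(driver, "https://example.com/search", "corpus")
        print(html[:200])
    finally:
        driver.quit()
```

Calling main() runs the whole flow; logging in and choosing a tab would follow the same pattern of locating an element and then clicking it or typing into it.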

Oswald · Accepted Answer · 2016-11-10 08:04:44Z

As some people have told you, you can use Selenium. I recommend opening your browser's developer tools and following the network requests that the site makes. Depending on how the page behaves, you may be able to reproduce those requests with the Python module requests; personally, I think that is simpler. If you can't emulate the requests, then use Selenium.
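
To illustrate the idea, here is a minimal sketch of replaying a login and a search through a shared session. It assumes the site uses an ordinary form login with cookies; the paths and the form field names (email, password, q) are hypothetical and must be replaced with whatever the Network tab of the developer tools actually shows.

```python
from urllib.parse import urljoin

# Hypothetical endpoints; copy the real ones from the browser's Network tab.
LOGIN_PATH = "/login"
SEARCH_PATH = "/search"


def log_in(session, base_url, email, password):
    """POST the login form; the session stores any cookies for later requests."""
    response = session.post(
        urljoin(base_url, LOGIN_PATH),
        data={"email": email, "password": password},  # hypothetical field names
    )
    response.raise_for_status()
    return response


def search(session, base_url, word):
    """GET the search results for word and return the response body as text."""
    response = session.get(
        urljoin(base_url, SEARCH_PATH),
        params={"q": word},  # hypothetical query parameter
    )
    response.raise_for_status()
    return response.text


def main():
    import requests  # third-party package: pip install requests

    base_url = "https://example.com"  # placeholder, not the real site
    with requests.Session() as session:
        log_in(session, base_url, "user@example.com", "secret")
        print(search(session, base_url, "corpus")[:200])
```

No browser is started at all with this approach, so nothing is displayed; the returned text can then be parsed with an HTML or JSON parser, depending on what the server sends back.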
