Does Python 3 have a scraping library that can handle JavaScript-loaded pages, other than Selenium? I'm trying to scrape https://www.mailinator.com/v2/inbox.jsp?zone=public&query=test, but the inbox is loaded with JavaScript. The reason I don't want to use Selenium is that I don't want it to open a browser window when I run it.

Here is my non-working code:

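```python
import requests
from bs4 import BeautifulSoup as soup

INBOX = "https://www.mailinator.com/v2/inbox.jsp?zone=public&query={}"


def check_inbox(name):
    # Download the raw HTML; requests does not execute any JavaScript.
    html = requests.get(INBOX.format(name)).text
    stuff = soup(html, "html.parser")
    # The message list is filled in by JavaScript after the page loads,
    # so it is not in the HTML that requests downloads and this finds nothing.
    print(stuff.find("ul", {"class": "single_mail-body"}))


check_inbox("retep")
```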

Do any such libraries exist?

A Google search for "python 3 javascript scraper" turned up nothing besides Selenium.

  • Possible duplicate of Web-scraping JavaScript page with Python
    – Hum4n01d
    Commented Oct 23, 2017 at 21:58
  • @Hum4n01d this is python3, not python.
    – Peter S
    Commented Oct 23, 2017 at 21:59
  • I don't see why that would make a difference.
    – Hum4n01d
    Commented Oct 23, 2017 at 22:00
  • different syntax, libraries aren't compatible
    – Peter S
    Commented Oct 23, 2017 at 22:00
  • Ok, but overall the solution is still going to be the same. You need a library that renders the page with JavaScript before you start scraping.
    – Hum4n01d
    Commented Oct 23, 2017 at 22:02

1 Answer

You don't actually need JavaScript: it runs client side, so you can emulate what it does.

If you inspect the page (Developer Tools > Network), you'll see that it opens a WebSocket connection to:

wss://www.mailinator.com/ws/fetchinbox?zone=public&query=test 

[Screenshot: webpage inspection]

If you then implement a WebSocket client in Python, you'll be able to fetch your mail cleanly (see this example: https://github.com/aaugustin/websockets/blob/master/example/client.py).

EDIT:

As mentioned by John, the link to aaugustin's websockets client example is dead. Today I'd use the websockets library documentation instead: https://websockets.readthedocs.io/en/stable/
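A minimal sketch of such a client, using the third-party `websockets` library (`pip install websockets`). It assumes that the `wss://www.mailinator.com/ws/fetchinbox` endpoint seen in 2017 still exists, accepts a plain connection without extra headers or cookies, and pushes the inbox as JSON frames. None of this is guaranteed, since Mailinator may have changed or removed the endpoint. The helper names `build_inbox_url` and `fetch_inbox` are hypothetical.

```python
import asyncio
import json
from urllib.parse import urlencode

# Endpoint observed in the browser's Network tab in 2017 (assumption: still live).
WS_BASE = "wss://www.mailinator.com/ws/fetchinbox"


def build_inbox_url(zone: str, query: str) -> str:
    """Build the WebSocket URL for a given zone and inbox name."""
    return f"{WS_BASE}?{urlencode({'zone': zone, 'query': query})}"


async def fetch_inbox(name: str, max_messages: int = 5, timeout: float = 10.0) -> list:
    """Connect to the inbox WebSocket and collect up to max_messages frames."""
    # Third-party dependency; imported here so build_inbox_url works without it.
    import websockets
    from websockets.exceptions import ConnectionClosed

    messages = []
    async with websockets.connect(build_inbox_url("public", name)) as ws:
        for _ in range(max_messages):
            try:
                raw = await asyncio.wait_for(ws.recv(), timeout)
            except (asyncio.TimeoutError, ConnectionClosed):
                # No frame within `timeout` seconds, or the server closed
                # the connection: treat the listing as complete.
                break
            try:
                messages.append(json.loads(raw))  # assumption: frames carry JSON
            except ValueError:
                messages.append(raw)  # keep non-JSON frames unchanged
    return messages
```

Run it with `asyncio.run(fetch_inbox("test"))`. The `asyncio.wait_for` timeout matters because a push-style socket has no natural "end of response"; the loop simply stops once the server has been quiet for a while.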

  • Hmm... it's not working for me: websockets.exceptions.InvalidStatusCode: Status code not 101: 500
    – Peter S
    Commented Oct 23, 2017 at 22:17
  • My attempt:
```python
import asyncio

import websockets

INBOX = "wss://www.mailinator.com/ws/fetchinbox?zone=public&query=test"


async def hello():
    # Open the WebSocket and print the first frame the server sends.
    async with websockets.connect(INBOX) as ws:
        response = await ws.recv()
        print(response)


asyncio.run(hello())
```
    – Peter S
    Commented Oct 23, 2017 at 22:18
  • That's an internal server error, which is a separate issue. I'd suggest asking a new question showing what you're trying and why it doesn't work. My guess is that you have to send some headers (maybe the cookies). You should also look at the page's source code to see how it sets up its WebSocket connection; maybe you have to subscribe to a channel. Also, try for a bit more than 5 minutes before asking the community ;)
    – Loïc
    Commented Oct 23, 2017 at 22:18
  • I noticed that one of the headers, Sec-WebSocket-Key, changes every time. How would I go about generating one?
    – Peter S
    Commented Oct 23, 2017 at 22:28
  • @Loïc the GitHub link is broken. Do you have a code snippet you could add to your answer?
    – Coder
    Commented Nov 28, 2021 at 8:17
