RoboBrowser: Your friendly neighborhood web scraper


Homepage: http://robobrowser.readthedocs.org/

RoboBrowser is a simple, Pythonic library for browsing the web without a standalone web browser. RoboBrowser can fetch a page, click on links and buttons, and fill out and submit forms. If you need to interact with web services that don’t have APIs, RoboBrowser can help.

    import re
    from robobrowser import RoboBrowser

    # Browse to Rap Genius
    browser = RoboBrowser(history=True)
    browser.open('http://rapgenius.com/')

    # Search for Queen
    form = browser.get_form(action='/search')
    form                # <RoboForm q=>
    form['q'].value = 'queen'
    browser.submit_form(form)

    # Look up the first song
    songs = browser.select('.song_name')
    browser.follow_link(songs[0])
    lyrics = browser.select('.lyrics')
    lyrics[0].text      # \n[Intro]\nIs this the real life...

    # Back to results page
    browser.back()

    # Look up my favorite song
    browser.follow_link('death on two legs')

    # Can also search HTML using regex patterns
    lyrics = browser.find(class_=re.compile(r'\blyrics\b'))
    lyrics.text         # \n[Verse 1]\nYou suck my blood like a leech...

RoboBrowser combines the best of two excellent Python libraries: Requests and BeautifulSoup. RoboBrowser represents browser sessions using Requests and HTML responses using BeautifulSoup, transparently exposing methods of both libraries:

    import re
    from robobrowser import RoboBrowser

    browser = RoboBrowser(user_agent='a python robot')
    browser.open('https://github.com/')

    # Inspect the browser session
    browser.session.cookies['_gh_sess']         # BAh7Bzo...
    browser.session.headers['User-Agent']       # a python robot

    # Search the parsed HTML
    browser.select('div.teaser-icon')       # [<div class="teaser-icon">
                                            # <span class="mega-octicon octicon-checklist"></span>
                                            # </div>,
                                            # ...
    browser.find(class_=re.compile(r'column', re.I))    # <div class="one-third column">
                                                        # <div class="teaser-icon">
                                                        # <span class="mega-octicon octicon-checklist"></span>
                                                        # ...

RoboBrowser also includes tools for working with forms, inspired by WebTest and Mechanize.

    from robobrowser import RoboBrowser

    browser = RoboBrowser()
    browser.open('http://twitter.com')

    # Get the signup form
    signup_form = browser.get_form(class_='signup')
    signup_form         # <RoboForm user[name]=, user[email]=, ...

    # Inspect its values
    signup_form['authenticity_token'].value     # 6d03597 ...

    # Fill it out
    signup_form['user[name]'].value = 'python-robot'
    signup_form['user[user_password]'].value = 'secret'

    # Serialize it to JSON
    signup_form.serialize()         # {'data': {'authenticity_token': '6d03597...',
                                    #  'context': '',
                                    #  'user[email]': '',
                                    #  'user[name]': 'python-robot',
                                    #  'user[user_password]': 'secret'}}

    # And submit
    browser.submit_form(signup_form)
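
To make the form workflow above concrete without network access, here is a minimal sketch of the underlying idea: collect a form's named inputs and their default values, override some of them, and URL-encode the result for submission. It uses only the standard library and is not RoboBrowser's implementation. The `FormFieldCollector` class, the `collect_fields` helper, and the sample HTML are all hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Hypothetical sample markup, loosely modeled on the signup form above.
SAMPLE_HTML = """
<form class="signup" action="/signup" method="post">
  <input type="hidden" name="authenticity_token" value="6d03597">
  <input type="text" name="user[name]">
  <input type="password" name="user[user_password]">
  <input type="submit" value="Sign up">
</form>
"""


class FormFieldCollector(HTMLParser):
    """Collect named <input> fields and their default values (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            # Inputs without a name (the submit button here) are skipped.
            # A missing value attribute is treated as an empty string.
            self.fields[name] = attributes.get("value") or ""


def collect_fields(html):
    """Return a dict mapping input names to default values, in document order."""
    parser = FormFieldCollector()
    parser.feed(html)
    return parser.fields


fields = collect_fields(SAMPLE_HTML)

# Fill out the form by overriding the defaults.
fields["user[name]"] = "python-robot"
fields["user[user_password]"] = "secret"

# URL-encode the fields as a form submission body.
payload = urlencode(fields)
print(payload)
# authenticity_token=6d03597&user%5Bname%5D=python-robot&user%5Buser_password%5D=secret
```

A real form handler also has to deal with checkboxes, selects, textareas, and file uploads, which this sketch leaves out.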

Requirements

  • Python >= 2.6 or >= 3.3

License

MIT licensed. See the bundled LICENSE file for more details.
