I am trying to use Python 3 to return the BibTeX citation generated by http://www.doi2bib.org/. The URLs are predictable, so the script can work out the URL without having to interact with the web page. I have tried using Selenium, bs4, etc. but can't get the text inside the box.

```python
import urllib.request
from bs4 import BeautifulSoup

url = "http://www.doi2bib.org/#/doi/10.1007/s00425-007-0544-9"
text = BeautifulSoup(urllib.request.urlopen(url).read())
print(text)
```

Can anyone suggest a way of returning the BibTeX citation as a string (or whatever) in Python?


You don't need BeautifulSoup here. After the page loads, it sends an additional XHR request to the server to fill in the BibTeX citation. You can simulate that request directly, for example with requests:

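```python
import requests

bibtex_id = '10.1007/s00425-007-0544-9'
url = "http://www.doi2bib.org/#/doi/{id}".format(id=bibtex_id)
xhr_url = 'http://www.doi2bib.org/doi2bib'

with requests.Session() as session:
    # Load the page first, as a browser would; the session keeps any cookies it sets.
    session.get(url)
    # Replay the XHR request the page makes to fetch the citation.
    response = session.get(xhr_url, params={'id': bibtex_id})
    # .text is the decoded str; .content would print raw bytes (b'...').
    print(response.text)
```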

Prints:

```
@article{Burgert_2007,
  doi = {10.1007/s00425-007-0544-9},
  url = {http://dx.doi.org/10.1007/s00425-007-0544-9},
  year = 2007,
  month = {jun},
  publisher = {Springer Science $\mathplus$ Business Media},
  volume = {226},
  number = {4},
  pages = {981--987},
  author = {Ingo Burgert and Michaela Eder and Notburga Gierlinger and Peter Fratzl},
  title = {Tensile and compressive stresses in tracheids are induced by swelling based on geometrical constraints of the wood cell},
  journal = {Planta}
}
```
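If you would rather stay with the standard library (the question already uses urllib), the same XHR request can be made without requests. The part of the page URL after `#` is never sent to the server by an HTTP client, which is why fetching the page URL directly does not return the citation. The following is a minimal sketch, assuming the endpoint `http://www.doi2bib.org/doi2bib` and its `id` parameter still behave as in the requests example above; `doi_from_page_url`, `build_xhr_url` and `fetch_bibtex` are hypothetical helper names.

```python
import urllib.parse
import urllib.request

# Assumption: same endpoint and 'id' parameter as in the requests example;
# the site may have changed since this answer was written.
XHR_URL = 'http://www.doi2bib.org/doi2bib'


def doi_from_page_url(page_url):
    """Extract the DOI from a page URL like http://www.doi2bib.org/#/doi/<doi>."""
    # urldefrag splits off the fragment (everything after '#'), which a
    # browser keeps client-side and never sends to the server.
    _, fragment = urllib.parse.urldefrag(page_url)
    prefix = '/doi/'
    if not fragment.startswith(prefix):
        raise ValueError('no /doi/ fragment in %r' % page_url)
    return fragment[len(prefix):]


def build_xhr_url(doi):
    """Build the URL of the request that fills in the citation box."""
    return XHR_URL + '?' + urllib.parse.urlencode({'id': doi})


def fetch_bibtex(doi, timeout=10):
    """Return the BibTeX citation for `doi` as a str (needs network access)."""
    with urllib.request.urlopen(build_xhr_url(doi), timeout=timeout) as response:
        # Decode using the charset the server declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or 'utf-8'
        return response.read().decode(charset)
```

Usage: `fetch_bibtex(doi_from_page_url('http://www.doi2bib.org/#/doi/10.1007/s00425-007-0544-9'))` should return the same BibTeX string as the requests version, provided the endpoint is unchanged.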

You can also solve it with Selenium. The key trick is to use an explicit wait, so that the script waits for the citation to become visible:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
try:
    driver.get('http://www.doi2bib.org/#/doi/10.1007/s00425-007-0544-9')
    # Wait up to 10 seconds for the <pre> holding the citation to become visible.
    element = WebDriverWait(driver, 10).until(
        EC.visibility_of_element_located((By.XPATH, '//pre[@ng-show="bib"]'))
    )
    print(element.text)
finally:
    # quit() ends the whole browser session, even if the wait timed out.
    driver.quit()
```

Prints the same as the above solution.
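For readers new to explicit waits: conceptually, `WebDriverWait(driver, 10).until(...)` keeps re-checking a condition until it returns something truthy or the timeout expires. The sketch below shows that polling idea in plain Python; `wait_until` is a hypothetical helper for illustration, not part of Selenium, and unlike Selenium's version it does not ignore exceptions such as `NoSuchElementException` raised by the condition.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed. A simplified illustration of an explicit wait,
    not Selenium's actual implementation.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError('condition not met within %.1f s' % timeout)
        # Sleep between checks instead of busy-looping.
        time.sleep(poll_interval)
```

An explicit wait like this returns as soon as the element is ready, instead of always sleeping for a fixed, worst-case amount of time.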

  • Thanks for that. Would you mind telling me how you could see that the additional request was sent to doi2bib.org/doi2bib? Pretty new to this.
    – Nick, commented Feb 3, 2015 at 1:27
  • @Nick Sure: open the browser developer tools and switch to the Network tab. Then load the website and watch all the requests sent to the server while the page loads. Among them you will see the one I mentioned. Hope that helps.
    – alecxe, commented Feb 3, 2015 at 1:28