rbiorxiv

R client for interacting with the bioRxiv API

Installation

Install from CRAN:

    # Install package
    install.packages("rbiorxiv")

    # Load package
    library(rbiorxiv)

Or install the development version from GitHub (using the devtools package):

    # Install package
    install.packages("devtools")
    devtools::install_github("nicholasmfraser/rbiorxiv")

    # Load package
    library(rbiorxiv)

Usage

The main functions in rbiorxiv generally conform to the endpoints outlined in the bioRxiv API documentation.

Content detail

Retrieve details of a set of preprints deposited between two dates, or look up a single preprint by DOI:

    # Get details of preprints deposited between 2018-01-01 and 2018-01-10
    # By default, only the first 100 records are returned
    biorxiv_content(from = "2018-01-01", to = "2018-01-10")

    # Set a limit to return more than 100 records
    biorxiv_content(from = "2018-01-01", to = "2018-01-10", limit = 200)

    # Or set limit as "*" to return all records
    biorxiv_content(from = "2018-01-01", to = "2018-01-10", limit = "*")

    # Skip the first 100 records
    biorxiv_content(from = "2018-01-01", to = "2018-01-10", limit = 200, skip = 100)

    # By default, data is returned in a list. Use the "format" argument to specify
    # that data should be returned in "json" format or as a data frame ("df").
    biorxiv_content(from = "2018-01-01", to = "2018-01-10", format = "df")

    # Look up a preprint by DOI
    biorxiv_content(doi = "10.1101/833400")
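For long date ranges it can be convenient to query in smaller windows rather than in one large request. The following is a minimal sketch in base R; `date_windows` is a hypothetical helper written for illustration and is not part of rbiorxiv. It assumes that the `from` and `to` dates are both inclusive, so consecutive windows are built without overlap.

```r
# Hypothetical helper (not part of rbiorxiv): split a date range into
# consecutive, non-overlapping windows of at most `days` days each.
date_windows <- function(from, to, days = 7) {
  from <- as.Date(from)
  to <- as.Date(to)
  stopifnot(from <= to, days >= 1)
  starts <- seq(from, to, by = days)   # window start dates
  ends <- pmin(starts + days - 1, to)  # clip the last window to `to`
  data.frame(from = format(starts), to = format(ends),
             stringsAsFactors = FALSE)
}

windows <- date_windows("2018-01-01", "2018-01-10", days = 4)
print(windows)

# Each row could then be passed to biorxiv_content() (requires network access):
# results <- lapply(seq_len(nrow(windows)), function(i) {
#   biorxiv_content(from = windows$from[i], to = windows$to[i],
#                   limit = "*", format = "df")
# })
```

The window dates are returned as "YYYY-MM-DD" strings, matching the form used for the `from` and `to` arguments in the examples above.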

The bioRxiv API also allows querying details of medRxiv preprints by supplying a “server” parameter:

    # Get details of medRxiv preprints deposited between 2020-01-01 and 2020-01-02
    biorxiv_content(server = "medrxiv", from = "2020-01-01", to = "2020-01-02")

The “server” parameter defaults to “biorxiv”. Note that the functions documented below are limited to bioRxiv only (at the time of writing).

Published article detail

Retrieve details of published articles associated with bioRxiv preprints that were published between two dates:

    # Get details of all articles published between 2018-01-01 and 2018-01-10
    biorxiv_published(from = "2018-01-01", to = "2018-01-10", limit = "*", format = "df")

Publisher article detail

Retrieve details of articles published by a specific publisher (specified by its DOI prefix) between two dates:

    # Get details of all articles published by eLife (prefix = 10.7554)
    # between 2018-01-01 and 2018-01-10
    biorxiv_publisher(prefix = "10.7554", from = "2018-01-01", to = "2018-01-10",
                      limit = "*", format = "df")
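If only a full article DOI is at hand, the publisher prefix is the part before the first slash. A minimal sketch follows; `doi_prefix` is a hypothetical helper, not an rbiorxiv function, and the second DOI is a made-up placeholder used only to show the eLife prefix.

```r
# Hypothetical helper (not part of rbiorxiv): extract the DOI prefix,
# i.e. everything before the first "/".
doi_prefix <- function(doi) {
  sub("/.*$", "", doi)
}

prefixes <- doi_prefix(c("10.1101/833400", "10.7554/placeholder-suffix"))
print(prefixes)
```

The resulting prefix can be passed as the `prefix` argument of `biorxiv_publisher()`.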

Content summary statistics

Retrieve summary statistics for bioRxiv content (e.g. number of preprints deposited):

    # Get summary statistics at a monthly level
    biorxiv_summary(interval = "m")

    # Get summary statistics at a yearly level
    biorxiv_summary(interval = "y")

Usage summary statistics

Retrieve summary statistics for usage of bioRxiv content (e.g. number of PDF downloads):

    # Get usage statistics at a monthly level
    biorxiv_usage(interval = "m")

    # Get usage statistics at a yearly level
    biorxiv_usage(interval = "y")

API rate and usage limits

No rate or usage limits are currently specified for the bioRxiv API. However, all functions in this package enforce a 1-second timeout per API call when iterating through multiple pages of results (a single API call currently returns a maximum of 100 results per page).
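The general idea of paging through results with a pause between calls can be sketched as below. This is an illustration only and not the package's internal implementation: `fetch_all_pages` and `fake_fetch` are hypothetical names, and `fake_fetch` is an offline stand-in that pretends the API holds 250 records.

```r
# Hypothetical sketch (not rbiorxiv internals): request pages of up to
# `page_size` records, pausing `delay` seconds between calls.
# `fetch_page` is any function that takes a skip offset and returns a list.
fetch_all_pages <- function(fetch_page, page_size = 100, delay = 1) {
  results <- list()
  skip <- 0
  repeat {
    page <- fetch_page(skip)
    results <- c(results, page)
    if (length(page) < page_size) break  # a short page means we are done
    skip <- skip + page_size
    Sys.sleep(delay)                     # pause before the next request
  }
  results
}

# Offline stand-in for an API call: returns record numbers skip+1, skip+2, ...
fake_fetch <- function(skip) {
  remaining <- max(0, 250 - skip)
  as.list(seq_len(min(100, remaining)) + skip)
}

records <- fetch_all_pages(fake_fetch, delay = 0)
print(length(records))
```

In practice there is no need to write such a loop by hand: setting `limit = "*"` in the rbiorxiv functions retrieves all records.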

Examples

Growth of bioRxiv over time

    library(tidyverse)
    library(rbiorxiv)

    # Plot the cumulative number of new preprints deposited per month
    # Note that month dates are returned in YYYY-MM format - here we convert
    # month dates to YYYY-MM-DD format to make plotting easier
    biorxiv_summary(interval = "m", format = "df") %>%
      mutate(month = as.Date(paste0(month, "-01"), format = "%Y-%m-%d")) %>%
      ggplot() +
      geom_bar(aes(x = month, y = new_papers_cumulative),
               fill = "#cccccc",
               stat = "identity") +
      labs(x = "",
           y = "Submissions",
           title = "Cumulative new bioRxiv submissions") +
      scale_x_date(date_breaks = "3 months",
                   date_minor_breaks = "3 months",
                   date_labels = "%b-%y",
                   expand = c(0, 0)) +
      scale_y_continuous(labels = scales::comma) +
      theme_minimal() +
      theme(
        axis.text.x = element_text(angle = 90, vjust = 0.5),
        axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0)),
        plot.title = element_text(face = "bold")
      )
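The month conversion used in this example, and the step from cumulative to per-month counts, can be tried offline on a small data frame. The sketch below uses made-up illustrative numbers, not real bioRxiv statistics; only the column names `month` and `new_papers_cumulative` are taken from the example above, and `new_papers_monthly` is a name introduced here.

```r
# Made-up illustrative values, shaped like the columns used above
summary_df <- data.frame(
  month = c("2019-01", "2019-02", "2019-03"),
  new_papers_cumulative = c(100, 250, 450),
  stringsAsFactors = FALSE
)

# Convert YYYY-MM strings to Date by appending the first day of the month
summary_df$month <- as.Date(paste0(summary_df$month, "-01"))

# Per-month counts are the differences between consecutive cumulative totals;
# the first month has no predecessor in this toy data, so it is left as NA
summary_df$new_papers_monthly <- c(NA, diff(summary_df$new_papers_cumulative))

print(summary_df)
```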

PDF downloads over time

    library(tidyverse)
    library(rbiorxiv)

    # Plot the cumulative number of PDF downloads per month
    # Here month dates are returned already in YYYY-MM-DD format
    biorxiv_usage(interval = "m", format = "df") %>%
      mutate(month = as.Date(month)) %>%
      ggplot() +
      geom_bar(aes(x = month, y = pdf_cumulative),
               fill = "#cccccc",
               stat = "identity") +
      labs(x = "",
           y = "PDF downloads (cumulative)",
           title = "Number of bioRxiv PDF downloads over time") +
      scale_x_date(date_breaks = "3 months",
                   date_minor_breaks = "3 months",
                   date_labels = "%b-%y",
                   expand = c(0, 0)) +
      scale_y_continuous(labels = scales::comma) +
      theme_minimal() +
      theme(
        axis.text.x = element_text(angle = 90, vjust = 0.5),
        axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0)),
        plot.title = element_text(face = "bold")
      )

Time to publication

    library(tidyverse)
    library(rbiorxiv)

    # Calculate the number of days between preprint deposition and
    # journal publication. Plot results as a histogram.
    biorxiv_published(from = "2013-11-01", to = "2018-12-31", limit = "*", format = "df") %>%
      mutate(days = as.Date(published_date) - as.Date(preprint_date)) %>%
      ggplot() +
      geom_histogram(aes(as.numeric(days)),
                     binwidth = 1,
                     fill = "#cccccc") +
      labs(x = "Days between preprint deposition and journal publication",
           y = "Number of articles",
           title = "Time to publication") +
      coord_cartesian(xlim = c(-100, 1000)) +
      theme_minimal() +
      theme(
        axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0)),
        plot.title = element_text(face = "bold")
      )
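The day calculation in this example can also be checked offline. The sketch below uses three made-up rows (not real bioRxiv records) with the `preprint_date` and `published_date` columns from the example above. One row is deliberately given a publication date earlier than its preprint date, since the plot's x-axis range suggests such negative values can occur.

```r
# Made-up illustrative rows, shaped like the columns used above
published <- data.frame(
  preprint_date  = c("2018-01-01", "2018-02-15", "2018-03-10"),
  published_date = c("2018-06-30", "2018-05-16", "2018-03-05"),
  stringsAsFactors = FALSE
)

# Days from preprint deposition to journal publication
published$days <- as.numeric(difftime(as.Date(published$published_date),
                                      as.Date(published$preprint_date),
                                      units = "days"))

median_days <- median(published$days)  # typical delay in this toy data
n_negative <- sum(published$days < 0)  # articles published before deposition

print(published)
print(median_days)
print(n_negative)
```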

Other tools/packages for working with medRxiv/bioRxiv data

rbiorxiv aims to provide a simple wrapper around the main endpoints of the bioRxiv API, and return data for further analysis/manipulation by the R user. Below are some additional packages that provide distinct but related functionality when working with bioRxiv and medRxiv data:

medrxivr, developed by Luke McGuinness and part of the rOpenSci ecosystem, provides users with more powerful tools to download bioRxiv and medRxiv data, and to search downloaded preprint records using regular expressions and Boolean logic. medrxivr also allows users to export their search results to a .bib file for easy import into a reference manager, and to download the full-text PDFs of preprints matching their search criteria.
