---
output: rmarkdown::github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
fig.path = "man/figures/"
)
```
# rbiorxiv
R client for interacting with the [bioRxiv API](https://api.biorxiv.org)
## Installation
Install from CRAN:
```{r eval=FALSE}
# Install package
install.packages("rbiorxiv")
# Load package
library(rbiorxiv)
```
Or install the development version from GitHub (using the [devtools](https://CRAN.R-project.org/package=devtools) package):
```{r eval=FALSE}
# Install package
install.packages("devtools")
devtools::install_github("nicholasmfraser/rbiorxiv")
# Load package
library(rbiorxiv)
```
## Usage
The main functions in `rbiorxiv` generally conform to the API endpoints outlined in the API documentation ([see here](https://api.biorxiv.org/)).
### Content detail
Retrieve details of either a set of preprints deposited between two dates, or look up a single preprint by DOI:
```{r eval=FALSE}
# Get details of preprints deposited between 2018-01-01 and 2018-01-10
# By default, only the first 100 records are returned
biorxiv_content(from = "2018-01-01", to = "2018-01-10")
# Set a limit to return more than 100 records
biorxiv_content(from = "2018-01-01", to = "2018-01-10", limit = 200)
# Or set limit as "*" to return all records
biorxiv_content(from = "2018-01-01", to = "2018-01-10", limit = "*")
# Skip the first 100 records
biorxiv_content(from = "2018-01-01", to = "2018-01-10", limit = 200, skip = 100)
# By default, data is returned in a list. Use the "format" argument to specify
# that data should be returned in "json" format or as a data frame ("df").
biorxiv_content(from = "2018-01-01", to = "2018-01-10", format = "df")
# Look up a preprint by DOI
biorxiv_content(doi = "10.1101/833400")
```
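To make the interplay of `limit` and `skip` more concrete, the sketch below computes which record offsets would have to be requested, assuming results arrive in pages of at most 100 records and interpreting `limit` as the number of records wanted after skipping. `page_offsets()` is a hypothetical helper written for illustration only; it is not part of `rbiorxiv`, and the package's internals may differ.

```r
# Hypothetical helper (not part of rbiorxiv): compute the starting offset of
# each page needed to fetch `limit` records after skipping `skip` records,
# assuming each call returns at most `page_size` records.
page_offsets <- function(limit, skip = 0, page_size = 100) {
  stopifnot(limit > 0, skip >= 0, page_size > 0)
  n_pages <- ceiling(limit / page_size)
  skip + (seq_len(n_pages) - 1) * page_size
}

page_offsets(limit = 200)              # offsets 0 and 100
page_offsets(limit = 200, skip = 100)  # offsets 100 and 200
page_offsets(limit = 250)              # offsets 0, 100 and 200
```

Under these assumptions, any `limit` above 100 translates into more than one API call.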
The bioRxiv API also allows querying of details of [medRxiv](https://www.medrxiv.org/) preprints, by supplying a "server" parameter. This can be specified as follows:
```{r eval=FALSE}
# Get details of medRxiv preprints deposited between 2020-01-01 and 2020-01-02
biorxiv_content(server = "medrxiv", from = "2020-01-01", to = "2020-01-02")
```
The `server` parameter defaults to "biorxiv". Note that the functions documented below are limited to bioRxiv only (at the time of writing).
### Published article detail
Retrieve details of published articles associated with bioRxiv preprints that were published between two dates:
```{r eval=FALSE}
# Get details of all articles published between 2018-01-01 and 2018-01-10
biorxiv_published(from = "2018-01-01", to = "2018-01-10", limit = "*", format = "df")
```
### Publisher article detail
Retrieve details of articles published by a specific publisher (specified by their DOI prefix) between two dates:
```{r eval=FALSE}
# Get details of all articles published by eLife (prefix = 10.7554) between 2018-01-01 and 2018-01-10
biorxiv_publisher(prefix = "10.7554", from = "2018-01-01", to = "2018-01-10",
                  limit = "*", format = "df")
```
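A DOI prefix is the part of a DOI before the first forward slash. If you only have full DOIs to hand, a minimal base R sketch such as the one below can recover the prefix to pass on. `doi_prefix()` is a hypothetical helper, not a function exported by `rbiorxiv`, and the first DOI is illustrative only.

```r
# Hypothetical helper (not part of rbiorxiv): extract the prefix
# (everything before the first "/") from one or more DOIs.
doi_prefix <- function(doi) {
  sub("/.*$", "", doi)
}

# The first DOI is made up for illustration; the second appears earlier
# in this README.
dois <- c("10.7554/eLife.00001", "10.1101/833400")
doi_prefix(dois)
# [1] "10.7554" "10.1101"
```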
### Content summary statistics
Retrieve summary statistics for bioRxiv content (e.g. number of preprints deposited):
```{r eval=FALSE}
# Get summary statistics at a monthly level
biorxiv_summary(interval = "m")
# Get summary statistics at a yearly level
biorxiv_summary(interval = "y")
```
### Usage summary statistics
Retrieve summary statistics for usage of bioRxiv content (e.g. number of PDF downloads):
```{r eval=FALSE}
# Get usage statistics at a monthly level
biorxiv_usage(interval = "m")
# Get usage statistics at a yearly level
biorxiv_usage(interval = "y")
```
## API rate and usage limits
No rate or usage limits are currently specified for the bioRxiv API. *However*, all functions in this package enforce a 1-second timeout per API call when iterating through multiple pages of results (a single API call currently returns a maximum of 100 results per page).
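The general pattern of paging through results with a pause between calls can be sketched as follows. This is not the package's actual implementation: `fetch_page()` is a hypothetical stand-in that simulates an API call locally, and `fetch_all()` is an illustrative loop built on top of it.

```r
# Hypothetical stand-in for a single API call: returns one "page" of up to
# `page_size` record ids starting at `offset`, out of `total` records.
fetch_page <- function(offset, total = 250, page_size = 100) {
  if (offset >= total) return(integer(0))
  seq(offset + 1, min(offset + page_size, total))
}

# Request pages one after another, pausing between consecutive calls.
fetch_all <- function(total = 250, page_size = 100, pause = 1) {
  records <- integer(0)
  offset <- 0
  repeat {
    page <- fetch_page(offset, total, page_size)
    records <- c(records, page)
    # A short (or empty) page means there is nothing left to request
    if (length(page) < page_size) break
    offset <- offset + page_size
    Sys.sleep(pause)  # wait before the next call
  }
  records
}

# pause = 0 only to keep this demonstration fast
length(fetch_all(pause = 0))  # 250 records collected over three calls
```

With a 1-second pause, retrieving many thousands of records takes correspondingly longer, which is worth bearing in mind when setting `limit = "*"` over a wide date range.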
## Examples
### Growth of bioRxiv over time
```{r eval=FALSE}
library(tidyverse)
# Plot the cumulative number of new preprints deposited per month
# Note that month dates are returned in YYYY-MM format - here we convert
# month dates to YYYY-MM-DD format to make plotting easier
biorxiv_summary(interval = "m", format = "df") %>%
  mutate(month = as.Date(paste0(month, "-01"), format = "%Y-%m-%d")) %>%
  ggplot() +
  geom_bar(aes(x = month, y = new_papers_cumulative),
           fill = "#cccccc",
           stat = "identity") +
  labs(x = "",
       y = "Submissions",
       title = "Cumulative new bioRxiv submissions") +
  scale_x_date(date_breaks = "3 months",
               date_minor_breaks = "3 months",
               date_labels = "%b-%y",
               expand = c(0, 0)) +
  scale_y_continuous(labels = scales::comma) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 90, vjust = 0.5),
    axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0)),
    plot.title = element_text(face = "bold")
  )
```
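The month conversion used in the plot above can be tried in isolation, without calling the API. The month labels below are made-up example values in the same YYYY-MM form.

```r
# Example month labels in YYYY-MM form (illustrative values only)
months <- c("2013-11", "2013-12", "2014-01")

# as.Date() needs a day component, so append the first day of each month
month_dates <- as.Date(paste0(months, "-01"), format = "%Y-%m-%d")

month_dates
class(month_dates)  # "Date"
diff(month_dates)   # gaps of 30 and 31 days
```

Once the column is of class `Date`, `scale_x_date()` can be used to control the axis breaks and labels.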

### PDF downloads over time
```{r eval=FALSE}
library(tidyverse)
# Plot the cumulative number of PDF downloads per month
# Here month dates are returned already in YYYY-MM-DD format
biorxiv_usage(interval = "m", format = "df") %>%
  mutate(month = as.Date(month)) %>%
  ggplot() +
  geom_bar(aes(x = month, y = pdf_cumulative),
           fill = "#cccccc",
           stat = "identity") +
  labs(x = "",
       y = "PDF downloads (cumulative)",
       title = "Number of bioRxiv PDF downloads over time") +
  scale_x_date(date_breaks = "3 months",
               date_minor_breaks = "3 months",
               date_labels = "%b-%y",
               expand = c(0, 0)) +
  scale_y_continuous(labels = scales::comma) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 90, vjust = 0.5),
    axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0)),
    plot.title = element_text(face = "bold")
  )
```

### Time to publication
```{r eval=FALSE}
library(tidyverse)
# Calculate the number of days between preprint deposition and
# journal publication. Plot results as a histogram.
biorxiv_published(from = "2013-11-01", to = "2018-12-31",
                  limit = "*", format = "df") %>%
  mutate(days = as.Date(published_date) - as.Date(preprint_date)) %>%
  ggplot() +
  geom_histogram(aes(as.numeric(days)),
                 binwidth = 1,
                 fill = "#cccccc") +
  labs(x = "Days between preprint deposition and journal publication",
       y = "Number of articles",
       title = "Time to publication") +
  coord_cartesian(xlim = c(-100, 1000)) +
  theme_minimal() +
  theme(
    axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0)),
    plot.title = element_text(face = "bold")
  )
```
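The date arithmetic behind the histogram can be checked on a small toy data frame. The dates below are made up for illustration; only the column names are taken from the example above.

```r
# Toy data with made-up dates; column names follow the example above
toy <- data.frame(
  preprint_date  = c("2018-01-01", "2018-02-15", "2018-06-01"),
  published_date = c("2018-03-01", "2018-02-10", "2019-06-01"),
  stringsAsFactors = FALSE
)

# Subtracting two Date vectors gives a difference in days
toy$days <- as.numeric(as.Date(toy$published_date) - as.Date(toy$preprint_date))
toy$days
# [1]  59  -5 365
```

A negative value means the recorded journal publication date precedes the preprint date, which is presumably why the x-axis of the histogram starts below zero.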

## Other tools/packages for working with medRxiv/bioRxiv data
`rbiorxiv` aims to provide a simple wrapper around the main endpoints of the [bioRxiv API](https://api.biorxiv.org/), and return data for further analysis/manipulation by the R user. Below are some additional packages that provide distinct but related functionality when working with bioRxiv and medRxiv data:
* [`medrxivr`](https://github.com/ropensci/medrxivr), developed by [Luke McGuinness](https://github.com/mcguinlu) and part of the [rOpenSci](https://ropensci.org/) ecosystem, provides users with more powerful tools to download bioRxiv and medRxiv data and to search downloaded preprint records using regular expressions and Boolean logic. `medrxivr` also allows users to export their search results to a .bib file for easy import into a reference manager, and to download the full-text PDFs of preprints matching their search criteria.