Note added 30 May 2013
Additional information concerning the use of OAI-ORE to specify aggregations of research outputs into data packages and research objects has been added at the end of this blog post.
– – –
Journal publishing has come some way in the last few years towards adopting some of the semantic enhancements for scholarly journal articles described in [1] and illustrated in [2]. Publishers such as Pensoft Publishers are leading the way. However, there is much that could and should be done more widely.
So what are the most important things that publishers and their authors should be doing to enhance their journal articles? I have drafted the following suggestions, and now present them here to stimulate discussion within the community.
1 Develop semantic authorship of articles
Mark up important concepts, such as chemical names and gene names, with ontology terms and link them to external information sources, using tools like the Ontology Add-on for Word and the Chemistry Add-In for Word, or employing direct authoring platforms such as PLoS Currents and the Pensoft Writing Tool. Alternatively, semantic mark-up can be added by skilled editors after article submission, as is routinely done by the Royal Society of Chemistry.
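To make the idea concrete, here is a minimal Python sketch of what automated concept mark-up produces: known terms in plain text are wrapped in HTML links to ontology term URIs. The term table and its example.org URIs are hypothetical placeholders; a real tool would use curated identifiers (for instance from ChEBI or a gene database) and would need to handle ambiguous names and existing mark-up.

```python
import html
import re

# Hypothetical term table: surface form -> ontology term URI (placeholders only).
TERM_URIS = {
    "caffeine": "http://example.org/chem/caffeine",
    "BRCA1": "http://example.org/gene/BRCA1",
}


def mark_up_terms(text, term_uris):
    """Return HTML in which each whole-word occurrence of a known term links to its URI."""
    if not term_uris:
        return html.escape(text)
    # Try longer names first so that a longer term wins over a shorter one inside it.
    names = sorted(term_uris, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    parts, last = [], 0
    for match in pattern.finditer(text):
        parts.append(html.escape(text[last:match.start()]))  # text between terms
        term = match.group(1)
        uri = html.escape(term_uris[term], quote=True)
        parts.append(f'<a href="{uri}">{html.escape(term)}</a>')
        last = match.end()
    parts.append(html.escape(text[last:]))  # text after the final term
    return "".join(parts)
```

For example, mark_up_terms("BRCA1 & caffeine", TERM_URIS) links both terms and escapes the ampersand, while a word such as "decaffeinated" is left alone because only whole words match.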
2 Use citation typing
Use citation typing to enable your authors to state why they are citing others’ work, using terms from CiTO, the Citation Typing Ontology, to describe the reason for citing each reference in the reference list. This can be done by employing the CiTO Reference Annotation Tools described in the previous blog post.
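Behind such tools, a typed citation is simply an RDF statement whose predicate is a CiTO property. The sketch below assumes the author has already chosen a reason for each reference and emits those statements in Turtle. Only a small subset of CiTO's properties is listed, and the citing-article URI used in practice would be the article's own identifier (example.org URIs are hypothetical).

```python
CITO = "http://purl.org/spar/cito/"

# A few of the CiTO citation properties an author might choose from.
CITO_PROPERTIES = {
    "citesAsEvidence",
    "citesAsDataSource",
    "usesMethodIn",
    "extends",
    "supports",
    "disagreesWith",
    "obtainsBackgroundFrom",
}


def cito_turtle(citing_uri, typed_citations):
    """Return Turtle stating why citing_uri cites each (cited_uri, reason) pair."""
    lines = [f"@prefix cito: <{CITO}> ."]
    for cited_uri, reason in typed_citations:
        if reason not in CITO_PROPERTIES:
            raise ValueError(f"not one of the CiTO properties listed here: {reason}")
        lines.append(f"<{citing_uri}> cito:{reason} <{cited_uri}> .")
    return "\n".join(lines)
```

Rejecting unknown reasons, rather than passing them through, keeps free-text labels from silently turning into non-existent ontology terms.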
3 Publish machine-readable bibliographic metadata describing the article
Create machine-readable bibliographic metadata describing each article, specifying the authors, title and other bibliographic record details, for example by using an XML document mark-up language such as JATS, the Journal Article Tag Suite.
Also create RDF versions of these metadata using the SPAR (Semantic Publishing and Referencing) Ontologies together with other appropriate vocabularies, employing our JATS2RDF mapping, described in [3], to assist in this.
Publish these RDF bibliographic metadata in one or more of three ways:
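As an illustration of the JATS-to-RDF step (not the JATS2RDF mapping itself, which covers far more of JATS and is specified in [3]), this sketch reads the DOI, title and authors from an abridged JATS fragment with Python's ElementTree and writes them as Turtle using FaBiO, Dublin Core Terms, FOAF and PRISM terms. The choice of properties is an assumption made for brevity, and the sample assumes the fragment carries a DOI.

```python
import xml.etree.ElementTree as ET

# Abridged JATS front matter for reference [1] (first author only).
JATS_SAMPLE = """<article>
  <front>
    <article-meta>
      <article-id pub-id-type="doi">10.1371/journal.pcbi.1000361</article-id>
      <title-group>
        <article-title>Adventures in semantic publishing: exemplar semantic enhancement of a research article</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name><surname>Shotton</surname><given-names>David</given-names></name>
        </contrib>
      </contrib-group>
    </article-meta>
  </front>
</article>"""


def literal(text):
    """Serialize a Python string as a Turtle string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def jats_to_turtle(jats_xml):
    """Map a few JATS article-meta fields to Turtle (illustrative subset only)."""
    meta = ET.fromstring(jats_xml).find("front/article-meta")
    doi = (meta.findtext("article-id[@pub-id-type='doi']") or "").strip()
    title = (meta.findtext("title-group/article-title") or "").strip()
    lines = [
        "@prefix fabio: <http://purl.org/spar/fabio/> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "@prefix prism: <http://prismstandard.org/namespaces/basic/2.0/> .",
        "@prefix foaf: <http://xmlns.com/foaf/0.1/> .",
        "",
        f"<http://dx.doi.org/{doi}> a fabio:JournalArticle ;",
        f"    dcterms:title {literal(title)} ;",
    ]
    for contrib in meta.findall("contrib-group/contrib[@contrib-type='author']"):
        given = contrib.findtext("name/given-names")
        surname = contrib.findtext("name/surname")
        name = " ".join(part.strip() for part in (given, surname) if part)
        # Each author becomes a blank node carrying a FOAF name.
        lines.append(f"    dcterms:creator [ foaf:name {literal(name)} ] ;")
    lines.append(f"    prism:doi {literal(doi)} .")
    return "\n".join(lines)
```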
- embedded in the online paper in RDFa,
- in a supplementary RDF file, and/or
- together with metadata describing other articles in a searchable database on the publisher’s web site.
The NPG Linked Data Platform provides a good example of the third way.
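For the first of these options, the same metadata can be carried in the article's HTML as RDFa 1.1 attributes. A minimal sketch, assuming the publisher's page template can emit a block like this (the function name and layout are hypothetical):

```python
import html


def esc(value):
    """Escape text for use in HTML content or attribute values."""
    return html.escape(value, quote=True)


def rdfa_metadata_block(doi, title, authors):
    """Return an HTML fragment whose RDFa 1.1 attributes describe the article."""
    prefixes = ("fabio: http://purl.org/spar/fabio/ "
                "dcterms: http://purl.org/dc/terms/ "
                "dc: http://purl.org/dc/elements/1.1/")
    lines = [
        # @resource names the article; @typeof gives its class.
        f'<div prefix="{prefixes}" resource="{esc("http://dx.doi.org/" + doi)}" '
        f'typeof="fabio:JournalArticle">',
        f'  <h1 property="dcterms:title">{esc(title)}</h1>',
    ]
    # dc:creator (DC elements) accepts a plain-text name, unlike dcterms:creator,
    # whose range is an agent resource.
    lines += [f'  <span property="dc:creator">{esc(name)}</span>' for name in authors]
    lines.append("</div>")
    return "\n".join(lines)
```

Because the visible title and author names are themselves the RDFa values, human-readable and machine-readable metadata cannot drift apart on the page.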
4 Encode the reference list of each article in machine-readable form
Encode the reference list of each article as machine-readable bibliographic metadata, and publish these in RDF, as for the bibliographic metadata (Point 3 above).
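One simple machine-readable encoding of a reference list is a set of cito:cites statements, one per cited work. The sketch below assumes the references are available as plain strings and that a DOI is the identifier of choice; references without a DOI are skipped here, whereas a full encoding would also describe those works bibliographically. The DOI pattern is a common heuristic, not an exhaustive validator.

```python
import re

DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")
CITES = "<http://purl.org/spar/cito/cites>"


def extract_doi(reference_text):
    """Return the first DOI-like string in a reference, or None."""
    match = DOI_PATTERN.search(reference_text)
    if match is None:
        return None
    # Trailing punctuation usually belongs to the sentence, not the DOI.
    return match.group(0).rstrip(".,;)")


def reference_list_ntriples(citing_uri, references):
    """Emit one cito:cites triple (N-Triples) per reference that carries a DOI."""
    triples = []
    for ref in references:
        doi = extract_doi(ref)
        if doi is not None:
            triples.append(f"<{citing_uri}> {CITES} <http://dx.doi.org/{doi}> .")
    return triples
```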
5 Open the reference lists of your articles
Open the reference lists of your articles, free from the copyright protection that covers the authored body text, so that they are freely available, even if your full-text articles are only available by subscription. Do this:
- by changing the copyright and licensing arrangements for your articles, acknowledging that the citation data embedded in the reference lists are indeed non-copyrightable data that should be made freely available for the general benefit of the scholarly community,
- by placing the human-readable version of the article’s reference list outside the subscription firewall protecting the copyrighted text, and
- by opening your article reference lists via CrossRef for harvesting and inclusion in the Open Citations Corpus, as detailed in my recent Open Letter to Publishers.
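Once references have been deposited and opened, anyone can harvest them. This sketch assumes the present-day CrossRef REST API (api.crossref.org), whose works records list deposited references under message.reference; only references for which a DOI was deposited are returned. The network call is kept separate so that the parsing can be tried on a saved response.

```python
import json
import urllib.parse
import urllib.request


def fetch_work(doi):
    """Fetch CrossRef metadata for one DOI (requires network access)."""
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi)
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def cited_dois(work_json):
    """Return the DOIs of the deposited references exposed for a work, if any."""
    references = work_json.get("message", {}).get("reference", [])
    # Some deposited references carry only unstructured text and no DOI.
    return [ref["DOI"] for ref in references if "DOI" in ref]
```

If a publisher has not opened its references, the reference list is typically absent from the response, and cited_dois simply returns an empty list.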
6 Use ORCIDs to identify authors and contributors, and encode their affiliations, funders, grants and geo-locations
Ensure that the article metadata include ORCID personal IDs for all authors and contributors (where available), the details of their institutional affiliations, and the names of funding bodies and the grant numbers of any funding that supported the research described in the article. Ensure that the geo-locations (longitude and latitude) of places mentioned in the text, such as field sites, are also recorded.
Publish this information both in human-readable text and in machine-readable form. This will facilitate author disambiguation, and will enable assignment of funder credit, evaluation of research grant outputs, and mapping of study sites, species distributions, etc.
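Because ORCID iDs carry a check character, metadata pipelines can catch mistyped identifiers before publication. ORCID documents this check as ISO/IEC 7064 MOD 11-2; a minimal Python version follows. A passing checksum only shows that an iD is well formed, not that it is registered to the person named.

```python
import re

ORCID_FORMAT = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_check_digit(base_digits):
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """True if orcid has the 0000-0000-0000-000X layout and a correct check character."""
    if not ORCID_FORMAT.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]
```

For example, is_valid_orcid("0000-0002-1825-0097"), ORCID's own published sample iD, returns True, while changing its last digit makes it fail.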
7 Detail the contributions and roles of authors and other contributors
Adopt the recommendations of the Final Report of the Wellcome Trust / Harvard International Workshop on Contributorship and Scholarly Attribution, by providing human-readable text and machine-readable metadata detailing the authors’ roles and contributions to the research articles, and others’ roles and contributions to the preceding research investigations, to enable better attribution of credit.
Such metadata can be encoded in RDF easily, unambiguously and in a standardized machine-readable form, using SCoRO, the Scholarly Contributions and Roles Ontology. The Scholarly Contributions Report Form, SCoRF, is a simple Excel spreadsheet that makes such metadata creation easy, by encoding SCoRO ontology terms in drop-down lists. It is available here: http://purl.org/spar/scoro/scorf/, and an exemplar completed form is available here: http://purl.org/spar/scoro/scorf-example/.
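For readers who prefer to see the RDF, the sketch below turns rows exported from a contributions form into role statements following the role-in-time pattern of PRO, the Publishing Roles Ontology, on which SCoRO builds. The CSV column names are hypothetical, and role values are passed through unchecked, so they must be terms copied from the form itself.

```python
import csv
import io

# The scoro: prefix is declared so that SCoRO role terms can be used as role values.
PREFIXES = """@prefix pro: <http://purl.org/spar/pro/> .
@prefix scoro: <http://purl.org/spar/scoro/> ."""


def roles_to_turtle(csv_text, document_uri):
    """Turn (orcid, role) rows into PRO-style role-in-time statements.

    The column names 'orcid' and 'role' are hypothetical; each role value is
    expected to be a prefixed ontology term copied from the report form.
    """
    lines = [PREFIXES, ""]
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        person = f"<http://orcid.org/{row['orcid'].strip()}>"
        # One blank node per (person, role) pairing.
        lines.append(f"{person} pro:holdsRoleInTime _:role{i} .")
        lines.append(f"_:role{i} pro:withRole {row['role'].strip()} ;")
        lines.append(f"    pro:relatesToDocument <{document_uri}> .")
    return "\n".join(lines)
```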
8 Publish the research data underlying the results described in your articles
The research data underlying the results described in research articles should be published in appropriate open public databases or repositories under a CCZero data waiver, so that the data can be re-used without restriction, as recommended by the recent Royal Society report Science as an Open Enterprise. Adopt best practice for the citation of these datasets, by insisting that your authors include formal data references in their articles’ reference lists [4 – 6].
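A formal data reference can be generated from a dataset's metadata in the same way as any other reference-list entry. This sketch loosely follows the DataCite-style pattern Creator (Year): Title. Publisher. Identifier; the record values used with it would be real dataset metadata (the ones in any example are hypothetical, though 10.5061 is Dryad's DOI prefix).

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    creators: list[str]
    year: int
    title: str
    publisher: str
    doi: str


def data_reference(record):
    """Format a reference-list entry for a dataset, loosely in DataCite style."""
    creators = ", ".join(record.creators)
    return (f"{creators} ({record.year}): {record.title}. "
            f"{record.publisher}. http://dx.doi.org/{record.doi}")
```

For instance, a hypothetical record with creators ["Smith J", "Jones K"], year 2012 and DOI 10.5061/dryad.example yields "Smith J, Jones K (2012): ... http://dx.doi.org/10.5061/dryad.example", ready to sit in the reference list next to the article references.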
9 Publish a Structured Summary of each article
Publish a separate machine-readable Structured Summary of each article, to complement the Abstract. This should be a set of simple statements of the primary facts about the article, for example that it was a species abundance study, of a named species, undertaken at a particular place and over a specified time period, having stated numerical results. Such data, if published in machine-readable form, will enormously enhance automated attempts to cluster papers having certain criteria in common, as is necessary, for example, before attempting a systematic review.
The on-line report form for MIIRO, Minimal Information for an Investigation and Research Outputs, is a structured web form that facilitates the task of creating such Structured Summaries.
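To show why such statements help, here is a sketch of the clustering step: once summaries are structured records rather than prose, papers sharing chosen criteria can be grouped or selected mechanically. The field names are hypothetical and are not taken from MIIRO.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StructuredSummary:
    """Hypothetical summary fields; MIIRO's actual fields may differ."""
    article_id: str
    study_type: str
    species: str
    location: str
    start_year: int
    end_year: int


def group_by(summaries, *field_names):
    """Cluster article IDs by their values for the named fields."""
    groups = defaultdict(list)
    for s in summaries:
        key = tuple(getattr(s, name) for name in field_names)
        groups[key].append(s.article_id)
    return dict(groups)


def select(summaries, **criteria):
    """Return the summaries whose fields equal every given criterion."""
    return [s for s in summaries
            if all(getattr(s, name) == value for name, value in criteria.items())]
```

For a systematic review, select(summaries, study_type="species abundance", species=...) would give a first candidate list, to be refined by reading the papers themselves.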
10 Score your articles against the Five Stars of Online Journal Articles
The Five Stars of Online Journal Articles, described in a previous blog post and detailed in [7], is a constellation of five independent criteria by which the quality of an online journal article may be evaluated, to see how well it matches current aspirations for enhanced research communications. The criteria concern:
- peer review
- open access
- enriched content
- available datasets
- machine-readable metadata
Publish the Five Star Rating of each article alongside the article itself. Authors and publishers whose articles fulfil Points 1-9 above will score well in the Five Star evaluation, and will be at the forefront of advances in scholarly publishing.
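A Five Star rating could be recorded alongside each article as a small structured record. This sketch simplifies each criterion to met or not met, which is an assumption; [7] describes the criteria in more detail. The tally is included only for convenience: since the criteria are independent, the per-criterion breakdown is arguably the more informative part to publish.

```python
from dataclasses import dataclass, fields


@dataclass
class FiveStarRating:
    peer_review: bool
    open_access: bool
    enriched_content: bool
    available_datasets: bool
    machine_readable_metadata: bool

    def stars(self):
        """Number of criteria met, from 0 to 5."""
        return sum(getattr(self, f.name) for f in fields(self))

    def summary(self):
        """One line per criterion, for example 'open access: yes'."""
        return "\n".join(
            f"{f.name.replace('_', ' ')}: {'yes' if getattr(self, f.name) else 'no'}"
            for f in fields(self)
        )
```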
– – –
Note added 30 May 2013
The discussion above concerns various research outputs, including journal articles, datasets and structured summaries. Other research outputs that may be required to provide full understanding and reproducibility of a particular research investigation include descriptions of the methods, protocols and workflows involved in producing and analysing the data used or produced, provenance information about the experiments and datasets, details concerning the people involved in the investigation, and additional annotations about these resources that assist in interpretation of the scientific outcomes.
The Open Archives Initiative’s Object Reuse and Exchange metadata model (OAI-ORE; http://www.openarchives.org/ore/) defines a data model and a number of serializations (RDF, Atom and RDFa) for the description and exchange of aggregations of Web resources, and can be used to specify aggregations of such research outputs.
For example, the following RDF statements specify that a simple data package in the Dryad data repository is an aggregation of a single Excel data file and the Dryad web page that provides metadata for that data file:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ore: <http://www.openarchives.org/ore/terms/> .

<http://datadryad.org/handle/10255/dryad.8684>  # Data package in the Dryad repository
    a ore:Aggregation ;
    ore:aggregates
        <http://datadryad.org/handle/10255/dryad.8685> ,
        <http://datadryad.org/bitstream/handle/10255/dryad.8685/body%20size%20data%20%28dry%20weight%2c%20wing%20area%2c%20Cell%20size%20and%20cell%20number%29.xls> .
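The same aggregation can be produced programmatically. This sketch writes it as N-Triples using plain string formatting. The optional Resource Map lines reflect the ORE model, in which a Resource Map describes an Aggregation (ore:describes); any map URI passed in would be the repository's own choice and is hypothetical here.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ORE = "http://www.openarchives.org/ore/terms/"


def ore_aggregation(aggregation_uri, resource_uris, resource_map_uri=None):
    """Return N-Triples declaring an ORE Aggregation of the given resources."""
    triples = [f"<{aggregation_uri}> <{RDF_TYPE}> <{ORE}Aggregation> ."]
    triples += [f"<{aggregation_uri}> <{ORE}aggregates> <{uri}> ." for uri in resource_uris]
    if resource_map_uri is not None:
        # In the ORE model, a Resource Map is the document that describes the Aggregation.
        triples.append(f"<{resource_map_uri}> <{RDF_TYPE}> <{ORE}ResourceMap> .")
        triples.append(f"<{resource_map_uri}> <{ORE}describes> <{aggregation_uri}> .")
    return "\n".join(triples)
```

Calling ore_aggregation with the Dryad package URI and its two resource URIs reproduces the three statements in the Turtle example above, one per line.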
Research Objects (http://www.researchobject.org/) are specific OAI-ORE aggregations of such research outputs, packaged for transmission in a particular manner, that are designed to facilitate the sharing and reuse of these research outputs, and to permit the better understanding and reproducibility of the scientific experiments to which they relate, as detailed in [8].
References
[1] Shotton D, Portwin K, Klyne G and Miles A (2009). Adventures in semantic publishing: exemplar semantic enhancement of a research article. PLoS Computational Biology 5: e1000361. http://dx.doi.org/10.1371/journal.pcbi.1000361.
[2] Our enhanced version of the Reis et al. (2008) paper:
Reis RB, Ribeiro GS, Felzemburgh RDM, Santana FS, Mohr S et al. (2008). Impact of environment and social gradient on Leptospira infection in urban slums. PLoS Neglected Tropical Diseases 2: e228.
is available at http://dx.doi.org/10.1371/journal.pntd.0000228.x001.
[3] Peroni S, Lapeyre DA and Shotton D (2012). From Markup to Linked Data: Mapping NISO JATS v1.0 to RDF using the SPAR (Semantic Publishing and Referencing) Ontologies. Proc. 2012 JATS Conference, National Library of Medicine, Bethesda, Maryland, USA, 16-17 October 2012. http://www.ncbi.nlm.nih.gov/books/NBK100491/.
[4] Goodman L, Lawrence R and Ashley K (2012). Data-set visibility: Cite links to data in reference lists. Nature 492: 356. http://dx.doi.org/10.1038/492356d.
[5] Borgman CL (2012). Why Are the Attribution and Citation of Scientific Data Important? In: For Attribution – Developing Data Attribution and Citation Practices and Standards: Summary of an International Workshop (pp. 1–10). The National Academies Press, Washington, D.C. Retrieved from http://www.nap.edu/catalog.php?record_id=13564.
[6] National Academies of Science. US CODATA and the Board on Research Data and Information, in collaboration with CODATA-ICSTI Task Group on Data Citation Standards and Practices. (2012). Developing Data Attribution and Citation Practices and Standards: An International Symposium and Workshop. Washington, DC. Retrieved from http://sites.nationalacademies.org/PGA/brdi/PGA_064019.
[7] Shotton D (2012). The Five Stars of Online Journal Articles — a framework for article evaluation. D-Lib Magazine 18 (1/2). http://dx.doi.org/10.1045/january2012-shotton.
[8] Bechhofer S, Buchan I, De Roure D, Missier P, Ainsworth J, Bhagat J, Couch P, Cruickshank D, Delderfield M, Dunlop I, Gamble M, Michaelides D, Owen S, Newman D, Sufi S and Goble C (2011). Why linked data is not enough for scientists. Future Generation Computer Systems 29 (2), February 2013: 599–611. http://dx.doi.org/10.1016/j.future.2011.08.004.
Thanks for an excellent list! I wonder if a few of the items here are not overkill, in that they add semantics that could already be inferred from the other steps. After all, inference is the point of linked data, isn’t it?
For instance, shouldn’t we be able to compute the rest of the bibliographic markup from the DOI or URI? Asking publishers to add RDFa, RDF, and XML all declaring that “volume” has property http://purl.org/ontology/bibo/volume doesn’t seem to add much value. Adding just enough semantic data that we can reliably infer the rest might help researchers and publishers get the most bang for the buck, so to speak.
I wonder if scoring the publishers on these ten points would provide any incentive to adopt these innovations? At least from the researcher perspective it would be useful to know which publishers were providing the most real added value in terms of these semantics.
I’d agree with Carl on point 4; it makes no sense: the in-text citations are enough, if they use DOIs or URIs. On Point 5, also, I am unconvinced. How can I, as an author, either claim or disclaim copyright on other people’s titles? If they are copyrightable at all, then I do not own that copyright. If they are not copyrightable, then it doesn’t matter anyway.
Phil, Point 5 was addressed primarily to publishers, not authors. While an article’s bibliographic references are actually “just data”, and thus fall outside the copyright protection that covers the rhetorical text of the article itself, the reality is that most publishers don’t make this distinction, but rather have licenses that preclude free re-use of the contents of reference lists. Happily, this situation is changing, with a number of publishers now putting references outside subscription pay walls. An increasing number have also responded to my Open Letter to Publishers of January 3rd 2013, and have informed CrossRef that they are happy for the reference lists of their articles to be exposed through the CrossRef API for re-use, for example for ingest into the Open Citations Corpus, as noted in other Open Citations Blog posts.
To “compute the rest of the bibliographic markup from the DOI or URI” implies the use of a web service and requires that someone – ideally the publisher – has indeed encoded the bibliographic metadata in machine-readable form and made it publicly available. Ironically, most publishers already have this information nicely marked up in XML for all the papers in their production system, but sadly simply throw it away at the end of the publication pipeline and just publish PDFs, leaving PubMed or CrossRef the job of re-creating it. Others, like PLoS, expose their articles in XML for external harvesting. My feeling is that full machine-readable bibliographic metadata should accompany each article, rather than having to be fetched from some third-party source.