Curating Publications and References
The example below shows a subset of the record for 3D Metabolites (3dmet) that highlights the publications list.
Each entry in the publications list is a dictionary with the following fields:
- title (required): the title of the paper
- year (highly recommended): the year the paper was published
- pubmed, doi, and pmc (at least one required): identifiers for the paper

```json
"3dmet": {
  "name": "3D Metabolites",
  "publications": [
    {
      "doi": "10.2142/biophysico.15.0_87",
      "pmc": "PMC5992871",
      "pubmed": "29892514",
      "title": "Chemical curation to improve data accuracy: recent development of the 3DMET database",
      "year": 2018
    },
    {
      "doi": "10.1021/ci300309k",
      "pubmed": "23293959",
      "title": "Three-dimensional structure database of natural metabolites (3DMET): a novel database of curated 3D structures",
      "year": 2013
    }
  ]
},
```
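The field rules for publication entries can be expressed as a small check. The following is a minimal Python sketch, assuming each record is a plain dictionary shaped like the JSON excerpt; check_publication and ID_KEYS are hypothetical names for illustration, not part of the Bioregistry's own code or validation tooling.

```python
from __future__ import annotations

from typing import Any

# Hypothetical helper, not the Bioregistry's own validation code.
# At least one of these identifier fields must be present.
ID_KEYS = ("pubmed", "doi", "pmc")


def check_publication(pub: dict[str, Any]) -> list[str]:
    """Return a list of problems with one publication entry (empty if none)."""
    problems = []
    if not pub.get("title"):
        problems.append("missing required 'title'")
    if "year" not in pub:
        problems.append("missing highly recommended 'year'")
    if not any(pub.get(key) for key in ID_KEYS):
        problems.append("needs at least one of pubmed, doi, or pmc")
    return problems


# The second 3dmet publication from the excerpt passes every check.
good = {
    "doi": "10.1021/ci300309k",
    "pubmed": "23293959",
    "title": "Three-dimensional structure database of natural metabolites (3DMET): "
    "a novel database of curated 3D structures",
    "year": 2013,
}
# A hypothetical entry with a title and year but no identifiers.
bad = {"title": "An untracked paper", "year": 2020}

print(check_publication(good))  # → []
print(check_publication(bad))   # → ['needs at least one of pubmed, doi, or pmc']
```

Returning a list of messages rather than raising on the first problem lets a curator see every issue with an entry at once.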
Some references that are not publications, such as webpages and PDFs, are also worth curating. These can be stored in the references list. For example, the Registry of Toxic Effects of Chemical Substances (rtecs) is included in the Bioregistry because it is used in practice, but little information about it can be found online. The references list is a good place to store links to PDFs and webpages that describe such a resource.

```json
"rtecs": {
  "name": "Registry of Toxic Effects of Chemical Substances",
  "publications": [
    {
      "doi": "10.1016/s1074-9098(99)00058-1",
      "title": "An overview of the Registry of Toxic Effects of Chemical Substances (RTECS): Critical information on chemical hazards",
      "year": 1999
    }
  ],
  "references": [
    "https://www.cdc.gov/niosh/docs/97-119/pdfs/97-119.pdf",
    "https://www.cdc.gov/niosh/npg/npgdrtec.html"
  ]
}
```
Other things worth recording in the references list include:
- Bioregistry issues or pull requests about the resource
- Links to webpages describing the identifier resource
- Links to discussions on Slack or other platforms (keeping in mind that such links might not last forever)
- Any other context that is useful for a Bioregistry reader
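Because every item in the references list is a bare URL, it can be sanity-checked mechanically. The following is a minimal Python sketch, assuming references are stored as a list of strings as in the rtecs excerpt. The function review_references and the rule that doi.org links probably belong in the publications list are hypothetical illustrations of the publication/reference distinction drawn above, not an official Bioregistry check.

```python
from __future__ import annotations

from urllib.parse import urlparse

# Hypothetical review rule: DOI links usually describe papers, which
# belong in "publications" rather than "references".
DOI_HOSTS = {"doi.org", "dx.doi.org"}


def review_references(references: list[str]) -> dict[str, str]:
    """Map each questionable reference URL to a short reason."""
    problems = {}
    for ref in references:
        parsed = urlparse(ref)
        if parsed.scheme not in {"http", "https"} or not parsed.netloc:
            problems[ref] = "not an absolute http(s) URL"
        elif parsed.netloc.lower() in DOI_HOSTS:
            problems[ref] = "looks like a publication; consider the publications list"
    return problems


rtecs_references = [
    "https://www.cdc.gov/niosh/docs/97-119/pdfs/97-119.pdf",
    "https://www.cdc.gov/niosh/npg/npgdrtec.html",
]
print(review_references(rtecs_references))  # → {}
```

Both rtecs references are ordinary https links to cdc.gov, so nothing is flagged.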
Why Should I Curate Publications and References?
- They give additional context for Bioregistry readers who want to know more about the resource
- They make it easier to attribute usage of identifiers from a given resource to its authors
- They enable global landscape analysis of when and where identifier resources are being created. The following image is automatically regenerated with each Bioregistry update:
- They support the training of a machine learning model for semi-automated curation of additional literature. See this talk from the 2022 Workshop on Prefixes, CURIEs, and IRIs.
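To illustrate why the year field matters for landscape analysis, here is a minimal Python sketch that counts resources by the year of their earliest curated publication. It assumes records are plain dictionaries like the excerpts earlier on this page; first_publication_years is a hypothetical helper, and this is not the code that generates the Bioregistry's figure.

```python
from __future__ import annotations

from collections import Counter


def first_publication_years(registry: dict[str, dict]) -> Counter:
    """Count resources by the year of their earliest curated publication."""
    counts: Counter = Counter()
    for record in registry.values():
        years = [pub["year"] for pub in record.get("publications", []) if "year" in pub]
        if years:  # resources without dated publications are skipped
            counts[min(years)] += 1
    return counts


# Trimmed-down records based on the 3dmet and rtecs examples, plus one
# hypothetical resource that only has references.
registry = {
    "3dmet": {"publications": [{"year": 2018}, {"year": 2013}]},
    "rtecs": {"publications": [{"year": 1999}]},
    "example": {"references": ["https://example.org"]},
}
print(sorted(first_publication_years(registry).items()))  # → [(1999, 1), (2013, 1)]
```

A resource with no dated publication simply drops out of the count, which is one reason the year field is highly recommended.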