Data Standardization and Integration with the Bioregistry at Biocuration 2025

The Bioregistry project (https://bioregistry.io, https://github.com/biopragmatics/bioregistry, https://www.nature.com/articles/s41597-022-01807-3) promotes data integration by cataloging resources that assign persistent identifiers to biomedical concepts.

It supports many linked open data and semantic web users by producing a harmonized and comprehensive prefix map and providing standardized tooling for working with prefixes, uniform resource identifiers (URIs), and compact URIs (CURIEs). The Bioregistry in turn is used by tools like LinkML, data standards like SSSOM, web applications like the EBI Ontology Lookup Service (OLS), and projects like the OBO Foundry and Monarch Initiative (see https://biopragmatics.github.io/bioregistry/usages/).
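
To make the relationship between prefixes, URIs, and CURIEs concrete, here is a minimal sketch of converting between the two notations. It does not use the Bioregistry Python package. The two-entry `PREFIX_MAP` and the helper names `expand` and `compress` are assumptions made for illustration; a real prefix map would be exported from the Bioregistry.

```python
# Illustrative prefix map (an assumption for this sketch); a real one
# would be obtained from the Bioregistry rather than written by hand.
PREFIX_MAP = {
    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
    "GO": "http://purl.obolibrary.org/obo/GO_",
}


def expand(curie: str) -> str:
    """Turn a CURIE such as 'GO:0008150' into a full URI."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in PREFIX_MAP:
        raise ValueError(f"cannot expand {curie!r}")
    return PREFIX_MAP[prefix] + local_id


def compress(uri: str) -> str:
    """Turn a URI into a CURIE, preferring the longest matching URI prefix."""
    matches = [(len(u), p) for p, u in PREFIX_MAP.items() if uri.startswith(u)]
    if not matches:
        raise ValueError(f"no known prefix for {uri!r}")
    _, prefix = max(matches)
    return f"{prefix}:{uri[len(PREFIX_MAP[prefix]):]}"


uri = expand("CHEBI:15377")
print(uri)
print(compress(uri))
```

Choosing the longest matching URI prefix keeps the result unambiguous when one URI prefix is itself the beginning of another.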

This two-part workshop consists of a lecture followed by a hackathon.

First, we will give an introduction to the Bioregistry that includes the following:

An overview of the data model, database, web application, and Python package
A practical example of using the Bioregistry for data standardization and integration
Maintenance of the Bioregistry
1. Making a new prefix request
2. Reviewing a prefix request
An overview of curation tasks and guides for new contributors (https://biopragmatics.github.io/bioregistry/curation)
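
As a rough picture of what data standardization and integration can mean in practice, the sketch below rewrites identifiers from two sources that spell the same prefix differently, so that they can be compared directly. The `SYNONYMS` table, the `standardize` helper, and the sample datasets are hypothetical and stand in for the curated synonyms that a registry provides; this is not the Bioregistry's own API.

```python
from typing import Optional

# Hypothetical synonym table: each variant spelling maps to one
# canonical, lowercase prefix.
SYNONYMS = {
    "chebi": "chebi",
    "chebiid": "chebi",
    "go": "go",
    "gene_ontology": "go",
}


def standardize(curie: str) -> Optional[str]:
    """Rewrite a CURIE with its canonical prefix, or return None if unknown."""
    prefix, sep, local_id = curie.partition(":")
    canonical = SYNONYMS.get(prefix.strip().lower())
    if not sep or canonical is None:
        return None
    return f"{canonical}:{local_id.strip()}"


# Two made-up datasets that use different spellings for the same prefixes.
dataset_a = ["CHEBI:15377", "GO:0008150"]
dataset_b = ["chebiid:15377", "gene_ontology:0006915"]

# After standardization, shared concepts can be found with a set intersection.
shared = {standardize(c) for c in dataset_a} & {standardize(c) for c in dataset_b}
print(shared)
```

Returning `None` for unrecognized prefixes, rather than guessing, makes it easy to collect the identifiers that still need curation.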

Second, we will host a hackathon open to veteran and new contributors. We will work together to address (some of) the following:

Improve existing records, e.g., by adding contact people
Resolve open issues requesting new prefixes or updates to existing prefixes (https://github.com/biopragmatics/bioregistry/issues)
Improve harmonization with other registries
1. Integrate Wikidata properties for several domains (taxonomy, bibliometrics, chemistry, etc.)
Pilot a semi-automated workflow for suggesting new prefixes
Address use case-specific data standardization and integration scenarios in a “bring your own data” setting

Based on this experience, we expect to write new contribution guidelines and tutorials that will enable additional contributors. The Bioregistry governance model stipulates that all material contributors to the resource are eligible for co-authorship on future papers. If we are able to make substantial contributions during the hackathon, we would also like to write a short conference report and consider outlining an update paper as a follow-up to the original 2022 publication in Scientific Data.

Important Links

Workshop Slides
OBO Foundry Slack invite (join the #prefixes channel)
Bioregistry GitHub
Bioregistry Site
Lightning Talks
- Semi-Automated Curation of Publications in the Bioregistry
- Automated evaluation of cross-registry mappings in the Bioregistry through sentence embeddings

Recording