๐งฐ Tools for Developers and Semantic Engineers
๐ช Working with strings that might be a URI or a CURIE
Sometimes, itโs not clear if a string is a CURIE or a URI. While the SafeCURIE syntax is intended to address this, itโs often overlooked.
โ๏ธ CURIE and URI Checks
The first way to handle this ambiguity is to be able to check if the string is a CURIE or a URI. Therefore, each Converter
comes with functions for checking if a string is a CURIE (converter.is_curie()
) or a URI (converter.is_uri()
) under its definition.
from curies_rs import get_obo_converter
converter = get_obo_converter()
assert converter.is_curie("GO:1234567")
assert not converter.is_curie("http://purl.obolibrary.org/obo/GO_1234567")
# This is a valid CURIE, but not under this converter's definition
assert not converter.is_curie("pdb:2gc4")
assert converter.is_uri("http://purl.obolibrary.org/obo/GO_1234567")
assert not converter.is_uri("GO:1234567")
# This is a valid URI, but not under this converter's definition
assert not converter.is_uri("http://proteopedia.org/wiki/index.php/2gc4")
use curies::sources::get_obo_converter;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let converter = get_obo_converter().await?;
assert_eq!(converter.is_curie("GO:1234567"), true);
assert_eq!(converter.is_uri("http://purl.obolibrary.org/obo/GO_1234567"), true);
Ok(())
}
๐๏ธ Standardized Expansion and Compression
The converter.expand_or_standardize()
function extends the CURIE expansion function to handle the situation where you might get passed a CURIE or a URI. If itโs a CURIE, expansions happen with the normal rules. If itโs a URI, it tries to standardize it.
from curies_rs import Converter
converter = Converter.from_extended_prefix_map("""[{
"prefix": "CHEBI",
"prefix_synonyms": ["chebi"],
"uri_prefix": "http://purl.obolibrary.org/obo/CHEBI_",
"uri_prefix_synonyms": ["https://identifiers.org/chebi:"]
}]""")
# Expand CURIEs
assert converter.expand_or_standardize("CHEBI:138488") == 'http://purl.obolibrary.org/obo/CHEBI_138488'
assert converter.expand_or_standardize("chebi:138488") == 'http://purl.obolibrary.org/obo/CHEBI_138488'
# standardize URIs
assert converter.expand_or_standardize("http://purl.obolibrary.org/obo/CHEBI_138488") == 'http://purl.obolibrary.org/obo/CHEBI_138488'
assert converter.expand_or_standardize("https://identifiers.org/chebi:138488") == 'http://purl.obolibrary.org/obo/CHEBI_138488'
# Handle cases that aren't valid w.r.t. the converter
try:
converter.expand_or_standardize("missing:0000000")
converter.expand_or_standardize("https://example.com/missing:0000000")
except Exception as e:
print(e)
import {Converter} from "@biopragmatics/curies";
async function main() {
const converter = await Converter.fromExtendedPrefixMap(`[{
"prefix": "CHEBI",
"prefix_synonyms": ["chebi"],
"uri_prefix": "http://purl.obolibrary.org/obo/CHEBI_",
"uri_prefix_synonyms": ["https://identifiers.org/chebi:"]
}]`)
console.log(converter.expandOrStandardize("chebi:138488"))
console.log(converter.expandOrStandardize("https://identifiers.org/chebi:138488"))
try {
console.log(converter.expandOrStandardize("http://purl.obolibrary.org/UNKNOWN_12345"))
} catch (e) {
console.log("Failed successfully)
}
}
main();
use curies::Converter;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let converter = Converter::from_extended_prefix_map(r#"[{
"prefix": "CHEBI",
"prefix_synonyms": ["chebi"],
"uri_prefix": "http://purl.obolibrary.org/obo/CHEBI_",
"uri_prefix_synonyms": ["https://identifiers.org/chebi:"]
}]"#).await?;
assert_eq!(converter.expand_or_standardize("http://amigo.geneontology.org/amigo/term/GO:0032571").unwrap(), "http://purl.obolibrary.org/obo/GO_0032571".to_string());
assert_eq!(converter.expand_or_standardize("gomf:0032571").unwrap(), "http://purl.obolibrary.org/obo/GO_0032571".to_string());
assert!(converter.expand_or_standardize("http://purl.obolibrary.org/UNKNOWN_12345").is_err());
Ok(())
}
A similar workflow is implemented in converter.compress_or_standardize()
for compressing URIs where a CURIE might get passed.
from curies_rs import Converter
converter = Converter.from_extended_prefix_map("""[{
"prefix": "CHEBI",
"prefix_synonyms": ["chebi"],
"uri_prefix": "http://purl.obolibrary.org/obo/CHEBI_",
"uri_prefix_synonyms": ["https://identifiers.org/chebi:"]
}]""")
# Compress URIs
assert converter.compress_or_standardize("http://purl.obolibrary.org/obo/CHEBI_138488") == 'CHEBI:138488'
assert converter.compress_or_standardize("https://identifiers.org/chebi:138488") == 'CHEBI:138488'
# standardize CURIEs
assert converter.compress_or_standardize("CHEBI:138488") == 'CHEBI:138488'
assert converter.compress_or_standardize("chebi:138488") == 'CHEBI:138488'
# Handle cases that aren't valid w.r.t. the converter
try:
converter.compress_or_standardize("missing:0000000")
converter.compress_or_standardize("https://example.com/missing:0000000")
except Exception as e:
print(e)
print(type(e))
import {Converter} from "@biopragmatics/curies";
async function main() {
const converter = await Converter.fromExtendedPrefixMap(`[{
"prefix": "CHEBI",
"prefix_synonyms": ["chebi"],
"uri_prefix": "http://purl.obolibrary.org/obo/CHEBI_",
"uri_prefix_synonyms": ["https://identifiers.org/chebi:"]
}]`)
console.log(converter.compressOrStandardize("https://identifiers.org/chebi:138488"))
console.log(converter.compressOrStandardize("gomf:0032571"))
try {
console.log(converter.compressOrStandardize("http://purl.obolibrary.org/UNKNOWN_12345"))
} catch (e) {
console.log("Failed successfully)
}
}
main();
use curies::Converter;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let converter = Converter::from_extended_prefix_map(r#"[{
"prefix": "CHEBI",
"prefix_synonyms": ["chebi"],
"uri_prefix": "http://purl.obolibrary.org/obo/CHEBI_",
"uri_prefix_synonyms": ["https://identifiers.org/chebi:"]
}]"#).await?;
assert_eq!(converter.compress_or_standardize("http://amigo.geneontology.org/amigo/term/GO:0032571").unwrap(), "go:0032571".to_string());
assert_eq!(converter.compress_or_standardize("gomf:0032571").unwrap(), "go:0032571".to_string());
assert!(converter.compress_or_standardize("http://purl.obolibrary.org/UNKNOWN_12345").is_err());
Ok(())
}
๐ Bulk operations
You can use the expand_list()
and compress_list()
functions to processes many URIs or CURIEs at once..
For example to create a new URI
column in a pandas dataframe from a CURIE
column:
import pandas as pd
from curies_rs import get_bioregistry_converter
converter = get_bioregistry_converter()
df = pd.DataFrame({'CURIE': ['doid:1234', 'doid:5678', 'doid:91011']})
# Expand the list of CURIEs to URIs
df['URI'] = converter.expand_list(df['CURIE'])
print(df)
๐งฉ Integrating with rdflib
RDFlib is a pure Python package for manipulating RDF data. The following example shows how to bind the extended prefix map from a Converter
to a graph (rdflib.Graph
).
import curies_rs, rdflib, rdflib.namespace, json
converter = curies_rs.get_obo_converter()
g = rdflib.Graph()
for prefix, uri_prefix in json.loads(converter.write_prefix_map()).items():
g.bind(prefix, rdflib.Namespace(uri_prefix))
A more flexible approach is to instantiate a namespace manager (rdflib.namespace.NamespaceManager
) and bind directly to that.
import curies_rs, rdflib, json
converter = curies_rs.get_obo_converter()
namespace_manager = rdflib.namespace.NamespaceManager(rdflib.Graph())
for prefix, uri_prefix in json.loads(converter.write_prefix_map()).items():
namespace_manager.bind(prefix, rdflib.Namespace(uri_prefix))
URI references for use in RDFLibโs graph class can be constructed from CURIEs using a combination of converter.expand()
and rdflib.URIRef
.