A robust Rust crate for parsing and validating Compact Uniform Resource Identifiers (CURIEs). It provides standard syntax validation as well as regex-based validation for hundreds of biological and biomedical ontologies (e.g., GO, MONDO, CHEBI), with patterns generated directly from Bioregistry.
- General Validation: Parse and validate CURIEs based on standard syntactic rules (W3C-style prefix/reference separation).
- Ontology-Specific Validation: Precise regex validation for more than 100 supported ontologies (including GO, CHEBI, and NCIT).
- Auto-Generated Patterns: Validation logic is generated from upstream Bioregistry metadata, ensuring compliance with current standards.
- Dynamic Lookup: Instantiate validators dynamically using string prefixes (e.g., from_prefix("go")).
- Lightweight: Core dependencies are minimal (primarily regex).
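
To make the "General Validation" item concrete, here is a minimal, std-only sketch of prefix/reference separation. It is not the crate's implementation: `split_curie` is a hypothetical helper, and the character rule for prefixes is a simplifying assumption rather than the full W3C grammar.

```rust
/// Hypothetical helper (not part of securiety): splits a CURIE at the first colon.
fn split_curie(input: &str) -> Option<(&str, &str)> {
    // No colon at all means there is no prefix/reference structure.
    let (prefix, reference) = input.split_once(':')?;

    // Assumed rule for this sketch: the prefix is non-empty and contains only
    // ASCII letters, digits, '_', '.', or '-'.
    let prefix_ok = !prefix.is_empty()
        && prefix
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || matches!(c, '_' | '.' | '-'));

    // Assumed rule for this sketch: the reference must not be empty.
    if prefix_ok && !reference.is_empty() {
        Some((prefix, reference))
    } else {
        None
    }
}
```

Under these assumed rules, `split_curie("AnyPrefix:12345")` returns `Some(("AnyPrefix", "12345"))`, while `split_curie("InvalidString")` returns `None`.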
Add this to your Cargo.toml:
[dependencies]
securiety = "0.2.0"

- General Parsing: If you need to validate that a string is simply a well-formed CURIE (has a valid prefix and reference structure) without enforcing specific ontology patterns:
use securiety::{CurieParser, CurieParsing};

fn main() {
    // Create a general parser
    let parser = CurieParser::general();

    // Valid syntax
    let curie = parser.parse("AnyPrefix:12345").unwrap();
    println!("Prefix: {}, Reference: {}", curie.prefix(), curie.reference());

    // Invalid syntax (e.g., no separator, invalid characters)
    let result = parser.parse("InvalidString");
    assert!(result.is_err());
}

- Specific Ontology Validation: You can use strict, pre-compiled regex validators for specific ontologies. This ensures that a GO term actually looks like a GO term (e.g., GO:0001234).
use securiety::{CurieParser, CurieParsing};

fn main() {
    // Strict validator for Gene Ontology
    let go_parser = CurieParser::go();

    // Passes: matches ^GO:\d{7}$
    let valid = go_parser.parse("GO:0006915");
    assert!(valid.is_ok());

    // Fails: the CURIE syntax is well-formed, but "ABC" does not match the GO ID pattern
    let invalid = go_parser.parse("GO:ABC");
    assert!(invalid.is_err());
}

- Dynamic Prefix Lookup: If you are processing data where the ontology prefix is determined at runtime, you can look up the validator dynamically:
use securiety::{CurieParser, CurieParsing, CurieRegexValidator, CurieValidation};

fn main() {
    let input_prefix = "mondo";
    let input_curie = "MONDO:0012345";

    // Attempt to find a validator for the given prefix
    if let Some(parser) = CurieParser::from_prefix(input_prefix) {
        match parser.parse(input_curie) {
            Ok(curie) => println!("Valid {} term: {}", input_prefix, curie),
            Err(e) => println!("Invalid term: {}", e),
        }
    } else {
        println!("No validator found for prefix: {}", input_prefix);
    }
}

This crate includes generated validators for a wide range of biological ontologies found in the Bioregistry, including but not limited to:
- GO (Gene Ontology)
- MONDO (Mondo Disease Ontology)
- CHEBI (Chemical Entities of Biological Interest)
- NCIT (NCI Thesaurus)
- HP (Human Phenotype Ontology)
- UBERON (Uber Anatomy Ontology)
Note: The patterns are generated using the create.rs utility which fetches metadata from the Bioregistry API.
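
As a rough illustration of how a generated pattern table and prefix lookup could fit together, the sketch below uses a small static table with hand-written checks standing in for the generated regexes. Everything here is hypothetical (`Entry`, `TABLE`, `lookup`, `check`); it does not reflect the actual output of create.rs. Only the GO pattern (seven digits) comes from the example above; the MONDO and CHEBI rules are assumptions made for the sketch.

```rust
/// One row of a hypothetical generated table: a lowercase prefix plus a
/// hand-written check standing in for the generated regex.
struct Entry {
    prefix: &'static str,
    is_valid_reference: fn(&str) -> bool,
}

/// Stand-in for a `^\d{7}$` reference pattern (exactly seven ASCII digits).
fn seven_digits(reference: &str) -> bool {
    reference.len() == 7 && reference.bytes().all(|b| b.is_ascii_digit())
}

/// Stand-in for a `^\d+$` reference pattern (one or more ASCII digits).
fn digits(reference: &str) -> bool {
    !reference.is_empty() && reference.bytes().all(|b| b.is_ascii_digit())
}

/// Illustrative rows only; the MONDO and CHEBI rules are assumptions.
static TABLE: &[Entry] = &[
    Entry { prefix: "go", is_valid_reference: seven_digits },
    Entry { prefix: "mondo", is_valid_reference: seven_digits },
    Entry { prefix: "chebi", is_valid_reference: digits },
];

/// Case-insensitive prefix lookup, so "GO", "Go", and "go" find the same row.
fn lookup(prefix: &str) -> Option<&'static Entry> {
    TABLE.iter().find(|e| e.prefix.eq_ignore_ascii_case(prefix))
}

/// Returns `None` for an unknown prefix, otherwise whether the reference passes.
fn check(prefix: &str, reference: &str) -> Option<bool> {
    lookup(prefix).map(|entry| (entry.is_valid_reference)(reference))
}
```

Returning `Option<bool>` keeps "no validator for this prefix" separate from "validator found, reference rejected", which mirrors the `Some`/`None` branch in the dynamic lookup example.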
The parser returns a CurieParsingError enum to distinguish between structural failures and validation failures:
- InvalidCurie(String): The string failed the specific validation logic (e.g., Regex mismatch).
- UnparsableCurie(String): The string lacked the basic structure of a CURIE (e.g., missing a colon).
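
The distinction between the two variants can be sketched with a local stand-in enum. The variant names follow the list above, but the enum below is defined locally for illustration; the crate's real type may derive different traits or carry different data. `check_go` and `label` are hypothetical helpers.

```rust
/// Local stand-in mirroring the two variant names described above.
#[derive(Debug, PartialEq)]
enum CurieParsingError {
    InvalidCurie(String),
    UnparsableCurie(String),
}

/// Hypothetical GO-only check that can produce both error kinds.
fn check_go(input: &str) -> Result<(), CurieParsingError> {
    // Structural failure: without a colon there is no prefix/reference split.
    let Some((prefix, reference)) = input.split_once(':') else {
        return Err(CurieParsingError::UnparsableCurie(input.to_string()));
    };

    // Validation failure: the structure is fine, but the GO pattern is not met.
    let matches_go = prefix == "GO"
        && reference.len() == 7
        && reference.bytes().all(|b| b.is_ascii_digit());
    if matches_go {
        Ok(())
    } else {
        Err(CurieParsingError::InvalidCurie(input.to_string()))
    }
}

/// Maps each outcome to a short label, e.g. for logging or metrics.
fn label(result: &Result<(), CurieParsingError>) -> &'static str {
    match result {
        Ok(()) => "valid",
        Err(CurieParsingError::InvalidCurie(_)) => "pattern mismatch",
        Err(CurieParsingError::UnparsableCurie(_)) => "not a CURIE",
    }
}
```

Matching on the variant lets a caller treat malformed input (likely a data-entry or extraction problem) differently from a well-formed CURIE that simply fails an ontology's pattern.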