-
Notifications
You must be signed in to change notification settings - Fork 31
Unify model naming convention #123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Good start. But I don't think we should use model aliases in such a manner. It is very confusing that MACE-mp-0-small maps to "small". It is completely unintuitive. An "alias" is a shortened name for a commonly used model (and you should not have too many of these since commonly used by definition means a few). This is what I propose:
I would even argue that for All names need to be processed in a case-insensitive manner. @Andrew-S-Rosen Welcome your views too. |
|
Thanks for the ping and for tackling this, @rul048. While conceptually "simple", I think #120 is incredibly important to address and am glad you are working on this. I agree with your comments, @shyuep.
I do not have much of an opinion so long as it is internally consistent and captures all the necessary nuance.
Presumably, if a TensorNet-MatPES-PBE-v2026.1 were to come out next month, then TensorNet-PBE would be remapped to this newer version in a future release of matgl. On one hand, it makes sense to return the "best" version for the user. Of course, this downside of this is that when the alias is remapped, then there will be breaking changes if the user upgrades without reading a CHANGELOG. I don't see a way around that. You get what you ask for with the alias.
Yup. |
Change names from list comprehension to set comprehension. Signed-off-by: Runze Liu <[email protected]>
|
Compared to the previous version, the latest commit splits model resolution into two explicit stages. It introduces two separate mappings: ID_TO_ALIAS (user-facing aliases -> canonical IDs) and ID_TO_NAME (canonical IDs -> backend-specific model names). This makes the canonical identifier the shared “source of truth” inside MatCalc, while aliases become purely a user-compatibility layer, and backend names become purely an implementation detail. Along with this change, the PR description now formalizes a simpler canonical identifier format at the MatCalc level (Architecture-Dataset-Functional-Version-Size) rather than treating the canonical name as a variable-length. |
src/matcalc/utils.py
Outdated
|
|
||
| # Keys must be lowercase and represent canonical identifiers | ||
| # Values are the actual model names passed to the backend libraries. | ||
| ID_TO_NAME = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't bother ensuring IDs are lower case here. It is better to use the proper name capitalization. When checking, we can make it a non-case-sensitive check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I’ve kept the canonical capitalization for the IDs (eg., TensorNet-MatPES-r2SCAN-v2025.1-S), and the resolution is now case-insensitive when matching user input.
src/matcalc/utils.py
Outdated
| } | ||
|
|
||
| # Common aliases and abbreviations will load the most advanced or widely used model. | ||
| ID_TO_ALIAS = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is non-intuitive. It should be ALIAS_TO_ID. And it is a many-to-one mapping.
Signed-off-by: Runze Liu <[email protected]>
Signed-off-by: Runze Liu <[email protected]>
Signed-off-by: Runze Liu <[email protected]>
Signed-off-by: Runze Liu <[email protected]>
Summary
This PR standardizes how MatCalc loads universal MLIPs across multiple backend libraries (MatGL, MACE, GRACE, etc.) as a solution of Issue #120.
Different backend libraries currently use inconsistent or non-canonical model names (e.g., "small-omat-0", "medium-mpa-0", "GRACE-1L-OMAT-medium-base", "TensorNet-MatPES-PBE-v2025.1-PES"), which leads to ambiguity, user confusion, and repeated conversion logic scattered across the codebase.
To fix this, the PR introduces a unified model naming convention. The canonical identifier format is:
Architecture-Dataset-Functional-Version-Size
Examples:
The alias conversion table (ID_TO_ALIAS) can map the commonly used case-insensitive abbreviations or allies (eg., MACE-MP-0, MACE-MP-PBE-0, MACE-MP-0-M) entered by the user to the canonical identifiers used in MatCalc. And the backend name conversion table (ID_TO_NAME) can map these identifiers to the model names that each backend library actually uses so that users can write:
Use abbreviations will load the most advanced model in the model family.
Major changes:
Todos
If this is work in progress, what else needs to be done?
Checklist
ruff.mypy.duecredit@due.dcitedecorators to reference relevant papers by DOI (example)Tip: Install
pre-commithooks to auto-check types and linting before every commit: