enable site default for config source #647
Conversation
|
I've added documentation about how the config source ("files" or "dataregistry") can be set explicitly. While I was at it I discussed other GCRCatalogs configuration. Since README.md is rather long already I put this information in a new file. I'm tempted to similarly extract other sections of README.md. As it is, it's difficult to find anything beyond the "Available Catalogs" section. |
|
Thanks @JoanneBogart I looked over the code changes and they look good, but I have not actually tested the code. The added documentation text is useful. At some point (not necessary with this PR) we should start to add unit tests as the register code is now getting quite complicated. I think there's not backward compatibility issue with this PR so we can just increment the PATCH number. |
|
@yymao The lack of CI testing bothers me too. I believe I tested all cases as I was developing the code with a local test program, but without CI tests there's no guarantee some future update will break it. I haven't tackled it so far because, with no access to NERSC from CI, it would be a substantial piece of work. |
|
@JoanneBogart Thank you! Indeed, integrating GitHub CI with NERSC will be a significant undertaking. Although I wonder if we may be able to create small-scope unit tests that does not involve integrating with NERSC. For example, we can test the logic of specific functions/steps in the register module, which may be doable without integrating NERSC. |
|
What I was thinking of doing was something like what is done in the dataregistry CI: create a postgres db and make some entries which could be based on actual catalogs. The catalog data would not be available, but a number of things could be tested. |
fix #646
The config source (that is, source of catalog configurations) for GCRCatalogs may be either "files" - the original way of getting this information - or "dataregistry". Currently "dataregistry" only makes sense for processes running at NERSC and using the catalogs in the DESC shared area (i.e., not the publicly released ones). In all other cases the source should be set to "files"
The config source now gets set as follows:
SInce some of this logic - in particular the part trying to determine the site - is the same as what's needed for determining root dir, the code has been reorganized somewhat and the old file
site_rootdir.yamlhas been replaced bysite_info.yaml. One could probably pull out more common code for finding the site values and put it in a separate module (currently it's largely in root_dir_manager.py but not part of the RootDirManager class).