Skip to content

enable site default for config source #647

Merged
JoanneBogart merged 4 commits intomasterfrom
u/jrbogart/config_source_by_site
May 5, 2025
Merged

enable site default for config source #647
JoanneBogart merged 4 commits intomasterfrom
u/jrbogart/config_source_by_site

Conversation

@JoanneBogart
Copy link
Contributor

fix #646

The config source (that is, source of catalog configurations) for GCRCatalogs may be either "files" - the original way of getting this information - or "dataregistry". Currently "dataregistry" only makes sense for processes running at NERSC and using the catalogs in the DESC shared area (i.e., not the publicly released ones). In all other cases the source should be set to "files"
The config source now gets set as follows:

  • if the dataregistry module can't be imported, use "files"
  • else if the environment variable GCR_CONFIG_SOURCE is set to an acceptable value, use that. If it's been set to an unknown value raise an exception.
  • else if the site can be determined and has a known value, use that to look up the proper default
  • otherwise use "files"

SInce some of this logic - in particular the part trying to determine the site - is the same as what's needed for determining root dir, the code has been reorganized somewhat and the old file site_rootdir.yaml has been replaced by site_info.yaml. One could probably pull out more common code for finding the site values and put it in a separate module (currently it's largely in root_dir_manager.py but not part of the RootDirManager class).

@JoanneBogart
Copy link
Contributor Author

I've added documentation about how the config source ("files" or "dataregistry") can be set explicitly. While I was at it I discussed other GCRCatalogs configuration. Since README.md is rather long already I put this information in a new file. I'm tempted to similarly extract other sections of README.md. As it is, it's difficult to find anything beyond the "Available Catalogs" section.
Should minor version be incremented or just patch?

@JoanneBogart JoanneBogart requested review from heather999 and yymao May 2, 2025 23:22
@yymao
Copy link
Member

yymao commented May 5, 2025

Thanks @JoanneBogart I looked over the code changes and they look good, but I have not actually tested the code. The added documentation text is useful.

At some point (not necessary with this PR) we should start to add unit tests as the register code is now getting quite complicated.

I think there's not backward compatibility issue with this PR so we can just increment the PATCH number.

yymao
yymao previously approved these changes May 5, 2025
Copy link
Member

@yymao yymao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. See my comment above.

@JoanneBogart
Copy link
Contributor Author

@yymao The lack of CI testing bothers me too. I believe I tested all cases as I was developing the code with a local test program, but without CI tests there's no guarantee some future update will break it. I haven't tackled it so far because, with no access to NERSC from CI, it would be a substantial piece of work.

@JoanneBogart JoanneBogart requested a review from yymao May 5, 2025 17:49
@JoanneBogart JoanneBogart merged commit 315eb87 into master May 5, 2025
3 checks passed
@JoanneBogart JoanneBogart deleted the u/jrbogart/config_source_by_site branch May 5, 2025 18:19
@yymao
Copy link
Member

yymao commented May 6, 2025

@JoanneBogart Thank you!

Indeed, integrating GitHub CI with NERSC will be a significant undertaking. Although I wonder if we may be able to create small-scope unit tests that does not involve integrating with NERSC. For example, we can test the logic of specific functions/steps in the register module, which may be doable without integrating NERSC.

@JoanneBogart
Copy link
Contributor Author

What I was thinking of doing was something like what is done in the dataregistry CI: create a postgres db and make some entries which could be based on actual catalogs. The catalog data would not be available, but a number of things could be tested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Provide site-specific default for whether to attempt to connect to dataregistry database or not

2 participants