Skip to content

Conversation

@hina-shah
Copy link

Creating a Config class to reduce reliance on environment variable availability in general, while setting reasonable defaults. This update also makes ORION specific directory generation to be not dependent on user input, and creates these directories under a Storage directory by default.

@EvanDietzMorris
Copy link
Contributor

This is an excellent start @hina-shah and definitely the right direction for what we need.

In general the thing I think we'll want to change while we're working on this, which will be slightly more complicated but I'm happy to help, is that not only do we not want to force users to set environment variables, we want to avoid relying on these directories and variables at all when they're not needed. So instead of making default directories as soon as the config is loaded, we want to defer that behavior until they are actually used. That way a user could use certain parts of ORION, like utilities or kgx file manipulation, without ever needing a STORAGE or GRAPHS directory. You could look at where these vars are actually accessed (load_manager.py and build_manager.py for example) and if they are not set by the user, we create the directory then as it is needed.

Logs is a bit of a special case and for those it would be great if we could have the default be to write logs only to stdout and not even write to disc, so a logs directory is truly optional and is not generated unless the user specifies it.

@EvanDietzMorris
Copy link
Contributor

Alright, looking at this some more, there are some issues we need to resolve and some coding style things I would prefer but aren't as serious. In general, I think we are changing a bit too much here. We just wanted to remove the necessity of setting environment variables and create default directories as needed. This changes the way the directories are organized, the env var names, and other aspects of the config and how it's accessed in unnecessary ways. It's easy to break things, including on other branches, and now other projects that use ORION as a dependency, and the less we change for no reason the better.

In order of importance:

  1. Changing the names of the env vars (ORION_GRAPHS vs ORION_GRAPHS_DIR_NAME) will break our helm charts, docs, and who knows what else hah. I'd prefer to not do that right now with so many moving pieces. I see that specific change was done for a reason, but see point 3.

  2. We already had a config implementation using the dotenv package, which provides a lot of nice options we could use. It combined vars from an .env file with environment variables. It already worked with our docker deployments. I don't see a reason to move away from that. This implementation doesn't read from the .env file at all yet it looks like?

  3. It looks like these changes force users to have their storage, graphs, and logs directories under the same directory, and switches their env vars from paths to filenames. That's a reasonable approach but it will break existing helm charts and ORION workspaces that do not have the directories set up that way. It's fine and makes sense if the defaults set up the directories that way, but I don't see a reason to remove the ability for a user to specify three different paths. There are some benefits to having the ability to have them be in different places we can go over if you'd like, but I mainly don't want to break things.

  4. Moving the NODE_NORMALIZATION_URL from the module level to the class level is an example of an unnecessary change. This would break at least one dependency on ORION which uses that constant. It also changes the intention of having it as a module level constant (in all caps), which means it shouldn't change, rather than a class level variable. Same for the NAME_RESOLVER_XXX constants . Cleaning up the usage of DEFAULT_EDGE_NORM_ENDPOINT and EDGE_NORMALIZATION_ENDPOINT was good because it followed a different/weird pattern, but stylistically I'd prefer to have that be a module level constant as well.

  5. Changing the pattern of how to import and access the config, and calling "Config.from_env()" everywhere without any use-specific parameters or anything like that seems unnecessary. The config doesn't ever need to be a class level variable of other objects, and we should have it's usage consistent across the project. Everything needed to initialize it can be done in config.py and then imported and accessed by other modules like config.xxx rather than (self.config = Config.from_env(), self.config.xxx). Again, the existing pattern was good.

  6. I'd prefer that the logic of creating the default directories not be hidden. For example, in load manager.py, init_storage_dir(), before it was easy to see the logic of what was happening. Now it just says "config.getenv("ORION_STORAGE_DIR_NAME")" which does not make it clear that it's going to create a directory if needed. Making a clear distinction between "I'm getting a directory name from the config" and "I'm validating that this directory exists and if not creating it" would be better. This change also changed the logic of handling the env var - now it creates a directory at an arbitrary path from an env var instead of failing and letting the user know their env var wasn't valid. I think I'd prefer the latter but that's not a big deal.

Sorry, I know this is a lot and I should have caught some of these things sooner, but the number of breaking changes here is going to cause a lot of problems I want to avoid. Let me know if you have questions and/or we can meet to discuss next week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants