Skip to content

Conversation

@zkhotanlou
Copy link
Collaborator

After asserting that Model.data == data.obj, all instances of data access through the model have been eliminated. This necessitated structural changes to the Model class and a redesign of workflows, particularly in evaluation and benchmarking. The revised approach now explicitly passes data where needed, improving modularity and clarity.

Several deprecated methods, including scatterDataset, scatterDecisionBoundry, and visualizeDatasetAndFixedModel, have been moved to a dedicated deprecated/load_model.py file. These methods were previously used in loadModel.py, which is now commented out. Similarly, the pipelining folder has been deprecated and relocated, as it was only utilized in api/model.py for returning features. Loadmodelfordataset now accepts dataset objects rather than dataset names.
A key highlight of this pull request is the introduction of a more intuitive project workflow. The new flow encapsulates the dataset, model, and recourse method setup, ensuring clean interactions between components:

dataset = DataCatalog(data_name, model_type)
model = ModelCatalog(dataset, model_type, backend)
method = RecourseMethod(dataset, model)
benchmark = Benchmark(method, factuals)
This approach aligns with the decoupling of data and model, simplifying how evaluation and benchmarking pipelines access these components. The method object now serves as the single source for both the model and data, ensuring encapsulation and cleaner dependencies.

@zkhotanlou zkhotanlou requested review from a4bello and amirhk January 29, 2025 03:52
@amirhk
Copy link
Collaborator

amirhk commented Jan 29, 2025

Great new workflow! A few questions below:

These methods were previously used in loadModel.py, which is now commented out.

the deprecated code should be removed, not commented.

dataset = DataCatalog(data_name, model_type)

why does the dataset (I suppose data_obj) need information on the model_type?

method = RecourseMethod(dataset, model)

should the method_type not be included as an argument?

benchmark = Benchmark(method, factuals)

question: other than runtime, are there any other metrics that rely on the method itself? it seems cleaner if the method footprint was: performance = Benchmark(method, factuals)

another question, is Benchmark a class (hence starting with upper case B), or is it a method (should start with lower case b)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants