Bugfix/package config edgecases #8

jack-mcmahon · 2024-02-08T18:32:15Z

Simplified and improved the readability of logging outputs. Also fixed bugs introduced by hardcoded dataset processing and args in quick_test.py.

ntomita

Thanks for your PR. I have some minor requests that should be addressed before merging. Thanks!

ntomita · 2024-02-18T21:15:34Z

MaskHIT/maskhit/cross_validation.py

 config = Config(config_file_default, config_file)
 folds = [int(i) for i in args.folds.split(',')] if args.folds else list(range(config.dataset.num_folds))
-print(f"Testing on folds: {list(folds)}")
+print(f"[INFO] Conducting 5 fold Cross Validation on folds: {list(folds)}")


Please revert this change: the number of folds is arbitrary, so we should not hard-code it.

Removed the specific fold number
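For reference, a minimal sketch of a message that derives the count from the folds actually selected instead of hard-coding it. The helper names `select_folds` and `describe_folds` are hypothetical and not part of MaskHIT; the fold selection mirrors the expression shown in the diff above.

```python
def select_folds(folds_arg, num_folds):
    """Parse a comma-separated fold list, or fall back to every fold."""
    if folds_arg:
        return [int(i) for i in folds_arg.split(",")]
    return list(range(num_folds))


def describe_folds(folds):
    """Build the log line from the selected folds, with no fixed fold count."""
    return f"[INFO] Running cross validation on {len(folds)} fold(s): {folds}"


# Hypothetical inputs: an explicit subset, then the default of all folds.
print(describe_folds(select_folds("0,2", num_folds=5)))
print(describe_folds(select_folds(None, num_folds=3)))
```

The message stays correct whether `num_folds` in the config changes or only a subset of folds is passed on the command line.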

ntomita · 2024-02-18T21:17:00Z

MaskHIT/maskhit/options/base_options.py

    def parse(self):
        args = self.parser.parse_args()
-        print(args)
+        # print(args)


Why comment this out? It's okay if we are printing the same info somewhere else, but we should keep it if not.

ntomita · 2024-02-18T21:19:11Z

MaskHIT/maskhit/quick_test.py

    pattern = r'--timestr=[^\s]+'
    org_cmd = re.sub(pattern, '', org_cmd)

-    timestr_new += '-test'


This is actually a helpful suffix to distinguish test runs from training runs, so please revert this.
The variable name timestr_new sounds terrible. Could you rename it to timestr_test?

When I tested the original version, it was appending '-test' again for every fold, so that by the last fold the timestr was, for example, '2023_2_6-vit-test-test-test-test-test'. To fix this, I removed that code and instead append "-test" to the timestr passed from cross_validation.py. Let me know if you'd still recommend changes there; I'll add the rename to timestr_test now, though.
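One way to keep the suffix while avoiding the repeated append is to make the operation idempotent. This is only a sketch: `with_test_suffix` is a hypothetical helper, not a function in quick_test.py, and the loop stands in for the per-fold calls.

```python
def with_test_suffix(timestr, suffix="-test"):
    """Append the suffix once; repeated calls leave the value unchanged."""
    return timestr if timestr.endswith(suffix) else timestr + suffix


timestr_test = "2023_2_6-vit"
for fold in range(5):
    # Calling this once per fold no longer stacks '-test-test-...'.
    timestr_test = with_test_suffix(timestr_test)

print(timestr_test)
```

With this shape it does not matter whether the suffix is added in cross_validation.py or in quick_test.py, since adding it twice has no effect.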

ntomita · 2024-02-18T21:31:54Z

MaskHIT/maskhit/train.py

+                patient_ids.append(patient_id) # adding patient id to the list
+            meta_split['id_patient'] = patient_ids # adding column to the meta_split dataframe
+            # formatting rows in meta_file of the id patients so they match that of meta_split df
+            meta_file['id_patient'] = meta_file['id_patient'].apply(lambda x: pd.Series(x.split(' ')[0]))


The changes in lines 272-307 seem to make sense for non-IBD users. The next step is for whoever is affected by this change to update their metadata files accordingly in the SlidePrep part of the library. The assumption here seems to be that both meta files have an id_patient column with consistent values, so they can be merged at line 318. Could you add this note after line 272 (the if-branch) so people know what should be fixed in case this change breaks some non-IBD users' code?
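To make that assumption concrete, here is a small sketch of normalizing `id_patient` and checking consistency before the merge. The two data frames and their contents are invented stand-ins, not the real metadata files, and the early check is a suggestion rather than what train.py currently does.

```python
import pandas as pd

# Hypothetical stand-ins for the two metadata tables discussed above.
meta_split = pd.DataFrame({"id_patient": ["P001", "P002"], "fold": [0, 1]})
meta_file = pd.DataFrame({"id_patient": ["P001 A1", "P002 B3"], "label": [1, 0]})

# Keep only the token before the first space so both tables share one ID format.
meta_file["id_patient"] = meta_file["id_patient"].str.split(" ").str[0]

# Fail early with a readable message when the IDs do not line up.
missing = set(meta_split["id_patient"]) - set(meta_file["id_patient"])
if missing:
    raise ValueError(f"id_patient values missing from meta_file: {sorted(missing)}")

merged = meta_split.merge(meta_file, on="id_patient", how="inner")
print(merged)
```

An explicit check like this would tell affected users which IDs need fixing in their metadata, instead of the merge silently dropping rows.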

ntomita · 2024-02-18T21:34:58Z

MaskHIT/maskhit/train.py

+            # use the model name
+            model_name = config.model.resume
+            TIMESTR = model_name.split('-')[0]
+    elif config.model.resume:


Shouldn't we stick to one instead of supporting both, if both args are meant to be the same thing?

I ran into a problem with that setup when using cross_validation.py. My dataset config file doesn't resume a pretrained model, but I then need to load a different model for each fold when testing. Since the config file doesn't change between test folds, I had to specify which model to evaluate (i.e., resume) in the form of args to train.py. This seemed like the best way to fix the issue when running cross validation while also allowing for fine-tuning a pretrained model specified in the config. Let me know if you have an idea for something better.
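The precedence described here could be isolated in one small function so both sources stay supported without duplicating logic. This is a sketch under assumptions: `resolve_resume` is a hypothetical name, the command-line value is assumed to win over the config value, and the model names are made up. The `split('-')[0]` step follows the diff above.

```python
def resolve_resume(cli_resume, config_resume):
    """Pick the model to resume: command-line value first, then the config value."""
    model_name = cli_resume or config_resume
    if not model_name:
        return None, None
    # As in the diff, the timestamp is the part of the name before the first '-'.
    timestr = model_name.split("-")[0]
    return model_name, timestr


# Cross validation: the config has no resume entry, each fold passes its own model.
print(resolve_resume("2024_2_7-fold0", None))
# Fine-tuning: only the config names a pretrained model.
print(resolve_resume(None, "2023_2_6-vit"))
```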

ntomita · 2024-02-18T21:36:32Z

MaskHIT/maskhit/train.py

    data_dict = {"train": df_train, "val": df_test}

-    df_test.to_csv('fold0.csv')
-


I wasn't sure about this fold0 file either, so I'm okay with removing it. Based on the file name, it was probably for debugging in the early stage of development.

Yes, I added that for debugging with the IBD Project, so it is safe to remove.

ntomita · 2024-02-18T21:38:21Z

MaskHIT/maskhit/utils/config.py

+        # print(f"[INFO] Loading config files:")
+        # print(f"[INFO]    Default config: {default_config_file}")
+        # print(f"[INFO]    User config: {user_config_file}")



I am not sure why you have to move this to the cross-validation script. Functionally, this seems like the better place.

Sure, I'll switch it back

ntomita · 2024-02-18T21:39:44Z

MaskHIT/maskhit/trainer/fitter.py

+            self.writer['meta'].info('\t'+ data_dict['val'][['id_patient']].to_string()) 
+        elif procedure == 'test':
+            self.writer['meta'].info('Testing patients:')
+            self.writer['meta'].info('\t'+ data_dict['val'][['id_patient']].to_string()) 


Could you explain why these lines have to be added?

I thought that it would be useful to have a record in the log file of which slides are being trained/validated/tested on. That's useful for me but I can also remove it from the PR if other users would rather not include that info in the log files.
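If the concern is log size, a more compact variant would be to log each unique patient ID on its own line rather than the full `to_string()` table. The sketch below uses the standard `logging` module as a stand-in for `self.writer['meta']`; the helper `log_patients` and the IDs are hypothetical.

```python
import io
import logging


def log_patients(logger, heading, patient_ids):
    """Write a heading followed by one tab-indented line per unique patient ID."""
    logger.info(heading)
    for patient_id in sorted(set(patient_ids)):
        logger.info("\t%s", patient_id)


# Capture the output in memory so the sketch is self-contained.
stream = io.StringIO()
logger = logging.getLogger("meta_sketch")
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False

log_patients(logger, "Testing patients:", ["P002", "P001", "P002"])
print(stream.getvalue())
```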

jack-mcmahon and others added 4 commits February 7, 2024 18:16

fix: cleaner logging format

a897bae

fix: cross validation bugs

a4d1619

fix: cross_validation.py bug

efd55ed

fix: bugs in quick_test.py

eb84d9f

ntomita requested review from adas2125 and ntomita February 8, 2024 18:35

ntomita requested changes Feb 18, 2024

jack-mcmahon added 6 commits February 18, 2024 17:45

fix: move config.py printout

257f157

fix: move config.py printout

e0611a4

fix: rename timestr_test

edf61f2

Update cross_validation.py

fcf68df

Update base_options.py

a5dff95

Update train.py

54820fd

jack-mcmahon requested a review from ntomita February 18, 2024 22:50

adas2125 approved these changes Feb 18, 2024
