Hello,
I am using the pretrained 'all' models for evaluation on the MIMIC-CXR-JPG dataset.
My interest is in the following 5 classes:
Atelectasis, Cardiomegaly, Consolidation, Lung Opacity, Pleural Effusion
According to the documentation, the models should already be calibrated for MIMIC-CXR. However, when I classify the test set using a threshold of 0.5 to binarize the outputs, I obtain far more positives than the ground-truth labels contain.
For instance, this is the difference on the small set I'm testing on (see the counting sketch after the table):
| Class | True positives | Predicted positives |
| --- | --- | --- |
| Atelectasis | 678 | 1504 |
| Cardiomegaly | 290 | 1385 |
| Consolidation | 552 | 1165 |
| Lung Opacity | 1156 | 1488 |
| Pleural Effusion | 354 | 1288 |
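For context, this is roughly how I obtain the counts above (a minimal sketch; `all_preds` and `all_labels` here are dummy stand-ins for the model outputs and ground-truth labels I accumulate over the test set):

```python
import numpy as np

classes = ["Atelectasis", "Cardiomegaly", "Consolidation",
           "Lung Opacity", "Pleural Effusion"]

# Dummy stand-ins for accumulated model outputs (N, 5) and ground-truth labels (N, 5)
rng = np.random.default_rng(0)
all_preds = rng.random((1000, len(classes)))
all_labels = (rng.random((1000, len(classes))) > 0.7).astype(int)

binarized = (all_preds >= 0.5).astype(int)  # binarize outputs at threshold 0.5

for i, name in enumerate(classes):
    true_pos = int(all_labels[:, i].sum())   # number of ground-truth positives
    pred_pos = int(binarized[:, i].sum())    # number of predicted positives at 0.5
    print(f"{name}: true positives={true_pos}, predicted positives={pred_pos}")
```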
This is the code I'm using:

```python
import torch
import torchvision
import torchxrayvision as xrv
from tqdm import tqdm

# xrv_clf (the pretrained 'all' classifier) and val_loader are defined earlier
for batch in tqdm(val_loader):
    img_path = batch["xray_path"][0]
    img = xrv.utils.load_image(img_path)

    # Center-crop the image; no other preprocessing here
    transform = torchvision.transforms.Compose([xrv.datasets.XRayCenterCrop()])
    img = transform(img)

    with torch.no_grad():
        img = torch.from_numpy(img).unsqueeze(0)   # add batch dimension
        preds = xrv_clf(img.to('cuda')).cpu()
```

Am I misunderstanding something in the evaluation logic?
Should the outputs of the pretrained torchxrayvision models be treated as uncalibrated probabilities, meaning that a threshold of 0.5 is not appropriate?
Or is there something else specific to MIMIC-CXR-JPG (label noise, distribution shift, etc.) that explains the discrepancy?
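In case the answer is that the raw outputs need per-class operating points, this is the kind of per-class threshold selection I would fall back to (a sketch only, not something from the library; it picks the threshold that maximizes F1 on a held-out validation split, with dummy arrays standing in for my actual outputs and labels):

```python
import numpy as np
from sklearn.metrics import f1_score

# Dummy stand-ins for validation-split outputs (N, 5) and labels (N, 5)
rng = np.random.default_rng(0)
val_preds = rng.random((500, 5))
val_labels = (rng.random((500, 5)) > 0.7).astype(int)

best_thresholds = []
for i in range(val_preds.shape[1]):
    # Sweep candidate thresholds and keep the one with the highest F1 for this class
    candidates = np.linspace(0.05, 0.95, 19)
    scores = [f1_score(val_labels[:, i], (val_preds[:, i] >= t).astype(int))
              for t in candidates]
    best_thresholds.append(float(candidates[int(np.argmax(scores))]))

print(best_thresholds)  # one operating point per class, used in place of 0.5
```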
Thanks a lot for your help and for this very useful library!