
High number of false positives on MIMIC-CXR-JPG (threshold 0.5) #175

@danielemolino

Hello,
I am using the pretrained 'all' model for evaluation on the MIMIC-CXR-JPG dataset.

I am interested in the following five classes:

Atelectasis, Cardiomegaly, Consolidation, Lung Opacity, Pleural Effusion

According to the documentation, the models should already be calibrated for MIMIC-CXR. However, when I classify the test set and binarize the outputs at a threshold of 0.5, I obtain far more positives than the ground-truth labels contain.
For instance, this is the difference on the small subset I'm testing on:

| Class | True positives | Predicted positives |
| --- | --- | --- |
| Atelectasis | 678 | 1504 |
| Cardiomegaly | 290 | 1385 |
| Consolidation | 552 | 1165 |
| Lung Opacity | 1156 | 1488 |
| Pleural Effusion | 354 | 1288 |

This is the code I'm using (model loading shown for context):

```python
import torch
import torchvision
import torchxrayvision as xrv
from tqdm import tqdm

# pretrained 'all' weights
xrv_clf = xrv.models.DenseNet(weights="densenet121-res224-all").to('cuda')
xrv_clf.eval()

transform = torchvision.transforms.Compose([xrv.datasets.XRayCenterCrop()])

for batch in tqdm(val_loader):  # val_loader iterates over my MIMIC-CXR-JPG test set
    img_path = batch["xray_path"][0]
    img = xrv.utils.load_image(img_path)  # load and normalize the image
    img = transform(img)

    with torch.no_grad():
        img = torch.from_numpy(img).unsqueeze(0)
        preds = xrv_clf(img.to('cuda')).cpu()
```
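
For reference, the counts in the table come from binarizing at 0.5 and summing per class, roughly like the sketch below (the `pred_list`/`label_list` accumulators are my own scaffolding, stacked from `preds` and the dataset labels over the loop above):

```python
import numpy as np

# all_preds / all_labels: hypothetical arrays of shape
# [n_images, len(xrv_clf.pathologies)], accumulated over the loop above
all_preds = np.concatenate(pred_list)    # model outputs
all_labels = np.concatenate(label_list)  # ground-truth labels from the dataset

# note: "Pleural Effusion" appears as "Effusion" in xrv_clf.pathologies
classes = ["Atelectasis", "Cardiomegaly", "Consolidation",
           "Lung Opacity", "Effusion"]

for name in classes:
    idx = xrv_clf.pathologies.index(name)
    true_pos = int((all_labels[:, idx] == 1).sum())  # ground-truth positives
    pred_pos = int((all_preds[:, idx] > 0.5).sum())  # positives at threshold 0.5
    print(f"{name}: true={true_pos}, predicted={pred_pos}")
```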

Am I misunderstanding something in the evaluation logic?

Should the outputs of the pretrained torchxrayvision models be considered uncalibrated probabilities, meaning that a threshold of 0.5 is not appropriate?

Or is there something else specific to MIMIC-CXR-JPG (label noise, distribution shift, etc.) that explains the discrepancy?
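
If the answer is that 0.5 is not a meaningful operating point, I assume the right approach would be to pick per-class thresholds on a held-out validation split, e.g. by maximizing F1 along the precision-recall curve. A minimal sketch with scikit-learn, assuming hypothetical `val_labels`/`val_preds` arrays with one column per class:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def best_f1_threshold(labels, scores):
    # sweep all operating points on the PR curve and return the
    # threshold that maximizes F1
    precision, recall, thresholds = precision_recall_curve(labels, scores)
    f1 = 2 * precision * recall / (precision + recall + 1e-12)
    return thresholds[np.argmax(f1[:-1])]  # last PR point has no threshold

classes = ["Atelectasis", "Cardiomegaly", "Consolidation",
           "Lung Opacity", "Effusion"]

# val_labels, val_preds: hypothetical validation-split arrays,
# columns ordered as in `classes`
per_class_thr = {name: best_f1_threshold(val_labels[:, i], val_preds[:, i])
                 for i, name in enumerate(classes)}
```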

Thanks a lot for your help and for this very useful library!
