Hello,
I am using the pretrained 'all' models for evaluation on the MIMIC-CXR-JPG dataset.
My interest is in the following 5 classes:
Atelectasis, Cardiomegaly, Consolidation, Lung Opacity, Pleural Effusion
According to the documentation, the models should already be calibrated for MIMIC-CXR. However, when I classify the test set using a threshold of 0.5 to binarize the outputs, I obtain far more positives than the ground-truth labels contain.
For instance, this is the difference on the small set I'm testing on (see the counting sketch after the table):
| Class | True positives | Predicted positives |
| --- | --- | --- |
| Atelectasis | 678 | 1504 |
| Cardiomegaly | 290 | 1385 |
| Consolidation | 552 | 1165 |
| Lung Opacity | 1156 | 1488 |
| Pleural Effusion | 354 | 1288 |
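For context, this is roughly how I obtain the counts above (a minimal sketch; `all_preds` and `all_labels` here are dummy stand-ins for the model outputs and ground-truth labels I accumulate over the test set):

```python
import numpy as np

classes = ["Atelectasis", "Cardiomegaly", "Consolidation",
           "Lung Opacity", "Pleural Effusion"]

# Dummy stand-ins for accumulated model outputs (N, 5) and ground-truth labels (N, 5)
rng = np.random.default_rng(0)
all_preds = rng.random((1000, len(classes)))
all_labels = (rng.random((1000, len(classes))) > 0.7).astype(int)

binarized = (all_preds >= 0.5).astype(int)  # binarize outputs at threshold 0.5

for i, name in enumerate(classes):
    true_pos = int(all_labels[:, i].sum())   # number of ground-truth positives
    pred_pos = int(binarized[:, i].sum())    # number of predicted positives at 0.5
    print(f"{name}: true positives={true_pos}, predicted positives={pred_pos}")
```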
This is the code I'm using:

```python
import torch
import torchvision
import torchxrayvision as xrv
from tqdm import tqdm

# xrv_clf (the pretrained 'all' classifier) and val_loader are defined earlier
for batch in tqdm(val_loader):
    img_path = batch["xray_path"][0]
    img = xrv.utils.load_image(img_path)

    # Center-crop the image; no other preprocessing here
    transform = torchvision.transforms.Compose([xrv.datasets.XRayCenterCrop()])
    img = transform(img)

    with torch.no_grad():
        img = torch.from_numpy(img).unsqueeze(0)   # add batch dimension
        preds = xrv_clf(img.to('cuda')).cpu()
```

Am I misunderstanding something in the evaluation logic?
Should the outputs of the pretrained torchxrayvision models be treated as uncalibrated probabilities, meaning that a threshold of 0.5 is not appropriate?
Or is there something else specific to MIMIC-CXR-JPG (label noise, distribution shift, etc.) that explains the discrepancy?
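In case the answer is that the raw outputs need per-class operating points, this is the kind of per-class threshold selection I would fall back to (a sketch only, not something from the library; it picks the threshold that maximizes F1 on a held-out validation split, with dummy arrays standing in for my actual outputs and labels):

```python
import numpy as np
from sklearn.metrics import f1_score

# Dummy stand-ins for validation-split outputs (N, 5) and labels (N, 5)
rng = np.random.default_rng(0)
val_preds = rng.random((500, 5))
val_labels = (rng.random((500, 5)) > 0.7).astype(int)

best_thresholds = []
for i in range(val_preds.shape[1]):
    # Sweep candidate thresholds and keep the one with the highest F1 for this class
    candidates = np.linspace(0.05, 0.95, 19)
    scores = [f1_score(val_labels[:, i], (val_preds[:, i] >= t).astype(int))
              for t in candidates]
    best_thresholds.append(float(candidates[int(np.argmax(scores))]))

print(best_thresholds)  # one operating point per class, used in place of 0.5
```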
Thanks a lot for your help and for this very useful library!