However, this approach has a major issue: it is **sensitive to prediction bias** and can lead to **overconfidence**. If a model produces very high logits for a class (signaling strong confidence in its prediction) but that prediction is incorrect, the estimate is skewed. This is largely due to the **exponential function** in the softmax formula, which amplifies the differences between logits and can cause significant errors when the model is confident without being accurate.
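A quick toy example (not from the paper) makes this amplification concrete: a modest gap in logits already dominates the distribution, and a larger gap turns into near-certainty.

```python
import torch

# Two logit vectors with the same ranking but different gaps.
small_gap = torch.tensor([2.0, 1.0, 1.0])
large_gap = torch.tensor([10.0, 1.0, 1.0])

print(torch.softmax(small_gap, dim=0))  # ~[0.576, 0.212, 0.212]
print(torch.softmax(large_gap, dim=0))  # ~[0.9998, 0.0001, 0.0001]
```

A logit gap of 9 yields 99.98% confidence, whether or not the prediction is actually right.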
To address this challenge, the paper introduces **MANO**, a novel method that leverages normalized logits and matrix norms to estimate accuracy without labels.
## **Introducing MANO: A Two-Step Approach** {#section-1}
MANO addresses these challenges through a two-step process: **Normalization with Softrun** and **Aggregation using Matrix Norms**. Here is a diagram to help you visualize the process:
### **1. Normalization with Softrun** {#section-1.1}
As explained before, softmax is a very common activation function for transforming logits into probabilities. But its exponential nature exaggerates differences between logits, making the model appear more confident than it actually is.
When the model’s predictions are unreliable (as judged by an entropy-based criterion), Softrun applies a Taylor approximation rather than the softmax. The Taylor approximation smooths out the effect of large logits, preventing the model from being overly confident in any particular prediction. By contrast, when the dataset is well calibrated, the function behaves like softmax, preserving probability distributions where confidence is warranted.
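Here is a minimal sketch of this behavior. The entropy threshold and the second-order expansion $\exp(z) \approx 1 + z + z^2/2$ (which stays positive for every real $z$) are illustrative assumptions; the paper's exact criterion and expansion order may differ.

```python
import math
import torch

def softrun(logits: torch.Tensor, entropy_threshold: float = 0.5) -> torch.Tensor:
    """Normalize logits with softmax or a Taylor surrogate (illustrative sketch)."""
    probs = torch.softmax(logits, dim=-1)
    # Average entropy of the predictions, rescaled to [0, 1] (1 = fully uncertain).
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).mean()
    uncertainty = entropy / math.log(logits.shape[-1])

    if uncertainty < entropy_threshold:
        # Predictions look calibrated: keep the ordinary softmax.
        return probs
    # Predictions look unreliable: replace exp(z) by its second-order Taylor
    # expansion 1 + z + z^2/2, which is positive for every real z and grows
    # polynomially rather than exponentially.
    taylor = 1.0 + logits + 0.5 * logits ** 2
    return taylor / taylor.sum(dim=-1, keepdim=True)
```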
Let's now see how to implement MANO in practice!
Before diving into implementation, it’s important to understand the logic behind the MANO algorithm for unsupervised accuracy estimation.
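As a concrete reference, here is a minimal PyTorch sketch of the full procedure, reusing the illustrative `softrun` from above. The loader format and the default norm order `p = 4` are assumptions made for this sketch, not necessarily the paper's exact choices.

```python
import torch

@torch.no_grad()
def mano_score(model, loader, p: int = 4, device: str = "cpu") -> float:
    """Estimate a model's accuracy on an unlabeled test set (illustrative sketch)."""
    model.eval().to(device)
    # Step 1: collect logits over the whole test set; labels are never used
    # (the loader is assumed to yield (inputs, ...) tuples).
    logits = torch.cat([model(x.to(device)) for x, *_ in loader])
    # Step 2: normalize all predictions at once; softrun picks softmax or its
    # Taylor surrogate via the entropy criterion (see the sketch above).
    scores = softrun(logits)  # shape: (n_samples, n_classes)
    # Step 3: entry-wise L_p norm of the prediction matrix, normalized by the
    # matrix size so the score is comparable across datasets and label spaces.
    n, k = scores.shape
    return (scores.abs().pow(p).sum() / (n * k)).pow(1.0 / p).item()
```

In practice, you would compute this score on each shifted test set and check how well it tracks the model's true accuracy, for example with $R^2$ as in the evaluation below.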
The procedure above outlines the core idea: given a model and an unlabeled test set, the method first determines how to normalize the model's logits, choosing between softmax and the novel alternative Softrun via an entropy-based criterion (see [Section 1.1](#section-1.1)). It then iterates over the test set, collects the normalized predictions into a matrix, and finally computes an estimation score from the matrix's normalized $L_p$ norm (see [Section 1.2](#section-1.2)). This score correlates with the model's true accuracy, even without access to ground-truth labels.
MANO has been evaluated against **11 baseline methods**, including Rotation Prediction, Dispersion Score, and ProjNorm.
In this comprehensive evaluation, the authors consider three types of distribution shift: **synthetic shifts**, where models are tested on artificially corrupted images; **natural shifts**, involving datasets collected from different distributions than the training data; and **subpopulation shifts**, where certain classes or groups are underrepresented in the training data. To evaluate MANO under synthetic shifts, they use CIFAR-10C, CIFAR-100C, ImageNet-C, and TinyImageNet-C, covering various corruption types and severity levels. For natural shifts, they test on OOD datasets from PACS, Office-Home, DomainNet, and RR1 WILDS. To assess subpopulation shifts, they use the BREEDS benchmark, including Living-17, Nonliving-26, Entity-13, and Entity-30 from ImageNet-C.
*(Figure: $R^2$ distribution of ResNet-18 on all distribution shifts.)*
On the left, we can see a box plot of the $R^2$ distribution, showing the estimation performance of each method.
Additionally, the figure below shows a scatter plot illustrating how MANO outperforms Dispersion Score and ProjNorm under natural shift on Entity-18 using ResNet-18.