The paper states: "For numerical stability, we trained the network to predict $\log \sigma^2$ instead of $\sigma^2$." The cited reference, "Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics", explains: "In practice, we train the network to predict the log variance, $s := \log \sigma^2$. This is because it is more numerically stable than regressing the variance, $\sigma^2$, as the loss avoids any division by zero. The exponential mapping also allows us to regress unconstrained scalar values, where $\exp(-s)$ is resolved to the positive domain giving valid values for variance."
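To make the parameterization concrete, here is a minimal sketch of the log-variance trick as I understand it from the quote (my own illustrative code, not from the repo; the class name `LogVarWeightedLoss` and the zero initialization are my assumptions):

```python
import torch
import torch.nn as nn

class LogVarWeightedLoss(nn.Module):
    """Sketch of the log-variance parameterization from the quoted paper:
    learn s := log(sigma^2) as an unconstrained scalar, so the weight
    exp(-s) is always strictly positive and no division by zero occurs."""

    def __init__(self):
        super().__init__()
        # s = log(sigma^2); initialized to 0, i.e. sigma^2 = 1.
        # Unconstrained: any real s maps to a valid positive variance.
        self.log_var = nn.Parameter(torch.zeros(()))

    def forward(self, pred, target):
        # Regression form: 0.5 * exp(-s) * ||y - f(x)||^2 + 0.5 * s
        precision = torch.exp(-self.log_var)  # = 1 / sigma^2, always > 0
        mse = ((pred - target) ** 2).mean()
        return 0.5 * precision * mse + 0.5 * self.log_var
```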
My understanding is that if the network learns $\sigma^2$ directly, the predicted variance can become 0, which is why the authors switch to learning $\log \sigma^2$ instead. But judging from the code, it seems that $\sigma^2$ is still what gets optimized directly. Wouldn't the optimization then run into a division-by-zero error?
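For contrast, this is roughly what I suspect the code is doing, i.e. a direct-variance version (again my own sketch, not the actual repo code; `DirectVarWeightedLoss` and its initialization are assumed):

```python
import torch
import torch.nn as nn

class DirectVarWeightedLoss(nn.Module):
    """Direct parameterization: sigma^2 itself is the learned parameter.
    Nothing constrains it to stay positive, so 1/sigma^2 can divide by
    zero (and log(sigma^2) is undefined) if the optimizer drives it to
    0 or below."""

    def __init__(self):
        super().__init__()
        # sigma^2 learned directly, initialized to 1.
        self.var = nn.Parameter(torch.ones(()))

    def forward(self, pred, target):
        # Regression form: 1/(2*sigma^2) * ||y - f(x)||^2 + 0.5*log(sigma^2)
        mse = ((pred - target) ** 2).mean()
        return 0.5 / self.var * mse + 0.5 * torch.log(self.var)
```

In this sketch the blow-up would happen at the `0.5 / self.var` term, which is exactly what the log-variance form in the paper avoids.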