There is a sigmoid activation applied to gamma and beta in the FiLM layers. Because a sigmoid outputs values in (0, 1), beta can only shift features in the positive direction, and gamma can only shrink features: it can never amplify them or flip their sign. In the paper they actually tested different activations on these affine parameters, and all of them hurt performance. If you leave the generator's output as is, without any activation, you should see a significant improvement.
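A minimal sketch of the difference, written in plain Python rather than whatever framework this repo uses. The names `film`, `raw_gamma` and `raw_beta` are hypothetical, and the raw values stand in for whatever the FiLM generator (e.g. a linear layer on the conditioning input) would output:

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def film(features, gamma, beta):
    """Feature-wise linear modulation: scale each feature by gamma, then shift by beta."""
    return [g * x + b for x, g, b in zip(features, gamma, beta)]


# Hypothetical raw generator outputs (illustrative values, not from a trained model).
raw_gamma = [-2.0, 0.5, 3.0]
raw_beta = [-1.5, 0.0, 2.0]
features = [1.0, 1.0, 1.0]

# With a sigmoid: gamma in (0, 1) can only shrink each feature and
# beta in (0, 1) can only shift it upward.
squashed = film(features, [sigmoid(g) for g in raw_gamma], [sigmoid(b) for b in raw_beta])

# Without an activation: the raw values are used directly, so features can be
# negated, amplified, or shifted downward.
identity = film(features, raw_gamma, raw_beta)

print("with sigmoid:   ", [round(v, 3) for v in squashed])
print("no activation:  ", identity)
```

With an input of all ones, every sigmoid-modulated feature lands strictly between 0 and 2, while the unactivated version can produce negative and larger outputs.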