The call to locals() makes vg.shape.check() much slower than it needs to be in the (very likely) success case. As an optimization, and as a convenience to the caller, we could modify it to fetch the caller's locals from the stack rather than requiring the caller to pass them in.
See this comment thread, which explains the approach: lace/blmath#17 (comment)
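
For illustration, here is a rough sketch of what fetching the caller's locals from the stack could look like. It is not taken from the linked thread and is not vg's actual API: `check_shape`, its signature, and the `-1` wildcard convention are all hypothetical, and `SimpleNamespace` stands in for a NumPy array so the snippet has no dependencies. It relies on `inspect.currentframe()`, which is CPython-specific and may return `None` on other implementations.

```python
import inspect
from types import SimpleNamespace


def check_shape(name, expected_shape):
    """Hypothetical helper: validate the shape of the caller's local `name`.

    A -1 in expected_shape is treated as a wildcard for that dimension.
    """
    frame = inspect.currentframe()
    try:
        # f_back is the frame of whoever called check_shape().
        caller_locals = frame.f_back.f_locals
        value = caller_locals[name]
    finally:
        # Drop the frame reference so we don't keep a reference cycle alive.
        del frame

    actual = tuple(value.shape)
    if len(actual) != len(expected_shape) or any(
        want != -1 and want != got for want, got in zip(expected_shape, actual)
    ):
        raise ValueError(f"{name} must have shape {expected_shape}; got {actual}")
    return actual


def caller():
    # Stand-in for a NumPy array; only the .shape attribute is used.
    points = SimpleNamespace(shape=(5, 3))
    # No locals() argument: the helper finds `points` in this frame itself.
    return check_shape("points", (-1, 3))
```

With this shape the caller no longer passes `locals()`. Whether it is actually faster in the success case is something we would need to benchmark, since reading `f_locals` may itself build a dict of the caller's variables.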