https://github.com/VIPL-Audio-Visual-Speech-Understanding/LipNet-PyTorch/blob/40209e09c49553c00c25c7d41faa3706aea3c625/scripts/extract_lip.py#L91 Why is the standard face defined this way (including the parameters of the affine transformation)? Could you give more specific insight into how these values were chosen?
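For context, here is my current understanding of the alignment step, written as a rough sketch. The canonical landmark coordinates, crop size, and function names below are my own placeholders, not the values or code used in extract_lip.py; please correct me if this is not what line 91 is doing:

```python
import cv2
import numpy as np

# Hypothetical canonical ("standard face") landmark positions in the output
# image -- placeholder coordinates, not the values hard-coded in the repo.
CANONICAL_POINTS = np.float32([
    [40.0, 50.0],   # e.g. left eye corner
    [88.0, 50.0],   # e.g. right eye corner
    [64.0, 80.0],   # e.g. nose tip
])
OUTPUT_SIZE = (128, 128)  # (width, height) of the aligned crop, also a placeholder


def align_face(frame, detected_points):
    """Warp `frame` so that `detected_points` (same order as CANONICAL_POINTS)
    land on the canonical template, then return the aligned image."""
    detected_points = np.float32(detected_points)

    # Estimate a similarity transform (rotation + uniform scale + translation)
    # mapping the detected landmarks onto the canonical template.
    M, _ = cv2.estimateAffinePartial2D(detected_points, CANONICAL_POINTS)

    # Apply the transform; once every face sits in the same canonical frame,
    # the lip region can be cropped at a fixed location.
    return cv2.warpAffine(frame, M, OUTPUT_SIZE)
```

In particular, I would like to understand how the specific coordinates of the standard face were decided, e.g. whether they come from a mean face over the dataset or were tuned by hand so that the mouth always falls inside a fixed crop window.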