I’m using the VBR Rome dataset for visual localization by forming image pairs within the same scene.
For the campus scene, I tried to bring the ground truth poses of the training sequences into a common frame so that I could generate these pairs. To do this, I ran ICP on LiDAR point clouds taken at nearby poses, using one sample from train1 and one from train0.
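For reference, the way a single ICP result can be turned into a sequence-level alignment is sketched below. This is a minimal sketch, not the exact script I ran: it assumes the poses are 4x4 sensor-to-world matrices and that ICP returns the transform `T_a_b` mapping the train1 scan (source) into the frame of the train0 scan (target). The function and variable names are hypothetical.

```python
import numpy as np


def invert_se3(T):
    """Invert a 4x4 rigid transform using the rotation transpose."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv


def train1_to_train0(T_w0_a, T_w1_b, T_a_b):
    """Transform from the train1 world frame to the train0 world frame.

    T_w0_a: pose of the train0 sample in the train0 world frame (assumed 4x4).
    T_w1_b: pose of the train1 sample in the train1 world frame (assumed 4x4).
    T_a_b:  ICP result mapping points of scan b into the frame of scan a.
    """
    return T_w0_a @ T_a_b @ invert_se3(T_w1_b)


def align_poses(poses_train1, T_w0_w1):
    """Express all train1 poses, shape (N, 4, 4), in the train0 world frame."""
    return T_w0_w1 @ poses_train1
```

Because this applies one rigid transform to the whole sequence, it moves the train1 trajectory as a rigid body and leaves the relative poses within train1 unchanged.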
The alignment worked well in the x and y dimensions (see Image 1). However, the z component of the ground truth poses shows an unusual trend that does not look physically plausible. As shown in Image 2, the z variation itself appears significantly off.
