
VLG-Loc

This is the official repository for the following paper:

Mizuho Aoki*, Kohei Honda, Yasuhiro Yoshimura, Takeshi Ishita, Ryo Yonetani, "VLG-Loc: Vision-Language Global Localization from Labeled Footprint Maps", arXiv, 2025. paper | project page | dataset | video

Vision-Language Global Localization (VLG-Loc) is a global localization method that uses camera images and a human-readable labeled footprint map containing only the names and areas of distinctive visual landmarks.

[Figure: VLG-Loc architecture]

Setup

This setup provides a GPU-enabled Ubuntu 22.04 environment using Docker.

  • Prerequisites

    • git
      • For Ubuntu users:
        sudo apt install git
    • git-lfs
      • For Ubuntu users:
        sudo apt install git-lfs
        git lfs install
    • docker
      • For Ubuntu users:
        curl -fsSL https://get.docker.com -o get-docker.sh
        sudo sh get-docker.sh
        sudo groupadd docker
        sudo usermod -aG docker $USER
        reboot
    • make
      • For Ubuntu users:
        sudo apt install make
    • NVIDIA Container Toolkit
      • This is required to allow Docker containers to access the host's GPU.
    • NVIDIA GPU & Driver
      • An NVIDIA GPU and a compatible driver for the base image (nvidia/cuda:12.4.1-devel-ubuntu22.04) are required.
    • Azure OpenAI API key and endpoint
  • Clone the repository

    git clone git@github.com:CyberAgentAILab/VLG-Loc.git
  • Download the dataset to the project root on your host machine. The dataset directory will be mounted as ~/dev_ws/dataset in the container.

    cd VLG-Loc
    mkdir -p dataset
    cd dataset
    git clone https://huggingface.co/datasets/cyberagent/VLG-Loc-Dataset vlg_loc_dataset
    
  • Build the docker container.

    cd VLG-Loc
    make setup_docker
  • Get inside the docker container.

    cd VLG-Loc
    make launch_docker
  • Set the VLM API key and endpoint as environment variables.
    Copy the example configuration file .env.example to create your own .env file:

    cd ~/dev_ws
    cp .env.example .env

    Open .env and replace <vlm_api_key> and <vlm_api_endpoint> with your actual credentials (a sketch of the finished file is shown after this list).

  • Setup the workspace.

    cd ~/dev_ws
    source setup_workspace.sh
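
As referenced in the environment-variable step above, a completed .env simply holds the API key and endpoint values. A minimal sketch is shown below; the variable names here are only illustrative, and the exact keys are defined in .env.example:

    # Hypothetical variable names -- copy the exact keys from .env.example
    VLM_API_KEY=<your Azure OpenAI API key>
    VLM_API_ENDPOINT=<your Azure OpenAI endpoint URL>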

Evaluation

Use the following command to run an evaluation. Replace <EVAL_MODE>, <ENV_NAME>, and <CONFIG_PATH> with the appropriate values from the sections below.

python3 scripts/run_eval.py \
    --dataset-root dataset \
    --dataset_name vlg_loc_dataset/<ENV_NAME> \
    --mode <EVAL_MODE> \
    --config_filename=<CONFIG_PATH> \
    --overwrite \
    --create_video

Descriptions of Arguments:

  • --dataset-root: Path to the root directory of the dataset.
  • --dataset_name: Name of the dataset.
  • --mode: Evaluation mode.
  • --config_filename: Path to the configuration file of the localizer.
  • --overwrite: If specified, existing results will be overwritten.
  • --create_video: If specified, a video summarizing the evaluation will be created. Be aware that this takes longer and requires more disk space.

Evaluation Modes (<EVAL_MODE>):

  • eval_scan_localizer: Evaluate the scan localizer.
  • eval_vision_localizer: Evaluate the vision localizer.
  • eval_vision_and_scan_localizer: Evaluate multimodal localization using both vision and scan data.
  • clean_all_logs: Clean evaluation outputs.

Environments (<ENV_NAME> and <CONFIG_PATH>):

Select the <ENV_NAME> and <CONFIG_PATH> for your desired environment from the table below.

Environment <ENV_NAME> <CONFIG_PATH>
UG/UA (Uniform Geometry, Uniform Appearance) env_ug_ua env_ug_ua/loc_eval_env_ug_ua.yaml
UG/DA (Uniform Geometry, Diverse Appearance) env_ug_da env_ug_da/loc_eval_env_ug_da.yaml
DG/UA (Diverse Geometry, Uniform Appearance) env_dg_ua env_dg_ua/loc_eval_env_dg_ua.yaml
DG/DA (Diverse Geometry, Diverse Appearance) env_dg_da env_dg_da/loc_eval_env_dg_da.yaml
Retail Store (Real) env_retail_store_real retail_store_real/loc_eval_env_retail_store_real.yaml
Retail Store (Sim) env_retail_store_sim env_retail_store_sim/loc_eval_env_retail_store_sim.yaml

Note

The included configuration files use gpt-4.1 by default.
To use a different model, set [vlm_config][model_name] in the configuration file.
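
For reference, the relevant part of the YAML might look like the sketch below; the nesting follows the [vlm_config][model_name] key mentioned above, but check the provided configuration files for the exact layout:

    vlm_config:
      model_name: gpt-4.1  # replace with the model/deployment you want to use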

Example Command:

To run the vision localizer evaluation in the DG/DA environment, use the following command:

python3 scripts/run_eval.py --dataset-root dataset --dataset_name vlg_loc_dataset/env_dg_da --mode eval_vision_localizer --config_filename=env_dg_da/loc_eval_env_dg_da.yaml --overwrite

Note

While we strive for reproducible outputs by fixing the seed and temperature parameters, please be aware that minor variations in LLM outputs can lead to slight differences in localization results.

Visualize the Results

After running the evaluation, you can visualize the results using the web visualizer. Use the following command to start the visualizer:

python3 scripts/web_visualizer.py --target_dir <PATH_TO_DATASET_DIR>

Replace <PATH_TO_DATASET_DIR> with the path to the dataset directory (e.g., dataset/vlg_loc_dataset/env_dg_da).
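
For example, to browse the DG/DA results:

    python3 scripts/web_visualizer.py --target_dir dataset/vlg_loc_dataset/env_dg_da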

Then, open a web browser and navigate to http://127.0.0.1:8080 to view the visualizer.

Dataset Generation in the Simulation Environments

You can generate datasets by running simulations in the provided Gazebo Classic environments. This requires running commands in separate terminals inside the Docker container.

  1. Build the workspace.

    cd ~/dev_ws
    make build
  2. Terminal 1: Launch Gazebo World

    In your first Docker terminal, launch the desired simulation environment.

    cd ~/dev_ws
    source ~/.bashrc
    ros2 launch mobile_robot_ros2 vmegarover_world.launch.py world_fname:=<ENV_NAME>
    • Note: Replace <ENV_NAME> with one of the environment names listed in the table above, except env_retail_store_real. A worked example with the placeholders filled in is given after these steps.
  3. Terminal 2: Launch Manual Controller

    In a separate terminal inside the Docker container, run the joypad controller to operate the robot. For more information, please refer to the joy_controller documentation.

    cd ~/dev_ws
    source ~/.bashrc
    ros2 launch joy_controller joy_controller_launch.py

    Alternatively, you can operate the robot with teleop_twist_keyboard by running the following command:

    ros2 run teleop_twist_keyboard teleop_twist_keyboard
  4. Terminal 3: Launch Dataset Maker

    In another terminal inside the Docker container, run the dataset maker to record data while operating the robot.
    The screenshot below shows RViz (left) and the Gazebo Simulator (right) while data is being recorded.
    In the RViz visualization:

    • The red point cloud shows the current, live sensor scan.
    • The blue point cloud and the three images represent the most recently saved dataset.
    • The blue arrows indicate the sequence of ground truth positions for that saved dataset.

    After you finish recording, you can stop the process by pressing CTRL + C in the terminal.

    cd ~/dev_ws
    source ~/.bashrc
    export HYDRA_CONFIG_PATH=<ENV_NAME>/loc_eval_<ENV_NAME>
    ros2 launch launch/make_dataset.launch.py map_file_path:=<ENV_NAME>/occupancy_grid_map/<ENV_NAME>.yaml
    [Screenshot: RViz (left) and the Gazebo simulator (right) during dataset recording]
  5. Visualize Recorded Data

    After the dataset is recorded, the data will be saved in the results directory. You can visualize the recorded data using the web visualizer.

    cd ~/dev_ws
    python3 scripts/web_visualizer.py --target_dir results
    [Screenshot: web visualizer displaying the recorded dataset]
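
As noted in step 2, here is how the placeholder commands look once <ENV_NAME> is filled in, using env_dg_da as an example (a sketch assembled by substituting values from the environment table into the commands above):

    # Terminal 1: launch the Gazebo world for the DG/DA environment
    cd ~/dev_ws
    source ~/.bashrc
    ros2 launch mobile_robot_ros2 vmegarover_world.launch.py world_fname:=env_dg_da

    # Terminal 3: record a dataset for the DG/DA environment
    cd ~/dev_ws
    source ~/.bashrc
    export HYDRA_CONFIG_PATH=env_dg_da/loc_eval_env_dg_da
    ros2 launch launch/make_dataset.launch.py map_file_path:=env_dg_da/occupancy_grid_map/env_dg_da.yaml

The Terminal 2 controller commands contain no placeholders and are unchanged.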

License and Acknowledgements

This project is licensed under the Apache-2.0 License. Please note that certain components may be distributed under different licenses; refer to the corresponding directories for detailed information.

This project makes use of several open-source libraries. We would like to express our gratitude to the developers and contributors of these projects.
