🌏 Massive-STEPS: Massive Semantic Trajectories for Understanding POI Check-ins -- Dataset and Benchmarks
Wilson Wongso1,2, Hao Xue1,2, and Flora Salim1,2
1 School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
2 ARC Centre of Excellence for Automated Decision Making + Society
Massive-STEPS is a large-scale dataset of semantic trajectories intended for understanding POI check-ins. The dataset is derived from the Semantic Trails Dataset and Foursquare Open Source Places, and includes check-in data from 15 cities across 10 countries. The dataset is designed to facilitate research in various domains, including trajectory prediction, POI recommendation, and urban modeling. Massive-STEPS emphasizes the importance of geographical diversity, scale, semantic richness, and reproducibility in trajectory datasets.
Massive-STEPS is available on 🤗 Datasets. You can download the preprocessed dataset of each city using the following links:
| City | Users | Trails | POIs | Check-ins | #train | #val | #test | URL |
|---|---|---|---|---|---|---|---|---|
| Bandung 🇮🇩 | 3,377 | 55,333 | 29,026 | 161,284 | 113,058 | 16,018 | 32,208 | 🤗 |
| Beijing 🇨🇳 | 56 | 573 | 1,127 | 1,470 | 400 | 58 | 115 | 🤗 |
| Istanbul 🇹🇷 | 23,700 | 216,411 | 53,812 | 544,471 | 151,487 | 21,641 | 43,283 | 🤗 |
| Jakarta 🇮🇩 | 8,336 | 137,396 | 76,116 | 412,100 | 96,176 | 13,740 | 27,480 | 🤗 |
| Kuwait City 🇰🇼 | 9,628 | 91,658 | 17,180 | 232,706 | 64,160 | 9,166 | 18,332 | 🤗 |
| Melbourne 🇦🇺 | 646 | 7,864 | 7,699 | 22,050 | 5,504 | 787 | 1,573 | 🤗 |
| Moscow 🇷🇺 | 3,993 | 39,485 | 17,822 | 105,620 | 27,639 | 3,949 | 7,897 | 🤗 |
| New York 🇺🇸 | 6,929 | 92,041 | 49,218 | 272,368 | 64,428 | 9,204 | 18,409 | 🤗 |
| Palembang 🇮🇩 | 267 | 4,699 | 4,343 | 14,467 | 10,132 | 1,487 | 2,848 | 🤗 |
| Petaling Jaya 🇲🇾 | 14,308 | 180,410 | 60,158 | 506,430 | 126,287 | 18,041 | 36,082 | 🤗 |
| São Paulo 🇧🇷 | 5,822 | 89,689 | 38,377 | 256,824 | 62,782 | 8,969 | 17,938 | 🤗 |
| Shanghai 🇨🇳 | 296 | 3,636 | 4,462 | 10,491 | 2,544 | 364 | 728 | 🤗 |
| Sydney 🇦🇺 | 740 | 10,148 | 8,986 | 29,900 | 7,103 | 1,015 | 2,030 | 🤗 |
| Tangerang 🇮🇩 | 1,437 | 15,984 | 12,956 | 45,521 | 32,085 | 4,499 | 8,937 | 🤗 |
| Tokyo 🇯🇵 | 764 | 5,482 | 4,725 | 13,839 | 3,836 | 549 | 1,097 | 🤗 |
To reproduce the Massive-STEPS dataset, follow these steps:
- Clone the Semantic Trails repository.
- This provides the necessary metadata, such as
cities.csvandcategories.csvthat contains the mapping of POI categories and metadata about each city.
- This provides the necessary metadata, such as
- Download Semantic Trails dataset from Figshare and extract it to
semantic-trails/downloads/.- This provides
std_2013.csvandstd_2018.csvthat contains the check-in data for 2013 and 2018, respectively. We will be using both subsets to create the Massive-STEPS dataset.
- This provides
- Download the city boundaries in GeoJSON format.
- First, do a lookup of the target city's relation ID from OpenStreetMap.
- For example, search for "Beijing" and find the relation ID
912940.
- For example, search for "Beijing" and find the relation ID
- Then, using the related ID, download the geographical boundaries of the target city in GeoJSON file format from Overpass Turbo.
- We recommend saving the GeoJSON file in the
data/<city_name>directory.
- We recommend saving the GeoJSON file in the
- First, do a lookup of the target city's relation ID from OpenStreetMap.
- Preprocess the check-in data using the
src/preprocess_std.pyscript.- This script will filter the check-in data based on the geographical boundaries of the target city and create a CSV file containing the check-in data.
- The script also filters out trajectories with less than 2 check-ins and users with less than 3 trajectories.
- Create the next POI dataset using the
src/create_next_poi_dataset.pyscript.- This script will create a dataset for the next POI recommendation task based on the filtered check-in data:
- Each trajectory will be treated as a sequence of POIs and the last POI will be the target next POI.
- Trajectories will also be converted into textual prompts for convenience.
- The dataset will be split into training, validation, and test sets based on the user IDs.
- The dataset will also be uploaded to Hugging Face Datasets with the specified dataset ID.
- This script will create a dataset for the next POI recommendation task based on the filtered check-in data:
Bandung 🇮🇩
[out:json];
relation(13290062);
out geom;python src/preprocess_std.py \
--std_2013_file semantic-trails/downloads/std_2013.csv \
--std_2018_file semantic-trails/downloads/std_2018.csv \
--cities_file semantic-trails/cities.csv \
--categories_file semantic-trails/categories.csv \
--city_geo_json_file data/bandung/bandung_13290062_overpass.geojson \
--output_dir data/bandung \
--output_file bandung_checkins.csv \
--min_checkins 2 --min_trails 3python src/create_next_poi_dataset.py \
--checkins_file data/bandung/bandung_checkins.csv \
--dataset_id CRUISEResearchGroup/Massive-STEPS-BandungBeijing 🇨🇳
[out:json];
relation(912940);
out geom;python src/preprocess_std.py \
--std_2013_file semantic-trails/downloads/std_2013.csv \
--std_2018_file semantic-trails/downloads/std_2018.csv \
--cities_file semantic-trails/cities.csv \
--categories_file semantic-trails/categories.csv \
--city_geo_json_file data/beijing/beijing_912940_overpass.geojson \
--output_dir data/beijing \
--output_file beijing_checkins.csv \
--min_checkins 2 --min_trails 3python src/create_next_poi_dataset.py \
--checkins_file data/beijing/beijing_checkins.csv \
--dataset_id CRUISEResearchGroup/Massive-STEPS-BeijingIstanbul 🇹🇷
[out:json];
relation(223474);
out geom;python src/preprocess_std.py \
--std_2013_file semantic-trails/downloads/std_2013.csv \
--std_2018_file semantic-trails/downloads/std_2018.csv \
--cities_file semantic-trails/cities.csv \
--categories_file semantic-trails/categories.csv \
--city_geo_json_file data/istanbul/istanbul_223474_overpass.geojson \
--output_dir data/istanbul \
--output_file istanbul_checkins.csv \
--min_checkins 2 --min_trails 3python src/create_next_poi_dataset.py \
--checkins_file data/istanbul/istanbul_checkins.csv \
--dataset_id CRUISEResearchGroup/Massive-STEPS-IstanbulJakarta 🇮🇩
[out:json];
relation(6362934);
out geom;python src/preprocess_std.py \
--std_2013_file semantic-trails/downloads/std_2013.csv \
--std_2018_file semantic-trails/downloads/std_2018.csv \
--cities_file semantic-trails/cities.csv \
--categories_file semantic-trails/categories.csv \
--city_geo_json_file data/jakarta/jakarta_6362934_overpass.geojson \
--output_dir data/jakarta \
--output_file jakarta_checkins.csv \
--min_checkins 2 --min_trails 3python src/create_next_poi_dataset.py \
--checkins_file data/jakarta/jakarta_checkins.csv \
--dataset_id CRUISEResearchGroup/Massive-STEPS-JakartaKuwait City 🇰🇼
[out:json];
relation(305099);
out geom;python src/preprocess_std.py \
--std_2013_file semantic-trails/downloads/std_2013.csv \
--std_2018_file semantic-trails/downloads/std_2018.csv \
--cities_file semantic-trails/cities.csv \
--categories_file semantic-trails/categories.csv \
--city_geo_json_file data/kuwait_city/kuwait_city_305099_overpass.geojson \
--output_dir data/kuwait_city \
--output_file kuwait_city_checkins.csv \
--min_checkins 2 --min_trails 3python src/create_next_poi_dataset.py \
--checkins_file data/kuwait_city/kuwait_city_checkins.csv \
--dataset_id CRUISEResearchGroup/Massive-STEPS-Kuwait-CityMelbourne 🇦🇺
[out:json];
relation(4246124);
out geom;python src/preprocess_std.py \
--std_2013_file semantic-trails/downloads/std_2013.csv \
--std_2018_file semantic-trails/downloads/std_2018.csv \
--cities_file semantic-trails/cities.csv \
--categories_file semantic-trails/categories.csv \
--city_geo_json_file data/melbourne/melbourne_4246124_overpass.geojson \
--output_dir data/melbourne \
--output_file melbourne_checkins.csv \
--min_checkins 2 --min_trails 3python src/create_next_poi_dataset.py \
--checkins_file data/melbourne/melbourne_checkins.csv \
--dataset_id CRUISEResearchGroup/Massive-STEPS-MelbourneMoscow 🇷🇺
[out:json];
relation(2555133);
out geom;python src/preprocess_std.py \
--std_2013_file semantic-trails/downloads/std_2013.csv \
--std_2018_file semantic-trails/downloads/std_2018.csv \
--cities_file semantic-trails/cities.csv \
--categories_file semantic-trails/categories.csv \
--city_geo_json_file data/moscow/moscow_2555133_overpass.geojson \
--output_dir data/moscow \
--output_file moscow_checkins.csv \
--min_checkins 2 --min_trails 3python src/create_next_poi_dataset.py \
--checkins_file data/moscow/moscow_checkins.csv \
--dataset_id CRUISEResearchGroup/Massive-STEPS-MoscowNew York 🇺🇸
[out:json];
relation(175905);
out geom;python src/preprocess_std.py \
--std_2013_file semantic-trails/downloads/std_2013.csv \
--std_2018_file semantic-trails/downloads/std_2018.csv \
--cities_file semantic-trails/cities.csv \
--categories_file semantic-trails/categories.csv \
--city_geo_json_file data/new_york/new_york_175905_overpass.geojson \
--output_dir data/new_york \
--output_file new_york_checkins.csv \
--min_checkins 2 --min_trails 3python src/create_next_poi_dataset.py \
--checkins_file data/new_york/new_york_checkins.csv \
--dataset_id CRUISEResearchGroup/Massive-STEPS-New-YorkPetaling Jaya 🇲🇾
[out:json];
relation(8347386);
out geom;python src/preprocess_std.py \
--std_2013_file semantic-trails/downloads/std_2013.csv \
--std_2018_file semantic-trails/downloads/std_2018.csv \
--cities_file semantic-trails/cities.csv \
--categories_file semantic-trails/categories.csv \
--city_geo_json_file data/petaling_jaya/petaling_jaya_8347386_overpass.geojson \
--output_dir data/petaling_jaya \
--output_file petaling_jaya_checkins.csv \
--min_checkins 2 --min_trails 3python src/create_next_poi_dataset.py \
--checkins_file data/petaling_jaya/petaling_jaya_checkins.csv \
--dataset_id CRUISEResearchGroup/Massive-STEPS-Petaling-JayaPalembang 🇮🇩
[out:json];
relation(10713145);
out geom;python src/preprocess_std.py \
--std_2013_file semantic-trails/downloads/std_2013.csv \
--std_2018_file semantic-trails/downloads/std_2018.csv \
--cities_file semantic-trails/cities.csv \
--categories_file semantic-trails/categories.csv \
--city_geo_json_file data/palembang/palembang_10713145_overpass.geojson \
--output_dir data/palembang \
--output_file palembang_checkins.csv \
--min_checkins 2 --min_trails 3python src/create_next_poi_dataset.py \
--checkins_file data/palembang/palembang_checkins.csv \
--dataset_id CRUISEResearchGroup/Massive-STEPS-PalembangSão Paulo 🇧🇷
[out:json];
relation(298285);
out geom;python src/preprocess_std.py \
--std_2013_file semantic-trails/downloads/std_2013.csv \
--std_2018_file semantic-trails/downloads/std_2018.csv \
--cities_file semantic-trails/cities.csv \
--categories_file semantic-trails/categories.csv \
--city_geo_json_file data/sao_paulo/sao_paulo_298285_overpass.geojson \
--output_dir data/sao_paulo \
--output_file sao_paulo_checkins.csv \
--min_checkins 2 --min_trails 3python src/create_next_poi_dataset.py \
--checkins_file data/sao_paulo/sao_paulo_checkins.csv \
--dataset_id CRUISEResearchGroup/Massive-STEPS-Sao-PauloShanghai 🇨🇳
[out:json];
relation(913067);
out geom;python src/preprocess_std.py \
--std_2013_file semantic-trails/downloads/std_2013.csv \
--std_2018_file semantic-trails/downloads/std_2018.csv \
--cities_file semantic-trails/cities.csv \
--categories_file semantic-trails/categories.csv \
--city_geo_json_file data/shanghai/shanghai_913067_overpass.geojson \
--output_dir data/shanghai \
--output_file shanghai_checkins.csv \
--min_checkins 2 --min_trails 3python src/create_next_poi_dataset.py \
--checkins_file data/shanghai/shanghai_checkins.csv \
--dataset_id CRUISEResearchGroup/Massive-STEPS-ShanghaiSydney 🇦🇺
[out:json];
relation(5750005);
out geom;python src/preprocess_std.py \
--std_2013_file semantic-trails/downloads/std_2013.csv \
--std_2018_file semantic-trails/downloads/std_2018.csv \
--cities_file semantic-trails/cities.csv \
--categories_file semantic-trails/categories.csv \
--city_geo_json_file data/sydney/sydney_5750005_overpass.geojson \
--output_dir data/sydney \
--output_file sydney_checkins.csv \
--min_checkins 2 --min_trails 3python src/create_next_poi_dataset.py \
--checkins_file data/sydney/sydney_checkins.csv \
--dataset_id CRUISEResearchGroup/Massive-STEPS-SydneyTangerang 🇮🇩
[out:json];
relation(7641583);
out geom;python src/preprocess_std.py \
--std_2013_file semantic-trails/downloads/std_2013.csv \
--std_2018_file semantic-trails/downloads/std_2018.csv \
--cities_file semantic-trails/cities.csv \
--categories_file semantic-trails/categories.csv \
--city_geo_json_file data/tangerang/tangerang_7641583_overpass.geojson \
--output_dir data/tangerang \
--output_file tangerang_checkins.csv \
--min_checkins 2 --min_trails 3python src/create_next_poi_dataset.py \
--checkins_file data/tangerang/tangerang_checkins.csv \
--dataset_id CRUISEResearchGroup/Massive-STEPS-TangerangTokyo 🇯🇵
[out:json];
relation(1543125);
out geom;python src/preprocess_std.py \
--std_2013_file semantic-trails/downloads/std_2013.csv \
--std_2018_file semantic-trails/downloads/std_2018.csv \
--cities_file semantic-trails/cities.csv \
--categories_file semantic-trails/categories.csv \
--city_geo_json_file data/tokyo/tokyo_1543125_overpass.geojson \
--output_dir data/tokyo \
--output_file tokyo_checkins.csv \
--min_checkins 2 --min_trails 3python src/create_next_poi_dataset.py \
--checkins_file data/tokyo/tokyo_checkins.csv \
--dataset_id CRUISEResearchGroup/Massive-STEPS-TokyoWe also conducted extensive benchmarks on the Massive-STEPS dataset using various models for POI recommendation. The following table summarizes the results of our experiments, reported in Acc@1:
| Model | Bandung | Beijing | Istanbul | Jakarta | Kuwait City | Melbourne | Moscow | New York | Palembang | Petaling Jaya | São Paulo | Shanghai | Sydney | Tangerang | Tokyo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FPMC | 0.048 | 0.000 | 0.026 | 0.029 | 0.021 | 0.062 | 0.059 | 0.032 | 0.102 | 0.026 | 0.030 | 0.084 | 0.075 | 0.104 | 0.176 |
| RNN | 0.062 | 0.085 | 0.077 | 0.049 | 0.087 | 0.059 | 0.075 | 0.061 | 0.049 | 0.064 | 0.097 | 0.055 | 0.080 | 0.087 | 0.133 |
| LSTPM | 0.110 | 0.127 | 0.142 | 0.099 | 0.180 | 0.091 | 0.151 | 0.099 | 0.114 | 0.099 | 0.158 | 0.099 | 0.141 | 0.154 | 0.225 |
| DeepMove | 0.107 | 0.106 | 0.150 | 0.103 | 0.179 | 0.083 | 0.143 | 0.097 | 0.084 | 0.112 | 0.160 | 0.085 | 0.129 | 0.145 | 0.201 |
| GETNext | 0.179 | 0.433 | 0.146 | 0.155 | 0.175 | 0.100 | 0.175 | 0.134 | 0.158 | 0.139 | 0.202 | 0.115 | 0.181 | 0.224 | 0.180 |
| STHGCN | 0.219 | 0.453 | 0.241 | 0.197 | 0.225 | 0.168 | 0.223 | 0.146 | 0.246 | 0.174 | 0.250 | 0.193 | 0.227 | 0.293 | 0.250 |
| UniMove | 0.007 | 0.036 | 0.015 | 0.004 | 0.023 | 0.008 | 0.009 | 0.004 | 0.009 | 0.008 | 0.002 | 0.000 | 0.015 | 0.001 | 0.032 |
We also conducted zero-shot POI recommendation experiments on the Massive-STEPS dataset. The following table summarizes the results of our experiments, reported in Acc@1:
| Method | Model | Bandung | Beijing | Istanbul | Jakarta | Kuwait City | Melbourne | Moscow | New York | Palembang | Petaling Jaya | São Paulo | Shanghai | Sydney | Tangerang | Tokyo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LLM-Mob | gemini-2.0-flash | 0.105 | 0.115 | 0.080 | 0.100 | 0.095 | 0.060 | 0.130 | 0.095 | 0.135 | 0.090 | 0.130 | 0.055 | 0.060 | 0.155 | 0.140 |
| Qwen2.5-7B-Instruct-AWQ | 0.060 | 0.058 | 0.035 | 0.105 | 0.080 | 0.030 | 0.090 | 0.070 | 0.075 | 0.030 | 0.090 | 0.040 | 0.035 | 0.095 | 0.110 | |
| Meta-Llama-3.1-8B-Instruct-AWQ-INT4 | 0.010 | 0.000 | 0.020 | 0.055 | 0.030 | 0.010 | 0.030 | 0.025 | 0.005 | 0.010 | 0.030 | 0.005 | 0.020 | 0.020 | 0.005 | |
| gemma-2-9b-it-AWQ-INT4 | 0.070 | 0.115 | 0.075 | 0.105 | 0.080 | 0.055 | 0.100 | 0.070 | 0.095 | 0.055 | 0.085 | 0.050 | 0.030 | 0.145 | 0.145 | |
| LLM-ZS | gemini-2.0-flash | 0.095 | 0.058 | 0.090 | 0.110 | 0.080 | 0.065 | 0.125 | 0.080 | 0.130 | 0.110 | 0.150 | 0.065 | 0.060 | 0.145 | 0.160 |
| Qwen2.5-7B-Instruct-AWQ | 0.055 | 0.038 | 0.040 | 0.065 | 0.050 | 0.040 | 0.080 | 0.050 | 0.050 | 0.045 | 0.095 | 0.045 | 0.045 | 0.100 | 0.120 | |
| Meta-Llama-3.1-8B-Instruct-AWQ-INT4 | 0.045 | 0.077 | 0.040 | 0.045 | 0.060 | 0.040 | 0.080 | 0.055 | 0.070 | 0.030 | 0.030 | 0.060 | 0.040 | 0.080 | 0.110 | |
| gemma-2-9b-it-AWQ-INT4 | 0.065 | 0.096 | 0.045 | 0.105 | 0.070 | 0.050 | 0.080 | 0.075 | 0.060 | 0.065 | 0.075 | 0.050 | 0.045 | 0.100 | 0.110 | |
| LLM-Move | gemini-2.0-flash | 0.225 | 0.096 | 0.205 | 0.295 | 0.220 | 0.225 | 0.220 | 0.235 | 0.260 | 0.210 | 0.285 | 0.170 | 0.230 | 0.200 | 0.250 |
| Qwen2.5-7B-Instruct-AWQ | 0.100 | 0.192 | 0.175 | 0.115 | 0.160 | 0.110 | 0.230 | 0.120 | 0.130 | 0.135 | 0.155 | 0.095 | 0.125 | 0.175 | 0.250 | |
| Meta-Llama-3.1-8B-Instruct-AWQ-INT4 | 0.030 | 0.058 | 0.015 | 0.015 | 0.010 | 0.040 | 0.005 | 0.035 | 0.010 | 0.040 | 0.045 | 0.020 | 0.055 | 0.000 | 0.030 | |
| gemma-2-9b-it-AWQ-INT4 | 0.175 | 0.096 | 0.100 | 0.235 | 0.120 | 0.115 | 0.110 | 0.115 | 0.210 | 0.175 | 0.195 | 0.105 | 0.125 | 0.125 | 0.130 |
We conducted spatiotemporal classification and reasoning experiments on the Massive-STEPS dataset. Specifically, we evaluated the zero-shot performance of LLMs to classify whether a check-in trajectory ended at a weekend (Saturday or Sunday) or a weekday (Monday to Friday). The following table summarizes the results of our experiments, reported in accuracy:
| Model | Bandung | Beijing | Istanbul | Jakarta | Kuwait City | Melbourne | Moscow | New York | Palembang | Petaling Jaya | São Paulo | Shanghai | Sydney | Tangerang | Tokyo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| gemini-2.0-flash | 0.635 | 0.615 | 0.715 | 0.650 | 0.765 | 0.635 | 0.740 | 0.620 | 0.670 | 0.610 | 0.730 | 0.600 | 0.550 | 0.635 | 0.510 |
| gpt-4o-mini | 0.625 | 0.538 | 0.610 | 0.610 | 0.430 | 0.635 | 0.745 | 0.600 | 0.645 | 0.590 | 0.645 | 0.565 | 0.545 | 0.600 | 0.495 |
| gpt-4.1-mini | 0.585 | 0.673 | 0.615 | 0.600 | 0.690 | 0.585 | 0.745 | 0.595 | 0.605 | 0.575 | 0.700 | 0.565 | 0.515 | 0.620 | 0.550 |
| gpt-5-nano | 0.570 | 0.635 | 0.535 | 0.530 | 0.470 | 0.500 | 0.635 | 0.580 | 0.560 | 0.565 | 0.680 | 0.465 | 0.440 | 0.520 | 0.580 |
Our experiments rely on the following libraries:
To reproduce our experiments, clone this repository and its submodules:
git clone https://github.com/CRUISEResearchGroup/Massive-STEPS
cd Massive-STEPS
git submodule updatewhere the submodules are our forks of the original repositories with some modifications to support Massive-STEPS: GETNext, STHGCN, LibCity and UniMove. Each of the submodules has its own updated dependencies, as listed in their respective requirements.txt files.
Because the submodules are not installed in the same directory as the main repository, we need to create a softlink for each submodule to point to the data/ directory. You can do this by running the following commands:
ln -s $(pwd)/data $(pwd)/GETNext/data
ln -s $(pwd)/data $(pwd)/Spatio-Temporal-Hypergraph-Model/data
ln -s $(pwd)/data $(pwd)/Bigscity-LibCity/dataOur experiments on FPMC, RNN, LSTPM, and DeepMove are based on the implementations of LibCity. To run the experiments, you can use the following command:
city=beijing # or istanbul, jakarta, etc.
python convert_std.py \
--city $city \
--path data \
--output_path raw_data
for model in FPMC RNN DeepMove; do
python run_model.py \
--task traj_loc_pred \
--model $model \
--dataset std_$city \
--config libcity/config/data/STDDataset
done
for model in LSTPM; do
python run_model.py \
--task traj_loc_pred \
--model $model \
--dataset std_$city \
--config libcity/config/data/STDDataset$model
doneThis will run the experiments for each model on the specified city. The results will be saved in the libcity/cache/{exp}/evaluate_cache/ directory.
You can refer to Bigscity-LibCity/run_train.sh for the full list of models and cities used in our experiments. The script will run the experiments for each model on all cities in the Massive-STEPS dataset.
To run the GETNext experiments, you can use the following command:
city=beijing # or istanbul, jakarta, etc.
python build_graph.py \
--csv_path data/$city/${city}_checkins_train.csv \
--output_dir data/$city
python train.py \
--data-train data/$city/${city}_checkins_train.csv \
--data-val data/$city/${city}_checkins_test.csv \
--data-adj-mtx data/$city/graph_A.csv \
--data-node-feats data/$city/graph_X.csv \
--time-units 48 --timestamp_column timestamp \
--poi-embed-dim 128 --user-embed-dim 128 \
--time-embed-dim 32 --cat-embed-dim 32 \
--node-attn-nhid 128 \
--transformer-nhid 1024 \
--transformer-nlayers 2 --transformer-nhead 2 \
--batch 16 --epochs 200 --name $city \
--workers 12 --exist-ok \
--lr 0.001This will run the GETNext experiments on the specified city. The results will be saved in the runs/train/{city}/ directory.
You can refer to GETNext/run_train.sh for the full list of cities used in our experiments. The script will run the GETNext experiments for each city in the Massive-STEPS dataset, with the hyperparameters we used in our experiments.
To run the STHGCN experiments, you can use the following command:
city=beijing # or istanbul, jakarta, etc.
python run.py -f std_conf/$city.ymlThis will run the STHGCN experiments on the specified city. The results will be saved in the log/{exp}/{city}/ directory.
You can refer to Spatio-Temporal-Hypergraph-Model/run_train.sh for the full list of cities used in our experiments.
To run the UniMove experiments, you can use the following command:
city=beijing # or istanbul, jakarta, etc.
python preprocess_massive_steps.py --city $city
python main.py \
--device cuda:0 \
--city $city \
--target_city $city \
--train_root traj_dataset/massive_steps/train \
--val_root traj_dataset/massive_steps/val \
--test_root traj_dataset/massive_steps/test \
--B 4This will train UniMove from scratch on the specified city. The results will be saved in the models/{city}/ directory.
You can refer to UniMove/run_train.sh for the full list of cities used in our experiments.
To run a zero-shot POI recommendation experiment, you can use the following command:
model=gemini-2.0-flash # or Qwen2.5-7B-Instruct-AWQ, etc.
prompt_typ=llmzs # or llmmob, llmmove
city=Beijing # Istanbul, Jakarta, etc.
city_key=$(echo "$city" | tr '[:upper:]' '[:lower:]' | tr '-' '_')
python src/run_next_poi_llm.py \
--dataset_name CRUISEResearchGroup/Massive-STEPS-$city \ # Hugging Face dataset ID
--num_users 200 --num_historical_stays 15 \ # number of test users and historical stays
--prompt_type $prompt_type \
--model_name $model \
--checkins_file data/$city_key/${city_key}_checkins.csv # path to the check-in dataThis will run the zero-shot POI recommendation experiment using the specified model and prompt type on the specified city. The results will be saved in the results/{dataset_name}/{model_name}/{prompt_type}/ directory.
You can refer to the run_next_poi_llm.sh script to see the full list of models and prompt types used in our experiments. The script will run the zero-shot POI recommendation experiment for each model and prompt type on all cities in the Massive-STEPS dataset.
To run a spatiotemporal classification and reasoning experiment, you can use the following command:
model=gemini-2.0-flash # or gpt-4o-mini, etc.
city=Beijing # Istanbul, Jakarta, etc.
city_key=$(echo "$city" | tr '[:upper:]' '[:lower:]' | tr '-' '_')
python src/run_classify_day_llm.py \
--dataset_name CRUISEResearchGroup/Massive-STEPS-$city \ # Hugging Face dataset ID
--num_users 200 \ # number of test users
--prompt_type st_day_classification \
--model_name $model \
--city $city \
--checkins_file data/$city_key/${city_key}_checkins.csv # path to the check-in dataTo host open-source LLMs, we recommend using vLLM which is a high-performance, memory-efficient, and easy-to-use library for serving large language models. Once installed, you can run the following command to start the server:
vllm serve Qwen/Qwen2.5-72B-Instruct-AWQ --quantization awqand modify VLLM_API_BASE_URL and VLLM_API_KEY environment variables accordingly, pointing to the server's address.
Our work is based on the following repositories:
If you find this repository useful for your research, please consider citing our paper:
@misc{wongso2025massivestepsmassivesemantictrajectories,
title = {Massive-STEPS: Massive Semantic Trajectories for Understanding POI Check-ins -- Dataset and Benchmarks},
author = {Wilson Wongso and Hao Xue and Flora D. Salim},
year = {2025},
eprint = {2505.11239},
archiveprefix = {arXiv},
primaryclass = {cs.LG},
url = {https://arxiv.org/abs/2505.11239}
}If you have any questions or suggestions, feel free to contact Wilson at w.wongso(at)unsw(dot)edu(dot)au.
