The Geospatial Exploration and Orchestration Studio is an integrated platform for fine-tuning, inference, and orchestration of geospatial AI models. It combines a no-code UI, low-code SDK, and APIs to make working with geospatial data and AI accessible to everyone, from researchers to developers.
The platform supports on-prem or cloud deployment using Red Hat OpenShift or Kubernetes, enabling scalable pipelines for data preparation, model training, and inference.
By leveraging tools like TerraTorch, TerraKit, and Iterate, the Geospatial Studio accelerates insights from complex geospatial datasets for a diverse range of applications.
The studio builds upon the broader ecosystem, utilising TerraTorch for model fine-tuning and inference, and leveraging TerraKit for geospatial data search, query and processing.
The Geospatial Studio is made up of a gateway API which provides access to all the backend services (fine-tuning, dataset onboarding/preparation, model management, inference pipelines). The code for most of these core elements is found in the following repositories:
| Repository | URL |
|---|---|
| geospatial-studio (this repo) | https://github.com/terrastackai/geospatial-studio |
| geospatial-studio-core | https://github.com/terrastackai/geospatial-studio-core |
| geospatial-studio-pipelines | https://github.com/terrastackai/geospatial-studio-pipelines |
| geospatial-studio-ui | https://github.com/terrastackai/geospatial-studio-ui |
| geospatial-studio-toolkit | https://github.com/terrastackai/geospatial-studio-toolkit |
When deployed, the studio consists of the gateway API (which can trigger onboarding, fine-tuning and inference tasks), the UI, deployed inference pipeline components, a backend PostgreSQL database, MLflow and GeoServer. These are supported by an OAuth2 authenticator and S3-compatible object storage (both usually external). The architecture is shown in the diagram below.
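Once deployed, these components appear as individual workloads in the cluster. As a quick, illustrative check (the `geofm-*` names below match the services used in the port-forward commands later in this README; adjust the namespace to your deployment):

```shell
# List the studio workloads and services (namespace may differ in your deployment)
kubectl get deployments,svc -n default | grep -E 'geofm|postgresql|minio|keycloak'
```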
If you want a detailed description of the deployment process on an external cluster, see here.
The Geospatial Studio is primarily developed to be deployed on a Red Hat OpenShift or Kubernetes cluster, with access to NVIDIA GPU resources (for tuning and inference). This repository contains the Helm chart and scripts for full-scale deployment.
To deploy in a cluster:
- Helm - v3.19 (currently incompatible with v4)
- OpenShift CLI
- Kubectl (bundled with above)
- jq - JSON command-line processor
- yq - YAML command-line processor
- S3 storage class - e.g. ibm-object-s3fs or equivalent, to provision S3-backed storage in the cluster
- S3-compatible storage - e.g. IBM Cloud COS, to set up cloud object storage
- Install Python dependencies:
```shell
pip install -r requirements.txt
```

- Set up the kubectl context or log in to OpenShift. For OpenShift, use the command below after supplying the token and server, both of which can be obtained from the OpenShift console.

```shell
oc login --token=<cluster-token> --server=<cluster-server>
```

- Deploy the geospatial studio:

```shell
./deploy_studio_cluster.sh
```

Deployment is interactive and can take ~10 minutes (or longer), depending on the available download speed for container images.
You can follow the deployment from the OpenShift console or with k9s.
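If you prefer the command line to the console, a minimal way to watch the pods come up (assuming you are logged in; substitute the namespace used by your deployment):

```shell
# Watch the studio pods as they start
oc get pods -n <studio-namespace> -w
```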
After deployment, the UI will open automatically and you can jump to First steps.
If you want a detailed description of the local deployment process, see here.
Whilst not providing full performance and functionality, the studio can be deployed locally for testing and development purposes. The instructions below will deploy the main components of the Geospatial Studio in a Kubernetes cluster on the local machine (i.e. your laptop). This is provisioned through a Lima VM.
Data for the deployment will be persisted in a local folder, ~/studio-data. You can change the location of this folder by editing the Lima deployment configuration, deployment-scripts/lima/studio.yaml.
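For reference, the persisted folder is exposed to the VM through a Lima mount; a minimal sketch of what such an entry looks like (illustrative only, the exact keys in the shipped studio.yaml may differ):

```yaml
# Illustrative Lima mount entry for the studio data folder
mounts:
  - location: "~/studio-data"
    writable: true
```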
The automated shell script will deploy the local dependencies (MinIO, Keycloak and PostgreSQL), before generating the deployment configuration for the studio and then deploying the main studio services + pipelines.
To deploy locally:
- Lima VM - v1.2.1 (currently incompatible with v2)
- Helm - v3.19 (currently incompatible with v4)
- OpenShift CLI
- Kubectl (bundled with above)
- jq - JSON command-line processor
- yq - YAML command-line processor
- Install Lima VM.
- Install Python dependencies:
```shell
pip install -r requirements.txt
```

- Start the Lima VM cluster:

For macOS >= 13.0 on ARM, use the command below. For macOS >= 13.0 on AMD64/Intel, consider using VZ without Rosetta, or use QEMU as configured in deployment-scripts/lima/studio-linux.yaml.

```shell
limactl start --name=studio deployment-scripts/lima/studio.yaml
```

For Linux, use the command below. It leverages QEMU, so a QEMU installation is required.

```shell
limactl start --name=studio deployment-scripts/lima/studio-linux.yaml
```

- Set up the kubectl context:

```shell
export KUBECONFIG="$HOME/.lima/studio/copied-from-guest/kubeconfig.yaml"
```

- Deploy the geospatial studio:

```shell
./deploy_studio_local.sh
```

Deployment can take ~10 minutes (or longer), depending on the available download speed for container images.
You can monitor the progress and debug using k9s or similar tools.
```shell
export KUBECONFIG="$HOME/.lima/studio/copied-from-guest/kubeconfig.yaml"
k9s
```

| After deployment: | |
|---|---|
| Access the Studio UI | https://localhost:4180 |
| Access the Studio API | https://localhost:4181 |
| Authenticate Studio | username: testuser password: testpass123 |
| Access Geoserver | https://localhost:3000 |
| Access Keycloak | https://localhost:8080 |
| Access Minio | Console: https://localhost:9001 API: https://localhost:9000 |
| Authenticate Minio | username: minioadmin password: minioadmin |
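To quickly confirm the port-forwards are live, a minimal check against the defaults above (it should print an HTTP status code rather than a connection error):

```shell
# Should print an HTTP status code (e.g. 200 or a redirect) if the UI port-forward is up
curl -k -s -o /dev/null -w '%{http_code}\n' https://localhost:4180
```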
If you need to restart any of the port-forwards you can use the following commands:
```shell
kubectl port-forward -n default svc/keycloak 8080:8080 >> studio-pf.log 2>&1 &
kubectl port-forward -n default svc/postgresql 54320:5432 >> studio-pf.log 2>&1 &
kubectl port-forward -n default svc/geofm-geoserver 3000:3000 >> studio-pf.log 2>&1 &
kubectl port-forward -n default deployment/geofm-ui 4180:4180 >> studio-pf.log 2>&1 &
kubectl port-forward -n default deployment/geofm-gateway 4181:4180 >> studio-pf.log 2>&1 &
kubectl port-forward -n default deployment/geofm-mlflow 5000:5000 >> studio-pf.log 2>&1 &
kubectl port-forward -n default svc/minio 9001:9001 >> studio-pf.log 2>&1 &
kubectl port-forward -n default svc/minio 9000:9000 >> studio-pf.log 2>&1 &
```

Now you have a clean deployment of the studio and it is time to start using it. The steps below will enable you to onboard some initial artefacts, before trying out the functionality.
- Navigate to the UI front page and create an API key. Click on the `Manage your API keys` link. This should pop up a window where you can generate, access and delete your API keys.
- Copy your new API key to an environment variable in your terminal:

```shell
export STUDIO_API_KEY="<your api key from the UI>"
```

- Copy the UI URL to an environment variable in your terminal:

```shell
export UI_ROUTE_URL="https://localhost:4180"
```

- Onboard the `sandbox-models`; these are placeholder models (pipelines) for onboarding existing inferences or testing tuned models.

```shell
./deployment-scripts/add-sandbox-models.sh
```

Onboard an existing inference output (useful for loading examples)
- Onboard one of the `inferences`. This will start a pipeline to pull the data and set it up in the platform. You should then be able to browse to the inferences page in the UI and view the example(s) you have added.

```shell
python populate-studio/populate-studio.py inferences
# select "AGB Data - Karen, Nairobi,kenya"
```

Onboard an existing tuned model and run inference
- We will onboard a tuned model from a URL. This is initiated by an API call, which will trigger the onboarding process, starting the download in the backend. Once the download has completed, the model should appear with a completed status on the UI models/tunes page.
First, we ensure we have the tuning task `templates`. Onboard the tuning task templates; these are the outline configurations that make basic tuning tasks easier for users.

```shell
python populate-studio/populate-studio.py templates
# select 1. Segmentation - Generic template v1 and v2 models: Segmentation
```

Then onboard the tuned model:

```shell
python populate-studio/populate-studio.py tunes
# select "prithvi-eo-flood - prithvi-eo-flood"
```

- Now we can trigger an inference run. This can be done through the UI or the API (as here), where you specify the spatial and temporal domain over which to run inference. You need to get the `tune_id` for the onboarded tune (from the onboarding response or from the models/tunes page in the UI) and paste it into the command below.
```shell
tune_id="<paste tune_id here>"
payload='{
  "model_display_name": "geofm-sandbox-models",
  "location": "Dakhin Petbaha, Raha, Nagaon, Assam, India",
  "description": "Flood Assam local with sentinel aws",
  "spatial_domain": {
    "bbox": [
      [
        92.703396,26.247896,92.748087,26.267903
      ]
    ],
    "urls": [],
    "tiles": [],
    "polygons": []
  },
  "temporal_domain": [
    "2024-07-25_2024-07-28"
  ]
}'
echo $payload | curl -X POST "${UI_ROUTE_URL}/studio-gateway/v2/tunes/${tune_id}/try-out" \
  --header 'Content-Type: application/json' \
  --header "X-API-Key: $STUDIO_API_KEY" \
  --insecure \
  --data @-
```

- You can follow the progress of the inference run in the UI on the inference page. The files will be created in a new folder inside ~/studio-data/studio-inference-pvc/.
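To keep an eye on the output from the command line, something like the following works, assuming the default ~/studio-data location:

```shell
# Show the most recently modified inference output folders
ls -lt ~/studio-data/studio-inference-pvc/ | head
```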
Tuning a model from a dataset
Note: Currently, for local deployments with access to non-NVIDIA GPUs (i.e. Mac), you will need to run the fine-tuning outside of the local cluster, and the resulting model can be onboarded back to the local cluster for inference. This will be addressed in future, and is not an issue for cluster deployments with accessible GPUs.
- First, onboard a tuning dataset. This can be done through the UI or the API; for now, select and onboard a dataset using the command below. This will trigger a backend task to download, validate and sort the dataset ready for use. The dataset will appear on the UI datasets page, initially as pending, but will complete and change status after a few minutes.
```shell
python populate-studio/populate-studio.py datasets
# select "Wildfire burn scars"
```

- Onboard the backbone model(s) from which we will fine-tune.

```shell
python populate-studio/populate-studio.py backbones
```

- Onboard the tuning task `templates`. These are the outline configurations that make basic tuning tasks easier for users.

```shell
python populate-studio/populate-studio.py templates
```

- Now we can prepare the tuning task. In a cluster-deployed studio instance a user will prepare and submit their tuning task in one step; however, for local deployments, due to GPU accessibility within VMs (especially on Mac), we will use the studio to create the tuning config file and then run it outside the studio with TerraTorch.
```shell
# Need to create a script to call the dry-run API, get the config to file and update paths.
payload='{
  "name": "burn-scars-demo",
  "description": "Segmentation",
  "dataset_id": "<dataset id here>",
  "base_model_id": "<backbone model id here>",
  "tune_template_id": "<tune template id here>",
  "model_parameters": {
    "runner": {
      "max_epochs": "10"
    },
    "optimizer": {
      "lr": 6e-05,
      "type": "AdamW"
    }
  }
}'
echo $payload | curl -X POST "${UI_ROUTE_URL}/studio-gateway/v2/submit-tune/dry-run" \
  --header 'Content-Type: application/json' \
  --header "X-API-Key: $STUDIO_API_KEY" \
  --insecure \
  --data @- >> config.yaml
./deployment-scripts/localize_config.sh config.yaml
```

- Run the tuning task:

```shell
terratorch fit -c config.yaml
```

- Upload the tune back to the studio. In this case we do it from the local config and checkpoint files. Once it's complete, you should see it in the UI under the tunes/models page.
`Add api call to upload tune`

- Now we can use it for inference.

`Add api call to run try out inference`


