The Geospatial Exploration and Orchestration Studio is an integrated platform for fine-tuning, inference, and orchestration of geospatial AI models. It combines a no-code UI, low-code SDK, and APIs to make working with geospatial data and AI accessible to everyone, from researchers to developers.
The platform supports on-prem or cloud deployment using Red Hat OpenShift or Kubernetes, enabling scalable pipelines for data preparation, model training, and inference.
By leveraging tools like TerraTorch, TerraKit, and Iterate, the Geospatial Studio accelerates insights from complex geospatial datasets for a diverse range of applications.
The studio builds upon the broader ecosystem, utilising TerraTorch for model fine-tuning and inference, and leveraging TerraKit for geospatial data search, query and processing.
The Geospatial Studio is made up of a gateway API which provides access to all the backend services (fine-tuning, dataset onboarding/preparation, model management, inference pipelines). The code for most of these core elements is found in the following repositories:
| Repository | URL |
|---|---|
| geospatial-studio (this repo) | https://github.com/terrastackai/geospatial-studio |
| geospatial-studio-core | https://github.com/terrastackai/geospatial-studio-core |
| geospatial-studio-pipelines | https://github.com/terrastackai/geospatial-studio-pipelines |
| geospatial-studio-ui | https://github.com/terrastackai/geospatial-studio-ui |
| geospatial-studio-toolkit | https://github.com/terrastackai/geospatial-studio-toolkit |
When deployed, the studio consists of the gateway API (which can trigger onboarding, fine-tuning and inference tasks), the UI, deployed inference pipeline components, a backend PostgreSQL database, MLflow and GeoServer. These are supported by an OAuth2 authenticator and S3-compatible object storage (both usually external). The architecture is shown in the diagram below.
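Once deployed, these components appear as individual workloads in the cluster. As a quick, illustrative check (the `geofm-*` names below match the services used in the port-forward commands later in this README; adjust the namespace to your deployment):

```shell
# List the studio workloads and services (namespace may differ in your deployment)
kubectl get deployments,svc -n default | grep -E 'geofm|postgresql|minio|keycloak'
```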
If you want a detailed description of the deployment process on an external cluster, see here.
The Geospatial Studio is primarily developed to be deployed on a Red Hat OpenShift or Kubernetes cluster, with access to NVIDIA GPU resources (for tuning and inference). This repository contains the Helm chart and scripts for full-scale deployment.
To deploy in a cluster:
- Helm - v3.19 (currently incompatible with v4)
- OpenShift CLI
- Kubectl (bundled with above)
- jq - JSON command-line processor
- yq - YAML command-line processor
- S3 storage class - e.g. ibm-object-s3fs or equivalent, to provision S3-backed storage in the cluster
- S3-compatible storage - e.g. IBM Cloud COS, to set up cloud object storage
- Install Python dependencies:
```shell
pip install -r requirements.txt
```

- Set up the kubectl context or log in to OpenShift. For OpenShift, use the command below after supplying the token and server, both of which can be obtained from the OpenShift console.

```shell
oc login --token=<cluster-token> --server=<cluster-server>
```

- Deploy the geospatial studio:

```shell
./deploy_studio_cluster.sh
```

Deployment is interactive and can take ~10 minutes (or longer), depending on the available download speed for container images.
You can follow the deployment from the OpenShift console or with k9s.
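If you prefer the command line to the console, a minimal way to watch the pods come up (assuming you are logged in; substitute the namespace used by your deployment):

```shell
# Watch the studio pods as they start
oc get pods -n <studio-namespace> -w
```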
After deployment, the UI will open automatically and you can jump to First steps.
If you want a detailed description of the local deployment process, see here.
Whilst not providing full performance and functionality, the studio can be deployed locally for testing and development purposes. The instructions below will deploy the main components of the Geospatial Studio in a Kubernetes cluster on the local machine (i.e. your laptop). This is provisioned through a Lima VM.
Data for the deployment will be persisted in a local folder, ~/studio-data. You can change the location of this folder by editing the Lima deployment configuration, deployment-scripts/lima/studio.yaml.
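For reference, the persisted folder is exposed to the VM through a Lima mount; a minimal sketch of what such an entry looks like (illustrative only, the exact keys in the shipped studio.yaml may differ):

```yaml
# Illustrative Lima mount entry for the studio data folder
mounts:
  - location: "~/studio-data"
    writable: true
```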
The automated shell script will deploy the local dependencies (MinIO, Keycloak and PostgreSQL), before generating the deployment configuration for the studio and then deploying the main studio services + pipelines.
To deploy locally:
- Lima VM - v1.2.1 (currently incompatible with v2)
- Helm - v3.19 (currently incompatible with v4)
- OpenShift CLI
- Kubectl (bundled with above)
- jq - JSON command-line processor
- yq - YAML command-line processor
- Install Lima VM.
- Install Python dependencies:
```shell
pip install -r requirements.txt
```

- Start the Lima VM cluster:

For macOS >= 13.0 on ARM, use the command below. For macOS >= 13.0 on AMD64/Intel, consider using VZ without Rosetta, or use QEMU as configured in deployment-scripts/lima/studio-linux.yaml.

```shell
limactl start --name=studio deployment-scripts/lima/studio.yaml
```

For Linux, use the command below. It leverages QEMU, so a QEMU installation is required.

```shell
limactl start --name=studio deployment-scripts/lima/studio-linux.yaml
```

- Set up the kubectl context:

```shell
export KUBECONFIG="$HOME/.lima/studio/copied-from-guest/kubeconfig.yaml"
```

- Deploy the geospatial studio:

```shell
./deploy_studio_local.sh
```

Deployment can take ~10 minutes (or longer), depending on the available download speed for container images.
You can monitor the progress and debug using k9s or similar tools.
```shell
export KUBECONFIG="$HOME/.lima/studio/copied-from-guest/kubeconfig.yaml"
k9s
```

| After deployment: | |
|---|---|
| Access the Studio UI | https://localhost:4180 |
| Access the Studio API | https://localhost:4181 |
| Authenticate Studio | username: testuser password: testpass123 |
| Access Geoserver | https://localhost:3000 |
| Access Keycloak | https://localhost:8080 |
| Access Minio | Console: https://localhost:9001 API: https://localhost:9000 |
| Authenticate Minio | username: minioadmin password: minioadmin |
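To quickly confirm the port-forwards are live, a minimal check against the defaults above (it should print an HTTP status code rather than a connection error):

```shell
# Should print an HTTP status code (e.g. 200 or a redirect) if the UI port-forward is up
curl -k -s -o /dev/null -w '%{http_code}\n' https://localhost:4180
```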
If you need to restart any of the port-forwards you can use the following commands:
```shell
kubectl port-forward -n default svc/keycloak 8080:8080 >> studio-pf.log 2>&1 &
kubectl port-forward -n default svc/postgresql 54320:5432 >> studio-pf.log 2>&1 &
kubectl port-forward -n default svc/geofm-geoserver 3000:3000 >> studio-pf.log 2>&1 &
kubectl port-forward -n default deployment/geofm-ui 4180:4180 >> studio-pf.log 2>&1 &
kubectl port-forward -n default deployment/geofm-gateway 4181:4180 >> studio-pf.log 2>&1 &
kubectl port-forward -n default deployment/geofm-mlflow 5000:5000 >> studio-pf.log 2>&1 &
kubectl port-forward -n default svc/minio 9001:9001 >> studio-pf.log 2>&1 &
kubectl port-forward -n default svc/minio 9000:9000 >> studio-pf.log 2>&1 &
```

Now you have a clean deployment of the studio and it is time to start using it. The steps below will enable you to onboard some initial artefacts, before trying out the functionality.
- Navigate to the UI front page and create an API key. Click on the `Manage your API keys` link. This should pop up a window where you can generate, access and delete your API keys.
- Copy your new API key to an environment variable in your terminal:

```shell
export STUDIO_API_KEY="<your api key from the UI>"
```

- Copy the UI URL to an environment variable in your terminal:

```shell
export UI_ROUTE_URL="https://localhost:4180"
```

- Onboard the `sandbox-models`; these are placeholder models (pipelines) for onboarding existing inferences or testing tuned models.

```shell
./deployment-scripts/add-sandbox-models.sh
```

Onboard an existing inference output (useful for loading examples)
- Onboard one of the `inferences`. This will start a pipeline to pull the data and set it up in the platform. You should then be able to browse to the inferences page in the UI and view the example(s) you have added.

```shell
python populate-studio/populate-studio.py inferences
# select "AGB Data - Karen, Nairobi,kenya"
```

Onboard an existing tuned model and run inference
- We will onboard a tuned model from a URL. This is initiated by an API call, which will trigger the onboarding process, starting the download in the backend. Once the download has completed, the model should appear with a completed status on the UI models/tunes page.
First, we ensure we have the tuning task `templates`. Onboard the tuning task templates; these are the outline configurations that make basic tuning tasks easier for users.

```shell
python populate-studio/populate-studio.py templates
# select 1. Segmentation - Generic template v1 and v2 models: Segmentation
```

Then onboard the tuned model:

```shell
python populate-studio/populate-studio.py tunes
# select "prithvi-eo-flood - prithvi-eo-flood"
```

- Now we can trigger an inference run. This can be done through the UI or the API (as here), where you specify the spatial and temporal domain over which to run inference. You need to get the `tune_id` for the onboarded tune (from the onboarding response or from the models/tunes page in the UI) and paste it into the command below.
```shell
tune_id="<paste tune_id here>"
payload='{
  "model_display_name": "geofm-sandbox-models",
  "location": "Dakhin Petbaha, Raha, Nagaon, Assam, India",
  "description": "Flood Assam local with sentinel aws",
  "spatial_domain": {
    "bbox": [
      [
        92.703396,26.247896,92.748087,26.267903
      ]
    ],
    "urls": [],
    "tiles": [],
    "polygons": []
  },
  "temporal_domain": [
    "2024-07-25_2024-07-28"
  ]
}'
echo $payload | curl -X POST "${UI_ROUTE_URL}/studio-gateway/v2/tunes/${tune_id}/try-out" \
  --header 'Content-Type: application/json' \
  --header "X-API-Key: $STUDIO_API_KEY" \
  --insecure \
  --data @-
```

- You can follow the progress of the inference run in the UI on the inference page. The files will be created in a new folder inside ~/studio-data/studio-inference-pvc/.
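To keep an eye on the output from the command line, something like the following works, assuming the default ~/studio-data location:

```shell
# Show the most recently modified inference output folders
ls -lt ~/studio-data/studio-inference-pvc/ | head
```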
Tuning a model from a dataset
Note: Currently, for local deployments with access to non-NVIDIA GPUs (i.e. Mac), you will need to run the fine-tuning outside of the local cluster, and the resulting model can be onboarded back to the local cluster for inference. This will be addressed in future, and is not an issue for cluster deployments with accessible GPUs.
- First, onboard a tuning dataset. This can be done through the UI or the API; for now, select and onboard a dataset using the command below. This will trigger a backend task to download, validate and sort the dataset ready for use. The dataset will appear on the UI datasets page, initially as pending, but will complete and change status after a few minutes.
```shell
python populate-studio/populate-studio.py datasets
# select "Wildfire burn scars"
```

- Onboard the backbone model(s) from which we will fine-tune.

```shell
python populate-studio/populate-studio.py backbones
```

- Onboard the tuning task `templates`. These are the outline configurations that make basic tuning tasks easier for users.

```shell
python populate-studio/populate-studio.py templates
```

- Now we can prepare the tuning task. In a cluster-deployed studio instance a user will prepare and submit their tuning task in one step; however, for local deployments, due to GPU accessibility within VMs (especially on Mac), we will use the studio to create the tuning config file and then run it outside the studio with TerraTorch.
```shell
# Need to create a script to call the dry-run API, get the config to file and update paths.
payload='{
  "name": "burn-scars-demo",
  "description": "Segmentation",
  "dataset_id": "<dataset id here>",
  "base_model_id": "<backbone model id here>",
  "tune_template_id": "<tune template id here>",
  "model_parameters": {
    "runner": {
      "max_epochs": "10"
    },
    "optimizer": {
      "lr": 6e-05,
      "type": "AdamW"
    }
  }
}'
echo $payload | curl -X POST "${UI_ROUTE_URL}/studio-gateway/v2/submit-tune/dry-run" \
  --header 'Content-Type: application/json' \
  --header "X-API-Key: $STUDIO_API_KEY" \
  --insecure \
  --data @- >> config.yaml
./deployment-scripts/localize_config.sh config.yaml
```

- Run the tuning task:

```shell
terratorch fit -c config.yaml
```

- Upload the tune back to the studio. In this case we do it from the local config and checkpoint files. Once it's complete, you should see it in the UI under the tunes/models page.
`Add api call to upload tune`

- Now we can use it for inference.

`Add api call to run try out inference`


