
fix geospatial classification untrusted model loading rce #13961

Open

Sakura-501 wants to merge 1 commit into GoogleCloudPlatform:main from Sakura-501:fix-geospatial-untrusted-model-loading-rce

Conversation

@Sakura-501

Summary

This PR fixes a remote code execution risk in people-and-planet-ai/geospatial-classification/serving_app caused by loading attacker-controlled TensorFlow/Keras models.

Vulnerability details

Before this change, /predict accepted a user-controlled bucket value, built gs://{bucket}/model_output, and passed that path directly into tf.keras.models.load_model().

That behavior is dangerous because TensorFlow/Keras model loading crosses a code trust boundary. If an attacker can point the sample at a model artifact they control, the service can deserialize and execute attacker-controlled logic during model loading or prediction. In practice this means a remote caller could coerce a deployed sample into loading a malicious model and trigger server-side code execution.

Root cause

  • The request body controlled the model source via bucket
  • The app concatenated that untrusted value into the model path
  • The resulting path was passed straight into tf.keras.models.load_model()
  • There was no trusted server-side allowlist or deployment-time binding for the model location
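The root cause above boils down to untrusted input choosing the model artifact. A minimal sketch of the vulnerable path construction (function name hypothetical; the real handler also loads the model and runs the prediction):

```python
def model_path_from_request(args: dict) -> str:
    # Vulnerable: the request body picks the bucket, so a remote
    # caller decides which artifact tf.keras.models.load_model()
    # will deserialize, crossing the code trust boundary.
    bucket = args["bucket"]
    return f"gs://{bucket}/model_output"
```

A caller posting `{"bucket": "attacker-bucket"}` makes the service load `gs://attacker-bucket/model_output`, an artifact entirely under the attacker's control.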

Fix

This PR removes request control over model selection and requires the serving app to load models only from a trusted deployment-time configuration:

  • add get_trusted_model_dir() in serving_app/main.py
  • require MODEL_DIR to be set by the deployer
  • stop reading the model location from the request body
  • document the secure configuration in people-and-planet-ai/geospatial-classification/README.md
  • add regression tests proving:
    • attacker-controlled bucket input no longer affects the model path
    • the app fails closed when MODEL_DIR is missing
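A sketch of what `get_trusted_model_dir()` might look like; this is illustrative, and the PR's actual implementation may differ in details:

```python
import os


def get_trusted_model_dir() -> str:
    # Read the model location from deployment-time configuration
    # only; the request body can no longer influence it.
    model_dir = os.environ.get("MODEL_DIR")
    if not model_dir:
        # Fail closed rather than falling back to a guessable default.
        raise RuntimeError("MODEL_DIR is not set; refusing to load a model")
    return model_dir
```

The regression tests can then assert both properties: that a `bucket` field in the request has no effect on the returned path, and that the function raises when `MODEL_DIR` is absent.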

Validation

I ran the sample-specific checks in a fresh virtual environment:

python -m black --check main.py test_main.py
python -m flake8 --show-source --builtin=gettext --max-complexity=20 --import-order-style=google --exclude='.nox,.cache,env,lib,generated_pb2,*_pb2.py,*_pb2_grpc.py' --ignore=ANN101,ANN102,E121,E123,E126,E203,E226,E24,E266,E501,E704,W503,W504,I202 --max-line-length=88 --application-import-names=main,predict,test_main .
pytest --noconftest -q test_main.py

All three commands completed successfully.

@Sakura-501 requested review from a team as code owners on March 30, 2026 09:34
@product-auto-label bot added the "samples" and "api: people-and-planet-ai" labels on Mar 30, 2026
Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request enhances the security of the geospatial classification serving app by replacing user-provided model buckets with a mandatory MODEL_DIR environment variable, mitigating risks associated with loading untrusted artifacts. The changes include refactoring the /predict endpoint, updating documentation, and adding comprehensive tests. Feedback indicates that the error handling in the prediction endpoint should be improved to return appropriate HTTP status codes instead of a 200 OK when configuration errors occur.

- bucket = args["bucket"]
- model_dir = f"gs://{bucket}/model_output"
  data = args["data"]
+ model_dir = get_trusted_model_dir()
Contributor


Severity: medium

The run_predict function currently returns a 200 OK status code even when an error occurs, such as when MODEL_DIR is unset and get_trusted_model_dir() raises a RuntimeError. For better API design and clearer communication to API consumers, it's recommended to return an appropriate HTTP status code (e.g., 400 Bad Request for client-side configuration errors or 500 Internal Server Error for server-side issues) when an exception is caught. This would provide a more standard and expected API response for error conditions.
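The suggestion can be sketched framework-agnostically as a handler that returns a status code alongside the body (names hypothetical; the actual sample is a Flask app, where the tuple return maps directly onto `(response, status)`):

```python
import json
import os


def handle_predict(args: dict):
    """Return (json_body, http_status) so configuration errors surface
    as HTTP 500 instead of a 200 OK wrapping an error message."""
    model_dir = os.environ.get("MODEL_DIR")
    if not model_dir:
        # Missing deployment configuration is a server-side fault.
        return json.dumps({"error": "MODEL_DIR is not set"}), 500
    # ... load the model from model_dir and predict on args["data"] ...
    return json.dumps({"model_dir": model_dir}), 200
```

Returning a non-2xx status for the failure path also lets monitoring and load balancers distinguish broken deployments from healthy ones.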

The geospatial classification serving app previously built its
TensorFlow model path from the attacker-controlled `bucket`
request field and passed that path directly into
`tf.keras.models.load_model()`. Because untrusted Keras/TensorFlow
artifacts can execute attacker-controlled code during
deserialization and prediction, a remote caller could coerce the
service into loading a malicious model and trigger server-side code
execution.

This change fixes the vulnerability by requiring a trusted
`MODEL_DIR` deployment configuration for model loading, removing
request control over the model source, documenting the secure setup,
and adding regression tests that prove malicious request input no
longer changes the loaded model path and that the app fails closed
when the trusted model path is missing.
@Sakura-501 force-pushed the fix-geospatial-untrusted-model-loading-rce branch from b72a4cf to 82d9089 on March 30, 2026 09:47
@Sakura-501
Author

Updated the PR to address both issues:

  • added the missing Apache license header to serving_app/test_main.py to satisfy header-check
  • changed the /predict exception path to log and return a JSON error with HTTP 500, and updated the regression test accordingly

Validation rerun locally after the change:

python -m black --check main.py test_main.py
python -m flake8 --show-source --builtin=gettext --max-complexity=20 --import-order-style=google --exclude='.nox,.cache,env,lib,generated_pb2,*_pb2.py,*_pb2_grpc.py' --ignore=ANN101,ANN102,E121,E123,E126,E203,E226,E24,E266,E501,E704,W503,W504,I202 --max-line-length=88 --application-import-names=main,predict,test_main .
pytest --noconftest -q test_main.py
