fix geospatial classification untrusted model loading rce #13961
Sakura-501 wants to merge 1 commit into GoogleCloudPlatform:main
Conversation
Code Review
This pull request enhances the security of the geospatial classification serving app by replacing user-provided model buckets with a mandatory MODEL_DIR environment variable, mitigating risks associated with loading untrusted artifacts. The changes include refactoring the /predict endpoint, updating documentation, and adding comprehensive tests. Feedback indicates that the error handling in the prediction endpoint should be improved to return appropriate HTTP status codes instead of a 200 OK when configuration errors occur.
```diff
-    bucket = args["bucket"]
-    model_dir = f"gs://{bucket}/model_output"
     data = args["data"]
+    model_dir = get_trusted_model_dir()
```
The run_predict function currently returns a 200 OK status code even when an error occurs, such as when MODEL_DIR is unset and get_trusted_model_dir() raises a RuntimeError. For better API design and clearer communication to API consumers, it's recommended to return an appropriate HTTP status code (e.g., 400 Bad Request for client-side configuration errors or 500 Internal Server Error for server-side issues) when an exception is caught. This would provide a more standard and expected API response for error conditions.
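The reviewer's suggestion could look like the following sketch. Assumptions: the handler can return a Flask-style `(body, status)` tuple, and `get_trusted_model_dir` is reimplemented here for illustration to mirror the helper this PR adds; neither is the PR's literal code.

```python
# Sketch of the reviewer's suggestion, not the PR's final code: surface
# configuration errors as an explicit HTTP 500 instead of a 200 OK.
# Flask view functions may return a (body, status) tuple, which keeps
# this sketch framework-free.
import os


def get_trusted_model_dir() -> str:
    """Return the deployment-configured model path, failing closed."""
    model_dir = os.environ.get("MODEL_DIR")
    if not model_dir:
        raise RuntimeError("MODEL_DIR is not set; refusing to load a model")
    return model_dir


def run_predict() -> tuple[dict, int]:
    """Flask-style /predict handler body (names follow the PR's helpers)."""
    try:
        model_dir = get_trusted_model_dir()
    except RuntimeError as exc:
        # Return a JSON-serializable error with an explicit 500 status,
        # rather than falling through to a 200 OK response.
        return {"error": str(exc)}, 500
    # ... load the trusted model from model_dir and run prediction ...
    return {"model_dir": model_dir}, 200
```

Returning the status code alongside the body keeps the error path explicit and lets a regression test assert on it directly.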
The geospatial classification serving app previously built its TensorFlow model path from the attacker-controlled `bucket` request field and passed that path directly into `tf.keras.models.load_model()`. Because untrusted Keras/TensorFlow artifacts can execute attacker-controlled code during deserialization and prediction, a remote caller could coerce the service into loading a malicious model and trigger server-side code execution. This change fixes the vulnerability by requiring a trusted `MODEL_DIR` deployment configuration for model loading, removing request control over the model source, documenting the secure setup, and adding regression tests that prove malicious request input no longer changes the loaded model path and that the app fails closed when the trusted model path is missing.
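Operationally, the fix pins the trusted model location at deploy time. A minimal sketch of the deployer-side configuration (the bucket name is a placeholder assumption, not from the PR):

```shell
# Hypothetical deployment snippet: MODEL_DIR is fixed by the deployer,
# so no field in the /predict request body can redirect model loading.
export MODEL_DIR="gs://my-trusted-bucket/model_output"
python main.py  # the serving app reads MODEL_DIR instead of the request
```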
Force-pushed from b72a4cf to 82d9089.
Updated the PR to address both issues:

- added the missing Apache license header
- changed the exception path to log and return a JSON error with HTTP 500, and updated the regression test accordingly

Validation rerun locally after the change: `python -m black --check main.py test_main.py`
Summary
This PR fixes a remote code execution risk in `people-and-planet-ai/geospatial-classification/serving_app` caused by loading attacker-controlled TensorFlow/Keras models.

Vulnerability details
Before this change, `/predict` accepted a user-controlled `bucket` value, built `gs://{bucket}/model_output`, and passed that path directly into `tf.keras.models.load_model()`.

That behavior is dangerous because TensorFlow/Keras model loading crosses a code trust boundary. If an attacker can point the sample at a model artifact they control, the service can deserialize and execute attacker-controlled logic during model loading or prediction. In practice this means a remote caller could coerce a deployed sample into loading a malicious model and trigger server-side code execution.
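The core of the pre-fix flaw can be reduced to a few lines. This is a sketch: `model_dir_from_request` is a hypothetical name for the path-building logic the handler used, not a function in the repository.

```python
# Sketch of the vulnerable pattern: the model path is derived entirely
# from the request body, so the caller chooses which artifact gets
# deserialized by tf.keras.models.load_model(). A malicious artifact
# can run attacker code during loading.
def model_dir_from_request(args: dict) -> str:
    bucket = args["bucket"]  # attacker-controlled request field
    return f"gs://{bucket}/model_output"


# Any remote caller can steer loading to a bucket they control:
evil_path = model_dir_from_request({"bucket": "attacker-bucket"})
# evil_path == "gs://attacker-bucket/model_output"
```

After the fix, this path-building function disappears entirely; the model path comes only from the deployment-time `MODEL_DIR` configuration.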
Root cause

- the request-supplied `bucket` value determines the model path
- that path is passed directly to `tf.keras.models.load_model()`

Fix
This PR removes request control over model selection and requires the serving app to load models only from a trusted deployment-time configuration:
- added `get_trusted_model_dir()` in `serving_app/main.py`
- requires `MODEL_DIR` to be set by the deployer
- documented the secure setup in `people-and-planet-ai/geospatial-classification/README.md`
- added regression tests: malicious `bucket` input no longer affects the model path, and the app fails closed when `MODEL_DIR` is missing

Validation
I ran the sample-specific checks in a fresh virtual environment:
All three commands completed successfully.