feat: Support Actor schema storages with Alias mechanism#797
feat: Support Actor schema storages with Alias mechanism#797
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #797 +/- ##
==========================================
+ Coverage 85.83% 86.12% +0.28%
==========================================
Files 46 46
Lines 2761 2768 +7
==========================================
+ Hits 2370 2384 +14
+ Misses 391 384 -7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
d9c7349 to
3b36459
Compare
TODO - how should it behave locally?
There was a problem hiding this comment.
Pull request overview
Adds support for Actor “schema storages” / multiple pre-created storages by loading storage IDs into Configuration and pre-registering alias→ID mappings so open_* (alias=...) resolves to those platform-provided storages when running on Apify.
Changes:
- Introduces
Configuration.actor_storages(parsed from an env-provided JSON object) via newActorStoragesmodel. - Extends
AliasResolverwithregister_aliases()and calls it duringActorinitialization on the Apify platform to seed alias mappings. - Adds unit/integration/E2E tests validating configuration parsing and alias resolution behavior.
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
src/apify/_configuration.py |
Adds ActorStorages + env/validator wiring for Configuration.actor_storages. |
src/apify/storage_clients/_apify/_alias_resolving.py |
Adds AliasResolver.register_aliases() to bulk write alias mappings into default KVS + in-memory cache. |
src/apify/_actor.py |
Calls register_aliases() on Actor startup when running on-platform. |
tests/unit/actor/test_configuration.py |
Unit test for parsing storages JSON env var into actor_storages. |
tests/integration/test_storages.py |
Integration test asserting alias mapping is preserved/extended in default KVS. |
tests/e2e/test_schema_storages/test_schema_storages.py |
E2E test ensuring schema-defined storages are usable via alias at runtime. |
tests/e2e/test_schema_storages/actor_source/main.py |
Actor code used by the E2E test to open a dataset by alias and validate ID. |
tests/e2e/test_schema_storages/actor_source/actor.json |
Actor schema defining storages for the E2E scenario. |
tests/e2e/test_schema_storages/__init__.py |
Marks the new E2E package. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| client = await cls._get_default_kvs_client(configuration=configuration) | ||
| existing_mapping = ((await client.get_record(cls._ALIAS_MAPPING_KEY)) or {'value': {}}).get('value', {}) | ||
|
|
||
| # Update the existing mapping with the configuration mapping. | ||
| existing_mapping.update(configuration_mapping) | ||
| # Store the updated mapping back in the KVS and in memory. | ||
| await client.set_record(cls._ALIAS_MAPPING_KEY, existing_mapping) |
There was a problem hiding this comment.
existing_mapping = ((await client.get_record(...)) or {'value': {}}).get('value', {}) assumes the record always has a value key containing a dict. However, this module already documents/handles get_record sometimes returning the mapping dict directly (without value). In that case, this code will treat the mapping as missing and overwrite it with only configuration_mapping. Also, if value is present but not a dict (e.g. None), existing_mapping.update(...) will raise. Please mirror the normalization logic used in _get_alias_map/store_mapping: normalize record into a dict[str, str] whether it comes wrapped in {key,value} or as a raw mapping, otherwise default to {}.
Co-authored-by: Vlada Dusek <v.dusek96@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
We can probably make the storage mechanism a little bit more complicated, but avoid some API calls on actor init, which is probably worth it. |
valekjo
left a comment
There was a problem hiding this comment.
Looks good, but I'm not strong in python :)
One question
68d67b9 to
02f9244
Compare
02f9244 to
e2df853
Compare
| BeforeValidator(lambda data: json.loads(data) if isinstance(data, str) else data or None), | ||
| ] = None | ||
|
|
||
| actor_storages: Annotated[ |
There was a problem hiding this comment.
I'm not sure about the naming here - we usually strip the actor_ prefix. But I do agree that a plain storages would look odd. So let's keep it this way, I guess.
| datasets = {'default': 'default_dataset_id', 'custom': 'custom_Dataset_id'} | ||
| request_queues = {'default': 'default_request_queue_id', 'custom': 'custom_RequestQueue_id'} | ||
| key_value_stores = {'default': 'default_key_value_store_id', 'custom': 'custom_KeyValueStore_id'} |
There was a problem hiding this comment.
I assume the mixing of camel and snake cases is not intended
| datasets = {'default': 'default_dataset_id', 'custom': 'custom_Dataset_id'} | |
| request_queues = {'default': 'default_request_queue_id', 'custom': 'custom_RequestQueue_id'} | |
| key_value_stores = {'default': 'default_key_value_store_id', 'custom': 'custom_KeyValueStore_id'} | |
| datasets = {'default': 'default_dataset_id', 'custom': 'custom_dataset_id'} | |
| request_queues = {'default': 'default_request_queue_id', 'custom': 'custom_request_queue_id'} | |
| key_value_stores = {'default': 'default_key_value_store_id', 'custom': 'custom_key_value_store_id'} |
There was a problem hiding this comment.
That value can be any string, so I just used one that made it more convenient to write the test.
Look at the assert loop and the fstring:
f'custom_{storage_type}_id'
Co-authored-by: Vlada Dusek <v.dusek96@gmail.com>
Description
Configurationto includeactor_storagesthat is loaded fromactor_storages_jsonenv variable.AliasResolverto be able to resolve alias mapping fromConfiguration.Issues
Testing
Checklist