This repository was archived by the owner on Feb 11, 2026. It is now read-only.
The generate_data function is currently entangled with other functionality, such as taxonomy data ingestion, preprocessing, and mixing, which makes it hard to maintain and test. We propose refactoring it into a clean, dedicated Python API that handles only data generation. This separation will improve modularity and make further development easier.
Objectives
Extract the generation logic from the existing implementation and encapsulate it within a new Python API.
Ensure this API is compatible with both standalone use and integration into the CLI.
Maintain the integrity of the existing codebase while simplifying the generation process.
Acceptance Criteria
Define the New API
Develop a Python API that focuses solely on the data generation process.
Accept additional parameters such as the dataset path, the output save path, and the pipeline path.
Utilize the API within a CLI context to ensure seamless integration.
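As a rough illustration of what a generation-only API could look like, here is a minimal sketch. Everything in it is an assumption made for illustration, not the project's actual interface: the signature of generate_data, the JSON Lines input and output format, and the injected run_pipeline callable (a hypothetical stand-in for whatever executes the pipeline found at pipeline_path).

```python
import json
from pathlib import Path
from typing import Callable, Iterable, Iterator

Record = dict
# Hypothetical runner signature: (pipeline_path, input_records) -> generated records.
PipelineRunner = Callable[[Path, Iterable[Record]], Iterable[Record]]


def read_jsonl(path: Path) -> Iterator[Record]:
    """Yield one record per non-empty line of a JSON Lines file."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


def generate_data(
    dataset_path: str,
    output_path: str,
    pipeline_path: str,
    run_pipeline: PipelineRunner,
) -> int:
    """Run generation only: read the dataset, apply the pipeline, save the output.

    No taxonomy ingestion, preprocessing, or mixing happens here.
    Returns the number of records written.
    """
    dataset = Path(dataset_path)
    pipeline = Path(pipeline_path)
    output = Path(output_path)

    # Fail early with a clear error instead of deep inside the pipeline.
    for required in (dataset, pipeline):
        if not required.is_file():
            raise FileNotFoundError(f"File not found: {required}")

    generated = run_pipeline(pipeline, read_jsonl(dataset))

    # Create the output directory if needed, then write one record per line.
    output.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with output.open("w", encoding="utf-8") as handle:
        for record in generated:
            handle.write(json.dumps(record) + "\n")
            count += 1
    return count
```

Passing the runner in as a callable keeps the function easy to call from a script or a CLI, and lets tests substitute a fake pipeline without loading a model.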
Independent SDG CLI
Use click for CLI development, providing options to configure the generation process directly from the command line.
Ensure that the existing ilab CLI uses this new API effectively, passing all necessary parameters through command-line options.
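A thin click wrapper over such an API might look like the sketch below. The option names (--dataset, --output, --pipeline), the command name generate, and the placeholder generate_data function are all hypothetical, chosen only to show how command-line options could map one-to-one onto API parameters.

```python
import click


def generate_data(dataset_path: str, output_path: str, pipeline_path: str) -> int:
    """Placeholder for the generation API (hypothetical signature).

    A real implementation would run the pipeline; this stub writes nothing
    and reports zero generated records.
    """
    return 0


@click.command()
@click.option(
    "--dataset",
    "dataset_path",
    type=click.Path(exists=True, dir_okay=False),
    required=True,
    help="Path to the input dataset.",
)
@click.option(
    "--output",
    "output_path",
    type=click.Path(dir_okay=False),
    required=True,
    help="Where to save the generated data.",
)
@click.option(
    "--pipeline",
    "pipeline_path",
    type=click.Path(exists=True, dir_okay=False),
    required=True,
    help="Path to the pipeline definition.",
)
def generate(dataset_path: str, output_path: str, pipeline_path: str) -> None:
    """Generate synthetic data from a dataset using a pipeline."""
    # The command only parses options and delegates to the API.
    count = generate_data(dataset_path, output_path, pipeline_path)
    click.echo(f"Generated {count} records, written to {output_path}")
```

Because the command body only forwards its options, the existing ilab CLI and a standalone SDG CLI could both call the same function with different defaults.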
Testing and Debugging
Write comprehensive unit tests for the new API to ensure it works as expected under various configurations.
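One way to cover several configurations without a model is to inject a fake pipeline and parameterize the test with subTest. The sketch below is illustrative only: the tiny generate_data stand-in, its signature, and the JSON Lines format are assumptions, not the real API.

```python
import json
import tempfile
import unittest
from pathlib import Path


def generate_data(dataset_path, output_path, run_pipeline):
    """Tiny stand-in for the API under test (hypothetical signature)."""
    text = Path(dataset_path).read_text(encoding="utf-8")
    records = [json.loads(line) for line in text.splitlines() if line.strip()]
    generated = list(run_pipeline(records))
    Path(output_path).write_text(
        "".join(json.dumps(r) + "\n" for r in generated), encoding="utf-8"
    )
    return len(generated)


class GenerateDataTests(unittest.TestCase):
    def test_output_count_across_configurations(self):
        # (name, input records, samples per record, expected output count)
        cases = [
            ("empty dataset", [], 1, 0),
            ("one sample per record", [{"q": "a"}], 1, 1),
            ("three samples per record", [{"q": "a"}, {"q": "b"}], 3, 6),
        ]
        for name, records, samples, expected in cases:
            with self.subTest(name), tempfile.TemporaryDirectory() as tmp:
                dataset = Path(tmp) / "seed.jsonl"
                output = Path(tmp) / "generated.jsonl"
                dataset.write_text(
                    "".join(json.dumps(r) + "\n" for r in records),
                    encoding="utf-8",
                )

                # Fake pipeline: emits `samples` copies of each input row.
                def fake_pipeline(rows, n=samples):
                    return [dict(row, sample=i) for row in rows for i in range(n)]

                count = generate_data(dataset, output, fake_pipeline)

                self.assertEqual(count, expected)
                written = output.read_text(encoding="utf-8").splitlines()
                self.assertEqual(len(written), expected)
```

Running `python -m unittest` against a module that contains this class would exercise all three configurations and report each failing case separately.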
Documentation and Examples
Because the new SDG CLI will require users to pass their own dataset and pipeline, the project documentation must be updated with detailed instructions on how to use the new API and CLI.
Provide example commands and configurations to help users get started with the new setup.