[Feature-18070][Task] Add Amazon EMR Serverless task plugin #18069
norrishuang wants to merge 13 commits into apache:dev.
- New backend module: dolphinscheduler-task-emr-serverless
- EmrServerlessTask: submit/track/cancel via AWS SDK v1
- Auth: reuse aws.emr.* config, fall back to DefaultAWSCredentialsProviderChain
- SPI registration via @AutoService
- Frontend: EMR_SERVERLESS task type with form fields
  - applicationId, executionRoleArn, jobName, startJobRunRequestJson
- i18n: en_US + zh_CN
- BOM: add aws-java-sdk-emrserverless dependency
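The auth bullet describes a fallback order. The following is a minimal, stdlib-only sketch of that decision. The aws.yaml key names (aws.emr.credentials.provider.type, aws.emr.access.key.id, aws.emr.access.key.secret) are assumptions modelled on the aws.emr.* section. The sketch returns a label rather than a real SDK credentials provider.

```java
import java.util.Map;

/** Labels for the credentials source this sketch would pick (not real SDK objects). */
enum CredentialsChoice { STATIC_KEYS, INSTANCE_PROFILE, DEFAULT_CHAIN }

final class CredentialsResolver {
    // Assumed key names, modelled on the aws.emr.* section of aws.yaml.
    static final String PROVIDER_TYPE = "aws.emr.credentials.provider.type";
    static final String ACCESS_KEY = "aws.emr.access.key.id";
    static final String SECRET_KEY = "aws.emr.access.key.secret";

    private CredentialsResolver() {}

    static CredentialsChoice resolve(Map<String, String> awsEmrConfig) {
        String type = awsEmrConfig.getOrDefault(PROVIDER_TYPE, "");
        if ("InstanceProfileCredentialsProvider".equals(type)) {
            return CredentialsChoice.INSTANCE_PROFILE;
        }
        if ("AWSStaticCredentialsProvider".equals(type)
                && notBlank(awsEmrConfig.get(ACCESS_KEY))
                && notBlank(awsEmrConfig.get(SECRET_KEY))) {
            return CredentialsChoice.STATIC_KEYS;
        }
        // Nothing usable configured: defer to the SDK's DefaultAWSCredentialsProviderChain,
        // which checks sources such as environment variables, the profile file and instance metadata.
        return CredentialsChoice.DEFAULT_CHAIN;
    }

    private static boolean notBlank(String s) {
        return s != null && !s.isBlank();
    }
}
```

Static keys are used only when both halves are present. A half-filled config therefore drops to the default chain instead of failing with an incomplete key pair.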
11 test cases covering:
- Submit → track → success/failed/cancelled lifecycle
- Full state transition (SUBMITTED → PENDING → SCHEDULED → RUNNING → SUCCESS)
- Submit error handling (SDK exception)
- GetJobRun returns null
- Cancel application (with and without jobRunId)
- Failover recovery via appIds
- Parameter validation (checkParameters)
- Invalid JSON handling
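The state transitions listed in this commit can be captured as an enum. This is a sketch, not the plugin's code, and it rests on three assumptions:
- It uses the GetJobRun state names SUBMITTED, PENDING, SCHEDULED, RUNNING, SUCCESS, FAILED, CANCELLING and CANCELLED.
- Only SUCCESS, FAILED and CANCELLED count as terminal.
- It defines its own exit codes (0 for success, -1 for failure, 137 for kill), assuming they mirror DolphinScheduler's task exit-code conventions.

```java
import java.util.Locale;

/** EMR Serverless job-run states as reported by GetJobRun (string names). */
enum JobRunState {
    SUBMITTED, PENDING, SCHEDULED, RUNNING, SUCCESS, FAILED, CANCELLING, CANCELLED;

    // Assumed to follow DolphinScheduler's conventions: success 0, failure -1, kill 137.
    static final int EXIT_SUCCESS = 0;
    static final int EXIT_FAILURE = -1;
    static final int EXIT_KILL = 137;

    /** CANCELLING is still in flight, so only the three final states stop polling. */
    boolean isTerminal() {
        return this == SUCCESS || this == FAILED || this == CANCELLED;
    }

    int exitCode() {
        switch (this) {
            case SUCCESS:
                return EXIT_SUCCESS;
            case CANCELLED:
                return EXIT_KILL;
            case FAILED:
                return EXIT_FAILURE;
            default:
                throw new IllegalStateException(this + " is not terminal");
        }
    }

    /** Parses the API string; returns null for names this sketch does not know. */
    static JobRunState parse(String raw) {
        if (raw == null) {
            return null;
        }
        try {
            return valueOf(raw.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return null;
        }
    }
}
```

Returning null for unknown names lets a caller keep polling rather than fail when the service reports a state this enum does not model.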
- Add maven-shade-plugin to the emr-serverless pom.xml so the shaded jar is included in the dist assembly
- Add applicationId, executionRoleArn, startJobRunRequestJson fields to ITaskParams in types.ts to fix the TypeScript build
use-task.ts imports TASK_TYPES_MAP from store/project/task-type.ts (not constants/task-type.ts), so EMR_SERVERLESS must be defined there too. The missing entry caused a 'Cannot read properties of undefined (reading taskExecuteType)' error when dragging the node onto the canvas.
EMR Serverless has no local emulator, so the endpoint from the aws.emr.* config (which often points to a local MinIO/S3 mock such as localhost:9000) should not be used. Always use the standard AWS endpoint resolved by region. Also updated aws.yaml on the deploy server to use InstanceProfileCredentialsProvider.
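The following sketches the endpoint rule described in this commit. It takes a plain Map for the environment so it can be tested. EMR_SERVERLESS_ENDPOINT is the override that a later commit in this PR adds for the WireMock api-test. EndpointChoice is a hypothetical name. The regional URL pattern covers the standard aws partition only (China regions use amazonaws.com.cn), which is one reason to let the SDK resolve it.

```java
import java.util.Map;
import java.util.Optional;

final class EndpointChoice {
    // Name taken from a later commit in this PR; used to point the client at a mock server.
    static final String OVERRIDE_ENV = "EMR_SERVERLESS_ENDPOINT";

    private EndpointChoice() {}

    /**
     * Returns an explicit endpoint only when the test override is set.
     * aws.emr.endpoint is deliberately ignored: it often targets a local S3 mock (e.g. MinIO).
     * An empty result means "let the SDK resolve the regional endpoint".
     */
    static Optional<String> explicitEndpoint(Map<String, String> env) {
        String override = env.get(OVERRIDE_ENV);
        return (override == null || override.isBlank())
                ? Optional.empty()
                : Optional.of(override.trim());
    }

    /** What the SDK would resolve in the standard aws partition (China regions use amazonaws.com.cn). */
    static String standardEndpoint(String region) {
        return "https://emr-serverless." + region + ".amazonaws.com";
    }
}
```

With the SDK v1 client builder, a present value would typically go to withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(endpoint, region)). An empty value would go to withRegion(region). The builder rejects calling both.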
- Copy the EMR icon for the EMR_SERVERLESS task type (emr_serverless.png, emr_serverless_hover.png)
- Add Chinese doc: docs/docs/zh/guide/task/emr-serverless.md
- Add English doc: docs/docs/en/guide/task/emr-serverless.md
- Register docs in the sidebar config (docsdev.js)
- Docs include: overview, task parameters, Spark/Hive examples, AWS auth config, job state transitions, and notices
- Screenshot placeholders marked with TODO comments
Thanks for opening this pull request! Please check out our contributing guidelines. (https://github.com/apache/dolphinscheduler/blob/dev/docs/docs/en/contribute/join/pull-request.md)
SbloodyS left a comment:
Please add api-test or e2e for this. @norrishuang
Comprehensive unit tests have already been included for the EMR Serverless task plugin, covering job submission, state polling, success/failure/cancellation handling, failover recovery, parameter validation, and invalid input scenarios. Since this task plugin depends on AWS EMR Serverless, running api-test or e2e in the CI Docker environment would require AWS credentials and a running EMR Serverless application. I'm happy to add an api-test or e2e if there is a recommended approach for handling AWS authentication in CI. Could you share any guidance on this?
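```java
import com.fasterxml.jackson.databind.ObjectMapper;
import static com.fasterxml.jackson.databind.DeserializationFeature.*;
import static com.fasterxml.jackson.databind.MapperFeature.REQUIRE_SETTERS_FOR_GETTERS;

static final ObjectMapper objectMapper = new ObjectMapper()
        .configure(FAIL_ON_UNKNOWN_PROPERTIES, false)
        .configure(ACCEPT_EMPTY_ARRAY_AS_NULL_OBJECT, true)
        .configure(READ_UNKNOWN_ENUM_VALUES_AS_NULL, true)
        // Flagged call: configure(MapperFeature, boolean) is deprecated since Jackson 2.13.
        .configure(REQUIRE_SETTERS_FOR_GETTERS, true);
```

CodeQL (code scanning) notice: Deprecated method or constructor invocation.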
…verage

Add test cases covering:
- Full job lifecycle (submit -> polling -> success/failure/cancelled)
- Exception handling for submission and polling failures
- Cancel application with empty jobRunId edge case
- Failover recovery from appIds
- Parameter validation and invalid JSON input
- State-to-exit-code mapping
- Application ID retrieval

Tests use Mockito to mock EmrServerlessClient without requiring AWS credentials, following the same pattern as AliyunServerlessSparkTaskTest.
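The same no-credentials idea can be shown without Mockito. The sketch below uses a hand-written test double that replays a scripted state sequence. JobRunApi, ScriptedJobRunApi and JobRunPoller are hypothetical names for illustration. According to the commit, the actual tests mock the SDK client with Mockito instead.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** Hypothetical narrow client surface; the real task calls the AWS SDK client. */
interface JobRunApi {
    String startJobRun(String applicationId);                     // returns a jobRunId
    String getJobRunState(String applicationId, String jobRunId); // returns a state name
}

/** Test double that replays scripted states, so no AWS credentials are needed. */
final class ScriptedJobRunApi implements JobRunApi {
    private final Deque<String> states;
    int startCalls;
    int getCalls;

    ScriptedJobRunApi(String... states) {
        this.states = new ArrayDeque<>(Arrays.asList(states));
    }

    @Override
    public String startJobRun(String applicationId) {
        startCalls++;
        return "jr-0001";
    }

    @Override
    public String getJobRunState(String applicationId, String jobRunId) {
        getCalls++;
        // Once only the last scripted state is left, keep returning it.
        return states.size() > 1 ? states.poll() : states.peek();
    }
}

final class JobRunPoller {
    private JobRunPoller() {}

    /** Submits, then polls until a terminal state; no sleep between polls, to keep the sketch fast. */
    static String runToCompletion(JobRunApi api, String applicationId, int maxPolls) {
        String jobRunId = api.startJobRun(applicationId);
        for (int i = 0; i < maxPolls; i++) {
            String state = api.getJobRunState(applicationId, jobRunId);
            if ("SUCCESS".equals(state) || "FAILED".equals(state) || "CANCELLED".equals(state)) {
                return state;
            }
        }
        throw new IllegalStateException("job run " + jobRunId + " did not finish in " + maxPolls + " polls");
    }
}
```

A real task would wait between GetJobRun calls. The poll cap here only keeps a broken script from looping forever.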
Thank you for the feedback @SbloodyS! I have enhanced the unit tests to provide comprehensive coverage of the EMR Serverless task plugin. The test suite now includes 15 test cases covering:
The tests use Mockito to mock
Unit testing is not enough. You can refer to
- Add EmrServerlessTaskAPITest to verify task submission and execution via the DolphinScheduler REST API
- Add docker-compose with WireMock to mock the AWS EMR Serverless HTTP API (POST /applications/*/jobruns and GET /applications/*/jobruns/*)
- Add WireMock stub mappings for StartJobRun and GetJobRun responses
- Add workflow definition JSON for the EMR Serverless success test case
- Fix deprecated ObjectMapper configure() calls by switching to the JsonMapper.builder() pattern (addresses SonarQube/CodeQL warning)
- Support a custom EMR_SERVERLESS_ENDPOINT env var in EmrServerlessTask to allow endpoint injection for testing with mock servers
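To make the two stubs concrete, here is a plain-Java stand-in for what the WireMock mappings do; it is not WireMock's API. It routes two requests to canned JSON:
- POST /applications/{applicationId}/jobruns (StartJobRun)
- GET /applications/{applicationId}/jobruns/{jobRunId} (GetJobRun)

The class name, the mock jobRunId and ARN, and the trimmed response bodies are assumptions. Real GetJobRun responses carry many more fields than applicationId, jobRunId and state.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Maps an HTTP method + path to a canned JSON body, or null when no stub matches. */
final class EmrServerlessStubRoutes {
    private static final Pattern START = Pattern.compile("^/applications/([^/]+)/jobruns$");
    private static final Pattern GET = Pattern.compile("^/applications/([^/]+)/jobruns/([^/]+)$");
    // Mock values only; not real identifiers.
    static final String MOCK_JOB_RUN_ID = "00mockjobrun";

    private EmrServerlessStubRoutes() {}

    static String respond(String method, String path) {
        Matcher m = START.matcher(path);
        if ("POST".equals(method) && m.matches()) {
            // StartJobRun: applicationId, jobRunId and arn of the new run.
            String arn = "arn:aws:emr-serverless:us-east-1:000000000000:/applications/"
                    + m.group(1) + "/jobruns/" + MOCK_JOB_RUN_ID;
            return String.format("{\"applicationId\":\"%s\",\"jobRunId\":\"%s\",\"arn\":\"%s\"}",
                    m.group(1), MOCK_JOB_RUN_ID, arn);
        }
        m = GET.matcher(path);
        if ("GET".equals(method) && m.matches()) {
            // GetJobRun: the run is wrapped in a "jobRun" object; SUCCESS ends the task's polling.
            return String.format("{\"jobRun\":{\"applicationId\":\"%s\",\"jobRunId\":\"%s\",\"state\":\"SUCCESS\"}}",
                    m.group(1), m.group(2));
        }
        return null; // WireMock would answer 404 for an unmatched request
    }
}
```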
Thank you for the guidance @SbloodyS! I have added an api-test for the EMR Serverless task plugin. Since this plugin depends on AWS EMR Serverless (a cloud service), running actual e2e tests in CI would require real AWS credentials and a running EMR Serverless application. To solve this, I used WireMock to mock the AWS EMR Serverless HTTP API; it is open-source and works entirely offline. What was added (commit: norrishuang/dolphinscheduler@b96944c):
Please let me know if any adjustments are needed.
Yes. Using
Hi @SbloodyS, I noticed the OWASP Dependency Check CI has been failing on the
You can just ignore it for now.



Was this PR generated or assisted by AI?
YES. The implementation was assisted by AI (Claude) for code generation, with human review, testing and verification on a real AWS EMR Serverless environment.
Purpose of the pull request
Add a new task plugin for Amazon EMR Serverless, enabling users to submit, monitor, and cancel Spark/Hive jobs on EMR Serverless applications directly from DolphinScheduler workflows.
Unlike the existing EMR on EC2 task plugin, which manages EC2-based clusters, EMR Serverless is a serverless runtime that requires no cluster infrastructure management and automatically scales compute resources on demand.
Close #18070
Brief change log
Backend (new module: dolphinscheduler-task-emr-serverless)
- EmrServerlessTask: extends AbstractRemoteTask, implements the submit/track/cancel lifecycle via AWS SDK v1 (StartJobRun, GetJobRun, CancelJobRun)
- EmrServerlessParameters: task parameter model (applicationId, executionRoleArn, jobName, startJobRunRequestJson)
- EmrServerlessTaskChannel / EmrServerlessTaskChannelFactory: SPI registration via @AutoService, registered as EMR_SERVERLESS
- EmrServerlessTaskException: dedicated exception class
- Auth: reuses the aws.emr.* config from aws.yaml, falls back to DefaultAWSCredentialsProviderChain
- Failover recovery via appIds (jobRunId)

Frontend
- use-emr-serverless.ts (fields): form fields for Application Id, Execution Role Arn, Job Name, StartJobRunRequest JSON editor
- use-emr-serverless.ts (tasks): task model definition

Documentation
- docs/docs/zh/guide/task/emr-serverless.md
- docs/docs/en/guide/task/emr-serverless.md

Verify this pull request
This change added tests and can be verified as follows:
- EmrServerlessTaskTest with 11 unit tests covering: success/failed/cancelled lifecycle, full state chain, submit error handling, null GetJobRun response, cancel with/without jobRunId, failover recovery, parameter validation, and invalid JSON handling.
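The failover and cancel cases in that list can be sketched as follows. EmrServerlessOps, EmrServerlessTaskSketch and RecordingOps are hypothetical names, only loosely modelled on the submit/cancel split of AbstractRemoteTask. The sketch assumes the persisted appIds entry is the jobRunId. It also chooses to make cancel a no-op when no jobRunId exists, which may differ from what the plugin actually does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical client surface; the real plugin uses the AWS SDK v1 EMR Serverless client. */
interface EmrServerlessOps {
    String startJobRun(String applicationId);
    void cancelJobRun(String applicationId, String jobRunId);
}

/** Simplified lifecycle: submit (or recover after failover) and cancel. Not the real task class. */
final class EmrServerlessTaskSketch {
    private final EmrServerlessOps ops;
    private final String applicationId;
    private String jobRunId;

    EmrServerlessTaskSketch(EmrServerlessOps ops, String applicationId) {
        this.ops = ops;
        this.applicationId = applicationId;
    }

    /** On failover, appIds holds the jobRunId persisted earlier; reuse it instead of resubmitting. */
    void submitOrRecover(List<String> appIds) {
        if (appIds != null && !appIds.isEmpty() && !appIds.get(0).isBlank()) {
            jobRunId = appIds.get(0);
            return;
        }
        jobRunId = ops.startJobRun(applicationId);
    }

    /** Cancelling before a job run exists does nothing rather than calling the API with an empty id. */
    void cancel() {
        if (jobRunId == null || jobRunId.isBlank()) {
            return;
        }
        ops.cancelJobRun(applicationId, jobRunId);
    }

    List<String> getApplicationIds() {
        return jobRunId == null ? Collections.emptyList() : Collections.singletonList(jobRunId);
    }
}

/** Records calls so a test can check what reached the "API". */
final class RecordingOps implements EmrServerlessOps {
    final List<String> calls = new ArrayList<>();

    @Override
    public String startJobRun(String applicationId) {
        calls.add("start:" + applicationId);
        return "jr-new";
    }

    @Override
    public void cancelJobRun(String applicationId, String jobRunId) {
        calls.add("cancel:" + jobRunId);
    }
}
```

Reusing the recovered jobRunId avoids launching a duplicate job run when a worker restarts mid-execution.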