[Feature-18070][Task] Add Amazon EMR Serverless task plugin#18069

Open
norrishuang wants to merge 13 commits into apache:dev from norrishuang:dev

@norrishuang norrishuang commented Mar 14, 2026

Was this PR generated or assisted by AI?

YES. The implementation was assisted by AI (Claude) for code generation, with human review, testing and verification on a real AWS EMR Serverless environment.

Purpose of the pull request

Add a new task plugin for Amazon EMR Serverless, enabling users to submit, monitor, and cancel Spark/Hive jobs on EMR Serverless applications directly from DolphinScheduler workflows.
Unlike the existing EMR on EC2 task plugin, which manages EC2-based clusters, EMR Serverless is a serverless runtime that requires no cluster infrastructure management and automatically scales compute resources on demand.
Closes #18070

Brief change log

Backend (new module: dolphinscheduler-task-emr-serverless)

  • EmrServerlessTask — extends AbstractRemoteTask, implements submit/track/cancel lifecycle via AWS SDK v1 (StartJobRun, GetJobRun, CancelJobRun)
  • EmrServerlessParameters — task parameter model (applicationId, executionRoleArn, jobName, startJobRunRequestJson)
  • EmrServerlessTaskChannel / EmrServerlessTaskChannelFactory — SPI registration via @AutoService, registered as EMR_SERVERLESS
  • EmrServerlessTaskException — dedicated exception class
  • Authentication: reuses aws.emr.* config from aws.yaml, falls back to DefaultAWSCredentialsProviderChain
  • Supports failover recovery via appIds (jobRunId)
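
To make the parameter model concrete, here is a minimal sketch of the kind of check that checkParameters could perform. It is not the PR's code: the class name EmrServerlessParamsSketch is hypothetical, and treating applicationId, executionRoleArn, and startJobRunRequestJson as required while leaving jobName optional is an assumption.

```java
/** Hypothetical, simplified stand-in for the task parameter model. */
final class EmrServerlessParamsSketch {
    final String applicationId;
    final String executionRoleArn;
    final String jobName;                 // assumed optional in this sketch
    final String startJobRunRequestJson;

    EmrServerlessParamsSketch(String applicationId, String executionRoleArn,
                              String jobName, String startJobRunRequestJson) {
        this.applicationId = applicationId;
        this.executionRoleArn = executionRoleArn;
        this.jobName = jobName;
        this.startJobRunRequestJson = startJobRunRequestJson;
    }

    private static boolean hasText(String value) {
        return value != null && !value.trim().isEmpty();
    }

    /** Returns true only when every field assumed to be required is non-blank. */
    boolean checkParameters() {
        return hasText(applicationId)
                && hasText(executionRoleArn)
                && hasText(startJobRunRequestJson);
    }
}
```

A task built this way would refuse to submit when, for example, executionRoleArn is blank, rather than failing later inside the AWS call.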

Frontend

  • use-emr-serverless.ts (fields) — form fields for Application Id, Execution Role Arn, Job Name, StartJobRunRequest JSON editor
  • use-emr-serverless.ts (tasks) — task model definition
  • Registered in task type constants, store, format-data, i18n (en_US/zh_CN)
  • Task icon (reuses EMR icon)

Documentation

  • Chinese doc: docs/docs/zh/guide/task/emr-serverless.md
  • English doc: docs/docs/en/guide/task/emr-serverless.md
  • Includes: overview, task parameters, Spark/Hive JSON examples, AWS auth config, job state transitions, screenshots

Verify this pull request

This change added tests and can be verified as follows:

  • Added EmrServerlessTaskTest with 11 unit tests covering: success/failed/cancelled lifecycle, full state chain, submit error handling, null GetJobRun response, cancel with/without jobRunId, failover recovery, parameter validation, and invalid JSON handling.
  • Manually verified by deploying to an EC2 instance in Standalone mode and successfully submitting a Spark job to a real AWS EMR Serverless application.
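
The lifecycle these tests exercise (submit, poll until a final state, map the state to an exit code, and reuse a recovered jobRunId on failover) can be sketched as follows. This is an illustration under stated assumptions, not the plugin's implementation: JobRunClient is a hypothetical abstraction over StartJobRun and GetJobRun, the exit-code constants are placeholder values, and treating a null GetJobRun result as a failure is an assumption. The enum lists only the states named in this PR; the real service may report others.

```java
import java.util.EnumSet;
import java.util.Set;

final class JobRunLifecycleSketch {

    /** States named in this PR's test list; the real API may define more. */
    enum JobRunState { SUBMITTED, PENDING, SCHEDULED, RUNNING, SUCCESS, FAILED, CANCELLED }

    // Placeholder exit codes (assumption, not read from the PR).
    static final int EXIT_SUCCESS = 0;
    static final int EXIT_FAILURE = -1;
    static final int EXIT_KILL = 137;

    static final Set<JobRunState> TERMINAL =
            EnumSet.of(JobRunState.SUCCESS, JobRunState.FAILED, JobRunState.CANCELLED);

    /** Hypothetical abstraction over the StartJobRun and GetJobRun calls. */
    interface JobRunClient {
        String startJobRun();
        JobRunState getJobRunState(String jobRunId);
    }

    static int toExitCode(JobRunState state) {
        switch (state) {
            case SUCCESS:
                return EXIT_SUCCESS;
            case CANCELLED:
                return EXIT_KILL;
            default:
                return EXIT_FAILURE;
        }
    }

    /**
     * Submits a job unless a jobRunId was recovered (failover), then polls
     * until a terminal state is seen and returns the mapped exit code.
     */
    static int run(JobRunClient client, String recoveredJobRunId, long pollMillis) {
        boolean recovered = recoveredJobRunId != null && !recoveredJobRunId.isEmpty();
        String jobRunId = recovered ? recoveredJobRunId : client.startJobRun();
        while (true) {
            JobRunState state = client.getJobRunState(jobRunId);
            if (state == null) {
                return EXIT_FAILURE; // assumption: a null response is a failure
            }
            if (TERMINAL.contains(state)) {
                return toExitCode(state);
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return EXIT_FAILURE;
            }
        }
    }
}
```

Keeping the client behind an interface is what lets unit tests drive the full state chain with a fake, without AWS credentials.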

- New backend module: dolphinscheduler-task-emr-serverless
  - EmrServerlessTask: submit/track/cancel via AWS SDK v1
  - Auth: reuse aws.emr.* config, fallback to DefaultAWSCredentialsProviderChain
  - SPI registration via @AutoService
- Frontend: EMR_SERVERLESS task type with form fields
  - applicationId, executionRoleArn, jobName, startJobRunRequestJson
  - i18n: en_US + zh_CN
- BOM: add aws-java-sdk-emrserverless dependency

11 test cases covering:
- Submit → track → success/failed/cancelled lifecycle
- Full state transition (SUBMITTED→PENDING→SCHEDULED→RUNNING→SUCCESS)
- Submit error handling (SDK exception)
- GetJobRun returns null
- Cancel application (with and without jobRunId)
- Failover recovery via appIds
- Parameter validation (checkParameters)
- Invalid JSON handling

- Add maven-shade-plugin to emr-serverless pom.xml so shade jar is
  included in dist assembly
- Add applicationId, executionRoleArn, startJobRunRequestJson fields
  to ITaskParams in types.ts to fix TypeScript build

The use-task.ts imports TASK_TYPES_MAP from store/project/task-type.ts
(not constants/task-type.ts), so EMR_SERVERLESS must be defined there
too. Missing entry caused 'Cannot read properties of undefined
(reading taskExecuteType)' error when dragging the node onto canvas.

EMR Serverless has no local emulator, so the endpoint from aws.emr.*
config (which often points to a local MinIO/S3 mock like localhost:9000)
should not be used. Always use the standard AWS endpoint resolved by
region. Also updated aws.yaml on deploy server to use
InstanceProfileCredentialsProvider.

- Copy EMR icon for EMR_SERVERLESS task type (emr_serverless.png, emr_serverless_hover.png)
- Add Chinese doc: docs/docs/zh/guide/task/emr-serverless.md
- Add English doc: docs/docs/en/guide/task/emr-serverless.md
- Register docs in sidebar config (docsdev.js)
- Docs include: overview, task parameters, Spark/Hive examples,
  AWS auth config, job state transitions, and notices
- Screenshot placeholders marked with TODO comments

boring-cyborg bot commented Mar 14, 2026

Thanks for opening this pull request! Please check out our contributing guidelines. (https://github.com/apache/dolphinscheduler/blob/dev/docs/docs/en/contribute/join/pull-request.md)

@github-actions github-actions bot added the UI, backend, test, and document labels Mar 14, 2026
@norrishuang norrishuang changed the title [Feature][Task] Add Amazon EMR Serverless task plugin [Feature-18070][Task] Add Amazon EMR Serverless task plugin Mar 14, 2026
Member

@SbloodyS SbloodyS left a comment

Please add api-test or e2e for this. @norrishuang

@norrishuang
Author

> Please add api-test or e2e for this. @norrishuang

Comprehensive unit tests have already been included for the EMR Serverless task plugin, covering job submission, state polling, success/failure/cancellation handling, failover recovery, parameter validation, and invalid input scenarios. Since this task plugin depends on AWS EMR Serverless, running api-test or e2e in the CI Docker environment would require AWS credentials and a running EMR Serverless application. I'm happy to add an api-test or e2e if there is a recommended approach for handling AWS authentication in CI. Could you share any guidance on this?

Comment on lines +85 to +89
static final ObjectMapper objectMapper = new ObjectMapper()
        .configure(FAIL_ON_UNKNOWN_PROPERTIES, false)
        .configure(ACCEPT_EMPTY_ARRAY_AS_NULL_OBJECT, true)
        .configure(READ_UNKNOWN_ENUM_VALUES_AS_NULL, true)
        .configure(REQUIRE_SETTERS_FOR_GETTERS, true)

Check notice: Code scanning / CodeQL

Deprecated method or constructor invocation (Note)

Invoking ObjectMapper.configure should be avoided because it has been deprecated.
@sonarqubecloud

…verage

Add test cases covering:
- Full job lifecycle (submit -> polling -> success/failure/cancelled)
- Exception handling for submission and polling failures
- Cancel application with empty jobRunId edge case
- Failover recovery from appIds
- Parameter validation and invalid JSON input
- State-to-exit-code mapping
- Application ID retrieval

Tests use Mockito to mock EmrServerlessClient without requiring
AWS credentials, following the same pattern as AliyunServerlessSparkTaskTest.
@norrishuang
Author

Thank you for the feedback @SbloodyS!

I have enhanced the unit tests to provide comprehensive coverage of the EMR Serverless task plugin. The test suite now includes 15 test cases covering:

  • Full job lifecycle: job submission → state polling → success/failure/cancelled
  • Exception handling: submission failures, polling failures, null job run responses
  • Cancel operation: cancel running job, cancel with empty jobRunId edge case
  • Failover recovery: restore job run ID from appIds after worker restart
  • Parameter validation: missing required fields, invalid JSON input
  • State mapping: all final states → exit code mapping
  • Application ID retrieval: getApplicationIds()

The tests use Mockito to mock EmrServerlessClient, following the same pattern as AliyunServerlessSparkTaskTest in the codebase. Since this plugin depends on AWS EMR Serverless, running actual e2e tests in the CI Docker environment would require AWS credentials and a running EMR Serverless application, which is not feasible in the standard CI setup.

Commit: norrishuang/dolphinscheduler@44f43eb

@SbloodyS
Member

Unit testing is not enough. You can refer to dolphinscheduler-api-test and dolphinscheduler-e2e modules. @norrishuang

- Add EmrServerlessTaskAPITest to verify task submission and execution
  via DolphinScheduler REST API
- Add docker-compose with WireMock to mock AWS EMR Serverless HTTP API
  (POST /applications/*/jobruns and GET /applications/*/jobruns/*)
- Add WireMock stub mappings for StartJobRun and GetJobRun responses
- Add workflow definition JSON for EMR Serverless success test case
- Fix ObjectMapper deprecated configure() calls by switching to
  JsonMapper.builder() pattern (addresses SonarQube/CodeQL warning)
- Support custom EMR_SERVERLESS_ENDPOINT env var in EmrServerlessTask
  to allow endpoint injection for testing with mock servers
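
One way the endpoint injection described above could work is sketched below. It assumes the variable is read from the process environment and that a blank value means "no override"; the class and method names are hypothetical, and the PR's actual logic may differ.

```java
import java.util.Map;
import java.util.Optional;

final class EndpointOverrideSketch {

    /** Environment variable name taken from the commit message above. */
    static final String ENDPOINT_ENV = "EMR_SERVERLESS_ENDPOINT";

    /**
     * Returns the custom endpoint when the variable is set to a non-blank
     * value. An empty result means the caller should let the SDK resolve
     * the standard endpoint from the configured region.
     */
    static Optional<String> endpointOverride(Map<String, String> env) {
        String value = env.get(ENDPOINT_ENV);
        if (value == null || value.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(value.trim());
    }
}
```

In production the method would be called as endpointOverride(System.getenv()). With AWS SDK v1, a present value would typically be passed to the client builder through withEndpointConfiguration, and an absent one would fall back to withRegion. Taking the environment as a Map keeps the function testable without mutating real environment variables.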
@norrishuang
Author

Thank you for the guidance @SbloodyS!

I have added an api-test for the EMR Serverless task plugin. Since this plugin depends on AWS EMR Serverless (a cloud service), running actual e2e tests in CI would require real AWS credentials and a running EMR Serverless application. To solve this, I used WireMock to mock the AWS EMR Serverless HTTP API — it's open-source and works entirely offline.

What was added (commit: norrishuang/dolphinscheduler@b96944c):

  1. EmrServerlessTaskAPITest — api-test that exercises the full task execution flow via DolphinScheduler REST API:

    • Login → create project → import workflow definition → online workflow → trigger execution → assert success
  2. docker-compose.yaml — spins up DolphinScheduler standalone + WireMock:

    • WireMock mocks POST /applications/*/jobruns (StartJobRun) and GET /applications/*/jobruns/* (GetJobRun → SUCCESS)
    • DS connects to WireMock via EMR_SERVERLESS_ENDPOINT=http://wiremock:8080
  3. Fixed deprecated ObjectMapper.configure() calls by switching to JsonMapper.builder() pattern (addresses the CodeQL warning)
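
For readers unfamiliar with this style of mocking, the two stubbed routes can be illustrated with the JDK's built-in HTTP server. This is a stand-in for WireMock, not the PR's stub mappings: the class name, the placeholder IDs (app-demo, jr-demo), and the trimmed response bodies are assumptions based on the public StartJobRun and GetJobRun response shapes.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

final class EmrServerlessMockSketch {

    // Placeholder response bodies (assumption, heavily trimmed).
    static final String START_BODY =
            "{\"applicationId\":\"app-demo\",\"jobRunId\":\"jr-demo\"}";
    static final String GET_BODY =
            "{\"jobRun\":{\"applicationId\":\"app-demo\",\"jobRunId\":\"jr-demo\",\"state\":\"SUCCESS\"}}";

    /** Returns the canned JSON body for a stubbed route, or null if unmatched. */
    static String route(String method, String path) {
        if ("POST".equals(method) && path.matches("/applications/[^/]+/jobruns")) {
            return START_BODY; // StartJobRun
        }
        if ("GET".equals(method) && path.matches("/applications/[^/]+/jobruns/[^/]+")) {
            return GET_BODY;   // GetJobRun, always reporting SUCCESS
        }
        return null;
    }

    /** Starts the mock on a free loopback port; callers stop it with stop(0). */
    static HttpServer start() throws IOException {
        HttpServer server = HttpServer.create(
                new InetSocketAddress(InetAddress.getLoopbackAddress(), 0), 0);
        server.createContext("/applications", EmrServerlessMockSketch::handle);
        server.start();
        return server;
    }

    private static void handle(HttpExchange exchange) throws IOException {
        String body = route(exchange.getRequestMethod(), exchange.getRequestURI().getPath());
        int status = (body == null) ? 404 : 200;
        byte[] bytes = (body == null ? "{}" : body).getBytes(StandardCharsets.UTF_8);
        exchange.getResponseHeaders().add("Content-Type", "application/json");
        exchange.sendResponseHeaders(status, bytes.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(bytes);
        }
    }
}
```

Because GetJobRun always returns SUCCESS, a task pointed at this server finishes on its first poll, which is enough for a happy-path api-test but does not cover intermediate or failed states.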

Please let me know if any adjustments are needed.

@github-actions github-actions bot added the e2e label Mar 23, 2026
@SbloodyS
Member

SbloodyS commented Mar 23, 2026

Yes. Using WireMock is good for now. You can continue coding. @norrishuang

@norrishuang
Author

Hi @SbloodyS, I noticed the OWASP Dependency Check CI has been failing on the dev branch consistently (not just on this PR). Is this a known issue? Do I need to take any action on my side to get this PR reviewed?

@SbloodyS
Member

> Hi @SbloodyS, I noticed the OWASP Dependency Check CI has been failing on the dev branch consistently (not just on this PR). Is this a known issue? Do I need to take any action on my side to get this PR reviewed?

You can just ignore it for now.

Labels

backend, document, e2e, test, UI

Development

Successfully merging this pull request may close these issues.

[Feature][Task] Support Amazon EMR Serverless task plugin

2 participants