[SPARK-54276][BUILD] Bump Hadoop 3.4.3 #54029

Closed
pan3793 wants to merge 3 commits into apache:master from pan3793:SPARK-54276

Conversation

@pan3793
Member

@pan3793 pan3793 commented Jan 28, 2026

What changes were proposed in this pull request?

Upgrade Hadoop dependency to 3.4.3.
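
As a rough way to confirm which Hadoop version a checkout pins, the value can be read straight out of the POM. This is only a sketch: it assumes the version is held in a single-line Maven property named hadoop.version in the top-level pom.xml (an assumption, not stated in this PR), and pom_property is a hypothetical helper, not part of the Spark build.

```shell
# Print the value of a simple <name>value</name> property from a POM file.
# Assumes the property sits on one line; this is a text match, not an XML parser.
pom_property() {
  local pom_file="$1" name="$2"
  sed -n "s:.*<${name}>\(.*\)</${name}>.*:\1:p" "$pom_file" | head -n 1
}

# Example (hadoop.version is an assumed property name):
#   pom_property pom.xml hadoop.version
```

A text match keeps the sketch dependency-free; for anything beyond a quick check, resolving the property through Maven itself would be more robust.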

Why are the changes needed?

This release includes HADOOP-19212, which makes UGI work with Java 25.

https://hadoop.apache.org/release/3.4.3.html

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass CI. Also verified that spark-sql can now bootstrap successfully on JDK 25:

$ java -version
openjdk version "25.0.1" 2025-10-21 LTS
OpenJDK Runtime Environment Temurin-25.0.1+8 (build 25.0.1+8-LTS)
OpenJDK 64-Bit Server VM Temurin-25.0.1+8 (build 25.0.1+8-LTS, mixed mode, sharing)

$ build/sbt -Phive,hive-thriftserver clean package

$ SPARK_PREPEND_CLASSES=true bin/spark-sql
NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
WARNING: Using incubator modules: jdk.incubator.vector
WARNING: package sun.security.action not in java.base
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
26/01/28 17:23:22 WARN Utils: Your hostname, H27212-MAC-01.local, resolves to a loopback address: 127.0.0.1; using 10.242.159.140 instead (on interface en0)
26/01/28 17:23:22 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
26/01/28 17:23:23 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
WARNING: A terminally deprecated method in sun.misc.Unsafe has been called
WARNING: sun.misc.Unsafe::arrayBaseOffset has been called by org.apache.spark.unsafe.Platform (file:/Users/chengpan/Projects/apache-spark/common/unsafe/target/scala-2.13/classes/)
WARNING: Please consider reporting this to the maintainers of class org.apache.spark.unsafe.Platform
WARNING: sun.misc.Unsafe::arrayBaseOffset will be removed in a future release
26/01/28 17:23:27 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0
26/01/28 17:23:27 WARN ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore chengpan@127.0.0.1
26/01/28 17:23:27 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
Spark Web UI available at http://10.242.159.140:4040
Spark master: local[*], Application Id: local-1769592205115
spark-sql (default)> select version();
4.2.0 14557582199659d838bbaa7d7b182e5d92c3b907
Time taken: 1.376 seconds, Fetched 1 row(s)
spark-sql (default)>
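
Before repeating a check like the one above, it can help to confirm which JDK is actually on the PATH. The following is a minimal sketch; java_major_version is a hypothetical helper that parses the first line of `java -version` output, and it assumes the modern version format shown in the transcript.

```shell
# Extract the leading version number from `java -version` output read on stdin.
# Legacy "1.x" version strings (Java 8 and earlier) would yield 1.
java_major_version() {
  sed -n 's/.* version "\([0-9][0-9]*\)[^0-9].*/\1/p' | head -n 1
}

# Example (java -version writes to stderr, hence the redirect):
#   java -version 2>&1 | java_major_version
```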

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions

github-actions bot commented Jan 28, 2026

JIRA Issue Information

=== Sub-task SPARK-54276 ===
Summary: Upgrade Hadoop to 3.4.3
Assignee: None
Status: Open
Affected: ["4.2.0"]


This comment was automatically generated by GitHub Actions

@github-actions github-actions bot added the CORE label Jan 28, 2026
@dongjoon-hyun
Member

Nice! Thank you, @pan3793 .

@pan3793
Member Author

pan3793 commented Jan 28, 2026

@steveloughran, it seems I'm out of luck: there are no classes in
- hadoop-client-api
- hadoop-client-runtime
- hadoop-client-minicluster

This was actually caused by a dirty cache in my local Maven repo, sorry for the noise. The jars in the staging repo are good.
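
One way to spot an empty jar like this is to count the .class entries in its listing. The sketch below is illustrative only: count_class_entries is a hypothetical helper, and the jar path in the example assumes the standard local Maven repository layout.

```shell
# Count .class entries in a list of archive entry names read from stdin.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the exit status clean.
count_class_entries() {
  grep -c '\.class$' || true
}

# Example (assumed path in the local Maven repository):
#   jar tf ~/.m2/repository/org/apache/hadoop/hadoop-client-api/3.4.3/hadoop-client-api-3.4.3.jar \
#     | count_class_entries
```

A count of 0 for a jar that should contain code suggests a bad download or a stale cached copy rather than a problem in the published artifact.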

@steveloughran
Contributor

@pan3793 sometimes it's good to rm -r all of ~/.m2/repository/org/apache/hadoop (or any other project you actively work on). It saves disk space, even if your next few builds are slow.

@steveloughran
Contributor

@pan3793 thanks for testing this.
@dongjoon-hyun anything you can do to help test would be good too; we've really had a hard time getting bits of the RC out. FWIW the Maven artifacts are being built on a Raspberry Pi, as that worked more reliably network-wise than EC2 VMs within the Cloudera VPN.

@pan3793
Member Author

pan3793 commented Jan 28, 2026

@steveloughran, thanks for the tips. Yes, I fixed it with rm -r ~/.m2/repository/org/apache/hadoop/**/3.4.3/.
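
Note that `**` only recurses in shells with recursive globbing enabled (zsh, or bash after `shopt -s globstar`). A find-based variant avoids that dependency. This is a sketch only; purge_cached_version is a hypothetical helper name.

```shell
# Remove every cached directory for one version under a group's folder in a
# local Maven repository. Uses find, so it does not rely on the shell expanding **.
purge_cached_version() {
  local group_dir="$1" version="$2"
  # Nothing to do if the group folder is not there.
  [ -d "$group_dir" ] || return 0
  # -prune stops find from descending into directories it is about to delete.
  find "$group_dir" -type d -name "$version" -prune -exec rm -r {} +
}

# Example:
#   purge_cached_version "$HOME/.m2/repository/org/apache/hadoop" 3.4.3
```

Only directories whose name equals the version are removed, so other cached versions of the same artifacts stay in place.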

For integration tests, I don't see any issues with the default JDK 17. I'm now trying JDK 25 and, so far, none of the issues are related to Hadoop.

@dongjoon-hyun dongjoon-hyun marked this pull request as draft January 29, 2026 05:11
@pan3793
Member Author

pan3793 commented Jan 29, 2026

Looks like all tests that fail with Java 25 already have solutions or are easy to fix, except for datasketches-java 6.2.0: it does not work with Java 25, and upgrading involves API changes that break compilation. I opened apache/datasketches-memory#270 and hope that datasketches-memory 3.0.2 can get a new patch version that solves the Java 25 compatibility issues.

@dongjoon-hyun
Member

Thank you for pinging me, @steveloughran, and sorry for the late reply. I was traveling from South Korea to the USA last weekend. I'm going to take a look at this PR.

I don't think there is a Hadoop issue here. It seems that @pan3793 just wanted to verify the result on Java 25.

The datasketches-java issue is a known issue on the Apache Spark side.

@pan3793
Member Author

pan3793 commented Feb 2, 2026

@dongjoon-hyun, let me revert the unrelated changes and keep this a simple Hadoop version upgrade; I will open a new draft PR for Java 25 integration. BTW, I think I already have a solution for datasketches-java.

@dongjoon-hyun
Member

Thank you always, @pan3793 .

@dongjoon-hyun
Member

Although the failures seem to be flaky ones, could you re-run the failed test pipelines to make sure, @pan3793?

@dongjoon-hyun
Member

dongjoon-hyun commented Feb 9, 2026

Hi, @pan3793 . I sent you an email (chengpan@apache.org).
Please check your email. Thank you always! 😄

@pan3793
Member Author

pan3793 commented Feb 9, 2026

Thank you, @dongjoon-hyun, it's really great news!

@dongjoon-hyun
Member

Oh, my bad. I mistakenly sent you a PMC template. It should have been an Apache Spark committer invitation. Let me send a correct one once more for the official commitment. Very sorry, @pan3793 ~

@dongjoon-hyun
Member

Definitely, I'll help you become a PMC member later. But you know that it has to start with committer first.

@dongjoon-hyun
Member

I sent a new one to chengpan@apache.org. Could you please accept once more via the correct email, @pan3793?

@pan3793
Member Author

pan3793 commented Feb 9, 2026

@dongjoon-hyun, I have replied to the email. Thank you again.

@dongjoon-hyun
Member

dongjoon-hyun commented Feb 9, 2026

Now, I added you to the committer list. Please check your Whimsy, @pan3793. It's my pleasure to work with you in the community.

[Screenshot 2026-02-09 at 3:51:53 PM]

@dongjoon-hyun
Member

It has also been announced on the dev@spark mailing list.

BTW, do you have a LinkedIn account, @pan3793?

@pan3793
Member Author

pan3793 commented Feb 10, 2026

@dongjoon-hyun, thanks! I'm not active on LinkedIn

@dongjoon-hyun
Member

Got it. No problem~

@steveloughran
Contributor

There's a new RC out now; the Maven staging repo is
https://repository.apache.org/content/repositories/orgapachehadoop-1465

@pan3793
Member Author

pan3793 commented Feb 17, 2026

@steveloughran, thanks for the information. I found it and updated this PR to use it a few days ago; so far, the test results look good. But I didn't find the vote mail in common-dev_at_hadoop. Am I missing something?

@dongjoon-hyun
Member

dongjoon-hyun commented Feb 17, 2026

Thank you, @steveloughran . The following seems to be the new RC1 email, @pan3793 .

https://lists.apache.org/thread/pwntvvrxc6vb5sod74qmsjtb9wq0cn18

@pan3793 pan3793 changed the title from "[WIP][SPARK-54276][BUILD] Bump Hadoop 3.4.3 RC0" to "[WIP][SPARK-54276][BUILD] Bump Hadoop 3.4.3 RC1" Feb 18, 2026
Member

@dongjoon-hyun dongjoon-hyun left a comment

Could you use the official Apache Hadoop 3.4.3 since the vote succeeded, @pan3793 ?

@pan3793
Member Author

pan3793 commented Feb 24, 2026

@dongjoon-hyun, I see, but it seems the jars are not available on Maven Central yet, I'm waiting for that.

@dongjoon-hyun
Member

dongjoon-hyun commented Feb 24, 2026

Oh, ya. It's not synced yet. Thanks for checking.

BTW, for Java 25, we still need Apache Hadoop 3.5.0 for your HADOOP-19821, right?

@pan3793
Member Author

pan3793 commented Feb 24, 2026

@dongjoon-hyun, I can't claim full Java 25 support, but Spark is already able to bootstrap and pass GHA with Hadoop 3.4.3 on Java 25 (though there are some issues unrelated to Hadoop that still need fixing).

@dongjoon-hyun
Member

Now, it's ready.

$ curl -I https://maven-central.storage-download.googleapis.com/maven2/org/apache/hadoop/hadoop-client-api/3.4.3/hadoop-client-api-3.4.3.pom
HTTP/2 200
...
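
The same availability check can be scripted for any artifact. The sketch below builds the POM URL from Maven coordinates using the standard repository layout and probes it with a HEAD request; maven_pom_url and artifact_is_published are hypothetical helper names, and the base URL is passed in by the caller.

```shell
# Build the POM URL for groupId:artifactId:version in the standard Maven repository layout.
maven_pom_url() {
  local base="${1%/}" group="$2" artifact="$3" version="$4"
  local group_path
  # Dots in the groupId become path separators.
  group_path="$(printf '%s' "$group" | tr '.' '/')"
  printf '%s/%s/%s/%s/%s-%s.pom\n' \
    "$base" "$group_path" "$artifact" "$version" "$artifact" "$version"
}

# Succeeds only if a HEAD request for the POM URL returns a success status.
artifact_is_published() {
  curl --silent --fail --head --output /dev/null "$(maven_pom_url "$@")"
}

# Example, using the mirror from the comment above:
#   artifact_is_published https://maven-central.storage-download.googleapis.com/maven2 \
#     org.apache.hadoop hadoop-client-api 3.4.3 && echo "synced"
```

Probing the POM rather than the jar keeps the request small, and the same function works against a staging repository by changing only the base URL.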

@pan3793 pan3793 changed the title from "[WIP][SPARK-54276][BUILD] Bump Hadoop 3.4.3 RC1" to "[SPARK-54276][BUILD] Bump Hadoop 3.4.3" Feb 24, 2026
@pan3793 pan3793 marked this pull request as ready for review February 24, 2026 23:23
@pan3793
Member Author

pan3793 commented Feb 24, 2026

@dongjoon-hyun, I contacted the ASF infra team, and it seems they fixed the Maven sync issue.

Removed the staging repo and rebased on the latest master. Now we just need to wait for CI to pass (it should).

@pan3793
Member Author

pan3793 commented Feb 25, 2026

CI is green now, it's ready to go.

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Thank you for working on this and collaborating with the Apache Hadoop community, @pan3793.

@dongjoon-hyun
Member

Merged to master for Apache Spark 4.2.0. I hope this unblocks the previous items.

3 participants