8000 [DISCUSS] [JDK17] Upgrade JUnit / JavaDoc from 4 to 5. by slfan1989 · Pull Request #7337 · apache/hadoop · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

[DISCUSS] [JDK17] Upgrade JUnit / JavaDoc from 4 to 5. #7337

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: trunk
Choose a base branch
from

Conversation

slfan1989
Copy link
Contributor
@slfan1989 slfan1989 commented Jan 29, 2025

Description of PR

Upgrading JUnit from 4 to 5 is a key task in the process of upgrading our Hadoop project to JDK 17. This task is somewhat complex due to the large number of unit tests and the high coupling between modules. The good news is that our team members have already submitted several upgrade-related PRs, which will significantly ease the workload for t 10000 he subsequent upgrades. However, the previous upgrade efforts lacked a unified summary and tracking, making it difficult to monitor progress and manage the modules that still need to be upgraded. Therefore, the purpose of this PR is to centralize the tracking of the entire process of upgrading JUnit from 4 to 5, summarize the progress, address the issues encountered during the upgrade, and discuss key challenges.

I have thoroughly reviewed all of our modules and summarized the current upgrade progress.

  • The following modules do not require an upgrade:
hadoop-assemblies
hadoop-build-tools
hadoop-client
hadoop-client-api
hadoop-client-check-invariants
hadoop-client-check-test-invariants
hadoop-client-minicluster
hadoop-client-runtime
hadoop-cloud-storage
hadoop-annotations
hadoop-auth-example
hadoop-dist
hadoop-hdfs-native-client
hadoop-hdfs-nfs
hadoop-maven-plugins
hadoop-minicluster
hadoop-project
hadoop-project-dist
hadoop-benchmark
hadoop-dynamometer-dist
hadoop-openstack
hadoop-pipes
hadoop-tools-dist
hadoop-yarn-applications-catalog-docker
hadoop-yarn-registry
hadoop-yarn-site
hadoop-yarn-ui

How was this patch tested?

For code changes:

  • Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 0s Docker mode activated.
-1 ❌ patch 0m 21s #7337 does not apply to trunk. Rebase required? Wrong Branch? See https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute for help.
Subsystem Report/Notes
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7337/1/console
versions git=2.34.1
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@jojochuang
Copy link
Contributor

Wow! Appreciate such a huge work!
You'd probably find it easier to divide it into subtasks, component by component. This is impossible to review too.

@slfan1989 slfan1989 changed the title [DISCUESS] [JDK17] Upgrade JUnit from 4 to 5 in hadoop-common. [DISCUESS] [JDK17] Upgrade JUnit from 4 to 5. Jan 30, 2025
@slfan1989
Copy link
Contributor Author

Wow! Appreciate such a huge work!
You'd probably find it easier to divide it into subtasks, component by component. This is impossible to review too.

Thank you for your message! This PR is currently an experimental version (it cannot be merged directly). I have attempted to upgrade some modules related to hadoop-common from JUnit 4 to JUnit 5. During the upgrade process, there are some issues that need to be discussed with everyone to establish the basic standards for this module upgrade.

@slfan1989
Copy link
Contributor Author
slfan1989 commented Jan 30, 2025

@steveloughran @ayushtkn @Hexiaoqiao @jojochuang @aajisaka @cnauroth @szetszwo

Upgrading JUnit from 4 to 5 is not only a technical upgrade opportunity but also a good chance for us to improve the test-related code and fix potential issues. I’d like to discuss whether the improvement strategies I’ve been considering are reasonable.

  1. Import Optimization

There are currently two ways to reference JUnit methods in the project:

  • The first way is by importing static methods, allowing us to directly use assertTrue in unit tests.
  • The second way is by importing org.junit.Assert, which requires us to use the fully qualified name (e.g., Assert.assertTrue) in unit tests.

I hope to unify all references to the first method (static imports) in this update, but I'm unsure if this is feasible.

  1. Import * Optimization

I’ve noticed that many unit tests use the import * approach. Do we need to expand these imports in the unit tests?

  1. Addressing Checkstyle Issues, Incomplete JavaDocs, and Typos

There are several Checkstyle issues in the code, such as incomplete JavaDocs and typos. Should we also address these issues in this update?

Some logs use + for string concatenation instead of placeholders. Some methods could also be optimized using lambda expressions. Should we address these issues in this update as well?

  1. Improvements for Large Modules

For example, the hadoop-common module, I’ve noticed, has many downstream dependencies, affecting several other modules, as shown below:

hadoop-azure
hadoop-aws
hadoop-distcp
hadoop-azure-datalake
hadoop-hdfs
hadoop-hdfs-rbf
hadoop-cos
hadoop-aliyun
hadoop-mapreduce-client-core

We can’t make all the changes at once because there are too many involved modules, and some test classes have dependencies on each other.

I’ve found that JUnit 4 and JUnit 5 can coexist, meaning that a module can have both JUnit 4 and JUnit 5 test styles, and the tests will still run successfully. For very large modules, we can modify them incrementally, based on the package structure. For example, for hadoop-common, we can submit multiple PRs to complete the module’s upgrade step by step. Make sure that each PR submitted doesn’t include changes to more than 50 classes, as this will make the review process easier.

HADOOP-19415. [JDK17] Upgrade JUnit from 4 to 5 in hadoop-common Part1.

cli、conf、constants、crypto、fs

HADOOP-19415. [JDK17] Upgrade JUnit from 4 to 5 in hadoop-common Part2.

ha、http

If we encounter unit tests that depend on other modules, we can leave them unchanged for now. Once the non-dependent parts of the other modules are upgraded, we can submit a single PR to complete the upgrade of the dependencies.

Let’s illustrate with an example:

The FCStatisticsBaseTest class in hadoop-common has dependencies on the following three modules:
hadoop-aws, hadoop-common, and hadoop-aliyun.

image

Due to these dependencies, we will not upgrade this abstract class for now. Instead, we will parallelly upgrade the parts of these three modules(hadoop-aws, hadoop-common, and hadoop-aliyun) that have no dependencies. Once the other parts are upgraded, we will come back to upgrade the unit tests with dependencies.

I’d also like to sincerely thank @zhtttylz for providing the unit test upgrade tool. With this tool, I was able to complete the PR, which involves multiple modules, in just two days, significantly improving efficiency. If we can confirm the coding guidelines, @zhtttylz and I will submit the relevant PRs together to complete the JUnit version upgrade.

I’m also willing to volunteer to drive and follow up on the upgrade work to ensure it’s completed in a shorter time frame. I estimate the upgrade should take about 1-3 months.

@cnauroth
Copy link
Contributor

Wow, this is awesome @slfan1989 and @zhtttylz ! Here are my opinions on how to proceed.

  1. Import Optimization
    ...
  2. Import * Optimization
    ...
  3. Addressing Checkstyle Issues, Incomplete JavaDocs, and Typos
    ...

I'd like to suggest that in this phase, we stick to a minimal effort of making a straight translation to JUnit 5 in the interest of completing a JDK 17 release as soon as possible. These are going to be large pull requests to review, even after we break them into sub-tasks. It will take longer to code review if we start including other unrelated improvements.

All of these proposals are good code quality improvements though. We can pick them up as future work after completing the JUnit 5 migration and after the JDK 17 release.

I've also heard there's a desire to go toward AssertJ. If so, then that might change the plan on points 1 and 2.

  1. Improvements for Large Modules
    ...
    Due to these dependencies, we will not upgrade this abstract class for now. Instead, we will parallelly upgrade the parts of these three modules(hadoop-aws, hadoop-common, and hadoop-aliyun) that have no dependencies. Once the other parts are upgraded, we will come back to upgrade the unit tests with dependencies.

I agree with your strategy here. It sounds like this is the best way to manage pull request size and code review process.

@slfan1989
Copy link
Contributor Author

Wow, this is awesome @slfan1989 and @zhtttylz ! Here are my opinions on how to proceed.

  1. Import Optimization
    ...
  2. Import * Optimization
    ...
  3. Addressing Checkstyle Issues, Incomplete JavaDocs, and Typos
    ...

I'd like to suggest that in this phase, we stick to a minimal effort of making a straight translation to JUnit 5 in the interest of completing a JDK 17 release as soon as possible. These are going to be large pull requests to review, even after we break them into sub-tasks. It will take longer to code review if we start including other unrelated improvements.

All of these proposals are good code quality improvements though. We can pick them up as future work after completing the JUnit 5 migration and after the JDK 17 release.

I've also heard there's a desire to go toward AssertJ. If so, then that might change the plan on points 1 and 2.

  1. Improvements for Large Modules
    ...
    Due to these dependencies, we will not upgrade this abstract class for now. Instead, we will parallelly upgrade the parts of these three modules(hadoop-aws, hadoop-common, and hadoop-aliyun) that have no dependencies. Once the other parts are upgraded, we will come back to upgrade the unit tests with dependencies.

I agree with your strategy here. It sounds like this is the best way to manage pull request size and code review process.

@cnauroth Thank you for your message! I agree with your point of view. We will only make the necessary changes and will not adjust other improvements for now. Once the JUnit 5 upgrade is complete, we will proceed with further optimizations.

Here are the improvement principles I've summarized:

  1. Focus on making the necessary changes to ensure the upgrade is completed smoothly.
  2. If the Yetus report highlights Checkstyle issues or other problems, we will address them to achieve a successful compile result.
  3. Break down large modules into smaller PRs to make the code review process more efficient and user-friendly.

cc: @steveloughran @ayushtkn @Hexiaoqiao

@cnauroth
Copy link
Contributor

@slfan1989 , great summary. Thank you for these patches!

@slfan1989 slfan1989 changed the title [DISCUESS] [JDK17] Upgrade JUnit from 4 to 5. [DISCUESS] [JDK17] Upgrade JUnit / JavaDoc from 4 to 5. Feb 1, 2025
@slfan1989
Copy link
Contributor Author

After the merge of HADOOP-15984, we noticed that some unit tests stopped running. To resolve this, we found a solution that allows the unit tests to run again. We can refer to HADOOP-19405. hadoop-aws and hadoop-azure tests have stopped running.(#7335) for more details.

However, during the process of completing this PR, a new issue was exposed — we encountered a large number of Javadoc errors. During the JDK 11 upgrade, we addressed issues in some modules, but many modules still have Javadoc problems. We plan to address these issues during the JDK 17 upgrade process.

I do not intend to create a separate JIRA and PR for each module to fix the Javadoc errors, as this would generate significant noise. Improving Javadoc does not pose any execution risks since the PRs won’t modify the code itself.

For larger modules, such as hadoop-common, hdfs, hdfs-rbf, resourcemanager, and nodemanager, we will create individual JIRAs. For other modules, we will submit the changes under larger module scopes, such as hadoop-tools, yarn, mapreduce, and hdfs.

@slfan1989
Copy link
Contributor Author

To sync the recent progress, I have completed the upgrade of all MapReduce module tests from JUnit4 to JUnit5 locally (there may still be some issues, but we'll wait for Yetus feedback). I plan to start submitting the PRs tomorrow.

@steveloughran steveloughran changed the title [DISCUESS] [JDK17] Upgrade JUnit / JavaDoc from 4 to 5. [DISCUSS] [JDK17] Upgrade JUnit / JavaDoc from 4 to 5. Feb 5, 2025
@slfan1989
Copy link
Contributor Author

@cnauroth @steveloughran

The following PR is ready, please help to review it

MapReduce:
MAPREDUCE-7420. [JDK17] Upgrade Junit 4 to 5 in hadoop-mapreduce-client-core Part1. #7363
MAPREDUCE-7421. [JDK17] Upgrade Junit 4 to 5 in hadoop-mapreduce-client-jobclient Part1. #7358

HADOOP:
HADOOP-19431. [JDK17] Upgrade JUnit from 4 to 5 in hadoop-distcp. #7368
HADOOP-19425. [JDK17] Upgrade JUnit from 4 to 5 in hadoop-azure Part1. #7369

HDFS:
HDFS-17719. [JDK17] Upgrade JUnit from 4 to 5 in hadoop-hdfs-httpfs Part1. #7371

@slfan1989
Copy link
Contributor Author

I’d like to provide an update on the progress of the JUnit 5 upgrade and will share periodic updates and specific details:

The MapReduce module upgrade has been completed.

For the YARN module upgrade, the remaining modules include:

  • hadoop-yarn-server-resourcemanager
  • hadoop-yarn-server-router
  • hadoop-yarn-server-timelineservice-hbase-tests

The related improvement PRs have been submitted (compilation is mostly complete), but there are still some issues (e.g., checkstyle problems and failing unit tests), which are being actively addressed.

The hadoop-common module upgrade has progressed to Part 6. After Part 6, there will be several more PRs, most of which involve cross-module changes requiring collaboration from different module members for code reviews. I plan to continue pushing these PRs forward next week.

My expectation is to complete the entire JUnit 5 upgrade by the end of March to mid-April. Since time is tight, I’ve invited @zhtttylz to assist with the upgrade. He will be responsible for helping upgrade the HDFS-related unit tests, and this work will begin next week.

cc: @ayushtkn @cnauroth @steveloughran @Hexiaoqiao

@h-vetinari
Copy link

Is there a reason w 9E88 hy all these tasks (resp. this PR) aren't linked to https://issues.apache.org/jira/browse/HADOOP-14693? It makes it a bit hard to follow the current status/progress of the conversion.

@slfan1989
Copy link
Contributor Author

We have made some progress in upgrading from Junit4 to Junit5. I will summarize the current progress and share the relevant updates by the end of this week.

@slfan1989
Copy link
Contributor Author

s there a reason why all these tasks (resp. this PR) aren't linked to https://issues.apache.org/jira/browse/HADOOP-14693? It makes it a bit hard to follow the current status/progress of the conversion.

@h-vetinari Thank you for your question! I will provide a complete update on the progress of the Junit4 to Junit5 upgrade this week and link the relevant JIRA tasks.

@slfan1989
Copy link
Contributor Author

I’d like to provide an update on the progress of upgrading Hadoop from JUnit 4 to JUnit 5. I’ve outlined all the modules that need to be upgraded, totaling 51 modules. Currently, we have completed the upgrade of 42 modules, with 9 modules still in progress, bringing the overall progress to over 80%.

Here’s the detailed progress:

HADOOP-19414: Upgrade JUnit from 4 to 5 in hadoop-auth (#7638)
The PR has been created, and local unit tests have passed. We are now waiting for feedback from Yetus, and some checkstyle issues may still need to be addressed.

HADOOP-19415: Upgrade JUnit from 4 to 5 in hadoop-common
This module still requires 3-5 PRs to fix unit tests with inheritance relationships. #7419 is still being followed up, and once ready, I’ll invite community members for review.

HDFS-12431: Upgrade JUnit from 4 to 5 in hadoop-hdfs
This module may require at least 10 more PRs, and @zhtttylz is currently following up on it.

HADOOP-19421: Upgrade JUnit from 4 to 5 in hadoop-aliyun (#7634)
The PR is ready and waiting for review. We need a team member familiar with this module to help with the review.

HADOOP-19433: Upgrade JUnit from 4 to 5 in hadoop-extras (#7586)
The PR is ready and waiting for review.

hadoop-aws, hadoop-azure, hadoop-azure-datalake
These modules are progressing in parallel, with no PRs submitted yet. The upgrade is expected to start next week.

@slfan1989
Copy link
Contributor Author

I am continuing to work on the JUnit5 upgrade for the Hadoop-common module. The remaining tasks are more challenging, involving cross-module unit test upgrades. The good news is that I have mostly completed the task of splitting the remaining unit tests, specifically for PART6 to PART12, and have finished one round of dependency upgrades. I DC38 estimate that it will take about another week to complete the development work for PART6 to PART12.

@slfan1989
Copy link
Contributor Author

@cnauroth @steveloughran @Hexiaoqiao @ayushtkn @szetszwo @anujmodi2021 @jojochuang @mukund-thakur

The Hadoop Common module has been split into Part6 to Part12 and is now ready for review. I greatly need your help to continue advancing the Junit5 upgrade work.

Part6 PR (#7419) primarily modifies the unit tests of AbstractFSContractTestBase and its subclasses, involving the hadoop-common, hadoop-hdfs, hadoop-mapreduce, hadoop-aws, hadoop-azure, and hadoop-azure-datalake modules. This part cannot be split, and I kindly ask for your review.

Part7 PR (#7664) primarily modifies certain classes related to dependencies in the hadoop-common and hadoop-hdfs modules.

Part8 PR (#7667) primarily modifies the unit tests of SymlinkBaseTest, FCStatisticsBaseTest, FileContextCreateMkdirBaseTest, and their subclasses, involving the hadoop-common and hadoop-hdfs modules.

Part9 PR (#7668) primarily modifies the unit tests of HadoopTestBase and its subclasses, involving the hadoop-common, hadoop-hdfs, hadoop-mapreduce, hadoop-aws, hadoop-azure, and hadoop-azure-datalake modules.

Part10 PR (#7670) primarily modifies the unit tests of AbstractAuditingTest, AbstractHadoopTestBase, and their subclasses, involving the hadoop-common, hadoop-hdfs, hadoop-mapreduce, hadoop-aws, hadoop-azure, and hadoop-azure-datalake modules.

Part11 PR (#7671) primarily modifies the IO-related unit tests in the hadoop-common module.

Part12 PR (#7672) primarily optimizes the IO-related unit tests in the hadoop-common module.

I hope everyone can provide valuable feedback to help us smoothly move forward with the next steps. Thank you!

@slfan1989
Copy link
Contributor Author

Things have been a bit busy over the past two weeks, so the progress on JUnit 5 was delayed. I’ll be picking it up again this week and will continue to push it forward.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0