HADOOP-17531. DistCp: Reduce memory usage on copying huge directories. #2732
Conversation
@@ -362,6 +362,7 @@ Command Line Options
| `-copybuffersize <copybuffersize>` | Size of the copy buffer to use. By default, `<copybuffersize>` is set to 8192B | |
| `-xtrack <path>` | Save information about missing source files to the specified path. | This option is only valid with `-update` option. This is an experimental property and it cannot be used with `-atomic` option. |
| `-direct` | Write directly to destination paths | Useful for avoiding potentially very expensive temporary file rename operations when the destination is an object store |
| `-useIterator` | Uses single threaded listStatusIterator to build listing | Useful for saving memory at the client side. |
Does it implicitly void `-numListstatusThreads`? Sounds like bad news for running distcp on cloud storage, where latency is big.
Thanx @jojochuang for having a look.
Yes, it indeed isn't meant for object stores. I am trying a multi-threaded approach for object stores too as part of HADOOP-17558; that won't be as memory efficient, but it should strike a balance between speed and memory. I have a WIP patch for that as well and will share it on the jira.
This is basically for HDFS, or any FS where listing is not slow but there are memory constraints. My scenario is basically DR, where it is in general HDFS->HDFS or HDFS->S3.
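For readers following along, a minimal sketch of the contrast the flag is about: eager listStatus materialises every child status at once, while an iterator-based walk only holds one batch at a time. Names here are illustrative, not the actual SimpleCopyListing code.

```java
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

class ListingSketch {

  // Default path: the whole child array is materialised in memory at once.
  static void listEager(FileSystem fs, Path dir) throws IOException {
    FileStatus[] children = fs.listStatus(dir);    // O(children) memory
    for (FileStatus child : children) {
      process(child);
    }
  }

  // -useIterator path: children are fetched batch by batch, so the client
  // only holds one page of results at a time.
  static void listLazy(FileSystem fs, Path dir) throws IOException {
    RemoteIterator<FileStatus> it = fs.listStatusIterator(dir);
    while (it.hasNext()) {
      process(it.next());                          // O(batch) memory
    }
  }

  private static void process(FileStatus status) {
    System.out.println(status.getPath());
  }
}
```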
I was thinking we should update the doc to mention it will disable `-numListstatusThreads`. But if we can merge that WIP patch soon then it's fine.
I have updated the document. Let me know if something more can be improved.
  }

  @SuppressWarnings("checkstyle:parameternumber")
  private void prepareListing(Path path, SequenceFile.Writer fileListWriter,
Sorry for coming back late.
Can we refactor this method a bit to use fewer parameters?
maybe we should refactor traverseDirectory() into a class, since we pass the parameters around here and there.
Done, refactored TraverseDirectory into a class.
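Roughly, the refactor the reviewers asked for looks like the sketch below: a helper class holding what used to be a long parameter list. Field and method names are hypothetical, not the exact code in the PR.

```java
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;

class TraverseDirectorySketch {
  // parameters formerly threaded through every prepareListing() call
  private final SequenceFile.Writer fileListWriter;
  private final FileSystem sourceFS;
  private final Path sourcePathRoot;

  TraverseDirectorySketch(SequenceFile.Writer fileListWriter,
      FileSystem sourceFS, Path sourcePathRoot) {
    this.fileListWriter = fileListWriter;
    this.sourceFS = sourceFS;
    this.sourcePathRoot = sourcePathRoot;
  }

  void traverseDirectory() throws IOException {
    // choose the single-threaded iterator walk or the multi-threaded walk
    // here, reading configuration from the fields instead of parameters
  }
}
```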
Thanx @jojochuang for the review, I have addressed the review comments. Please have a look. :-)
Added suggestions about:
- logging duration of the list
- logging IOStatistics of the iterator
- moving the test to AbstractContractDistCpTest so it is run by the object stores (note: that test isn't executed against HDFS)
+ " target location, avoiding temporary file rename.")), | ||
|
||
USE_ITERATOR(DistCpConstants.CONF_LABEL_USE_ITERATOR, | ||
new Option("useIterator", false, |
Could we have a non-mixed-case arg? I always get confused here.
@@ -18,6 +18,7 @@
package org.apache.hadoop.tools;

import org.apache.hadoop.fs.RemoteIterator;
needs to go into the "real hadoop imports" block; your IDE is getting confused. Putting it in the right place makes backporting way easier.
  public void traverseDirectoryMultiThreaded() throws IOException {
    assert numListstatusThreads > 0;
    if (LOG.isDebugEnabled()) {
you can switch to SLF4J logging here; this is all commons-logging era (distcp lagged).
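A small sketch of the suggested switch, assuming the standard SLF4J API: parameterized messages make the explicit isDebugEnabled() guard unnecessary for cheap arguments.

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class ListingLogSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(ListingLogSketch.class);

  void logListing(int numListstatusThreads) {
    // commons-logging era:
    //   if (LOG.isDebugEnabled()) {
    //     LOG.debug("Starting listing with " + numListstatusThreads + " threads");
    //   }
    // SLF4J: the message is only formatted if debug is actually enabled.
    LOG.debug("Starting listing with {} threads", numListstatusThreads);
  }
}
```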
        if (workResult.getSuccess()) {
          LinkedList<CopyListingFileStatus> childCopyListingStatus =
              DistCpUtils.toCopyListingFileStatus(sourceFS, child,
                  preserveAcls && child.isDirectory(),
child.isDirectory() is called often enough that it could go into a variable.
          LOG.error("Could not get item from childQueue. Retrying...");
        }
      }
      workers.shutdown();
should this be in a finally clause?
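For illustration, the pattern the question points at might look like this (names hypothetical): the executor shutdown sits in a finally block so it runs even when the drain loop throws.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ShutdownSketch {
  void runWorkers() {
    ExecutorService workers = Executors.newFixedThreadPool(4);
    try {
      // submit and drain the listing work here
    } finally {
      workers.shutdown();   // always reached, even if the loop above throws
    }
  }
}
```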
        prepareListing(child.getPath());
      }
    }
  }
Can you add an IOStatisticsLogging.logIOStatisticsAtDebug(LOG, listStatus) call here? That way, at debug level, you get a log from s3a (and soon abfs) of what IO took place for the list, performance etc. Really interesting.
Yeps, that is something cool. I extracted a part of it to hadoop-common; let me know if you have objections to doing that. I wanted to move the whole of it to common, but left it because of the class CallableSupplier: I thought moving that might cause some incompatibility problems, as it is being used in the prod code as well.
happy for you to take what's merged up. That CallableSupplier can be moved if you need to.
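For reference, a hedged sketch of the suggested IOStatistics logging call, assuming the IOStatisticsLogging helper from org.apache.hadoop.fs.statistics; the exact overload available on the target branch may differ.

```java
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.fs.statistics.IOStatisticsLogging;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class IOStatsLogSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(IOStatsLogSketch.class);

  void afterListing(RemoteIterator<FileStatus> listStatusIterator) {
    // At debug level this prints whatever IOStatistics the iterator collected
    // (S3A today, ABFS soon): request counts, list latencies, etc.
    IOStatisticsLogging.logIOStatisticsAtDebug(LOG,
        "listing statistics", listStatusIterator);
  }
}
```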
    fs.mkdirs(source);
    // Create 10 dirs inside.
    for (int i = 0; i < 10; i++) {
      fs.mkdirs(new Path("/src/sub" + i));
skip this and just delegate to the children; saves 10 calls
        for (int k = 0; k < 10; k++) {
          Path parentPath = new Path("/src/sub" + i + "/subsub" + j);
          Path filePath = new Path(parentPath, "file" + k);
          DFSTestUtil.createFile(fs, filePath, 1024L, (short) 3, 1024L);
actually, you can go straight to the createFile without doing any mkdirs. It is still going to take 10^3 calls on an object store, though. If you do move something of this size there, then (see the sketch below):
- the file creates should be done in an executor pool (see ITestPartialRenamesDeletes.createDirsAndFiles())
- the parameters should be configurable; just a subclass getWidth() would be enough
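A rough sketch of the executor-pool approach for test data creation, relying on FileSystem.create() building missing parent directories; class and method names are made up for illustration.

```java
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

final class ParallelFileCreator {

  static void createFiles(FileSystem fs, Path base, int width) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(16);
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (int i = 0; i < width; i++) {
        for (int j = 0; j < width; j++) {
          for (int k = 0; k < width; k++) {
            Path file = new Path(base, "sub" + i + "/subsub" + j + "/file" + k);
            // create() builds any missing parent directories, so no mkdirs needed
            futures.add(pool.submit(() -> {
              try (OutputStream out = fs.create(file, true)) {
                out.write(new byte[1024]);
              }
              return null;
            }));
          }
        }
      }
      for (Future<?> f : futures) {
        f.get();   // surface any creation failure
      }
    } finally {
      pool.shutdown();
    }
  }
}
```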
    // Check that all 1000 files got copied.
    RemoteIterator<LocatedFileStatus> destFileItr = fs.listFiles(dest, true);
    int numFiles = 0;
Assertions.assertThat(RemoteIterators.toList(fs.listFiles(dest, true)))
    .describedAs("files").hasSize(...)
That way, if the size isn't met, the error includes the list of all the files which were found.
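Filled out, the suggested assertion might look roughly like this, using the 1000-file count the test expects and assuming RemoteIterators from org.apache.hadoop.util.functional:

```java
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.functional.RemoteIterators;
import org.assertj.core.api.Assertions;

class ListingAssertSketch {
  static void assertAllFilesCopied(FileSystem fs, Path dest) throws IOException {
    List<LocatedFileStatus> copied =
        RemoteIterators.toList(fs.listFiles(dest, true));
    // On failure AssertJ prints the whole list, so missing files are visible.
    Assertions.assertThat(copied)
        .describedAs("files found under %s", dest)
        .hasSize(1000);
  }
}
```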
Thanx @steveloughran for the review. I have addressed the review comments. Please have a look.
LGTM, some minor comments. Have you had a chance to test the s3a distcp client through this yet?
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.thirdparty.com.google.common.util.concurrent.ListenableFuture;
recent trunk changes #2522 will have broken this; just use direct references to BlockingThreadPoolExecutorService
@@ -64,12 +71,22 @@
import org.apache.hadoop.thirdparty.com.google.common.base.Joiner;
import org.apache.hadoop.thirdparty.com.google.common.collect.Sets;
import org.slf4j.LoggerFactory;
probably needs to go somewhere else in the imports
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.protocol.SnapshotDiffReport;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.thirdparty.com.google.common.annotations.VisibleForTesting;
now, these imports we are trying to leave where they were, because when cherry-picking we're trying to stay on the older versions. It's a PITA, I know.
Thanx @steveloughran for the review, I have addressed the comments. Please give it a check.
Force-pushed from d9779f6 to 9d9a4f7.
💔 -1 overall
This message was automatically generated.
💔 -1 overall
This message was automatically generated.
Have removed the timeout. Created a test PR without any changes as well; still, the recent test failure isn't related.
yes, last failure is an OOM, so unrelated
@@ -53,8 +53,10 @@
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.thirdparty.com.google.common.util.concurrent.ListenableFuture;
import org.apache.hadoop.thirdparty.com.google.common.base.Charsets;
these need to go into the "other imports" block. Yes, it's a PITA, but its goal is to manage backporting. And guess what I have to do a lot of?
codewise, all looks good and the test failures seem spurious. I think what I'd like to do now is check out and test the s3a and abfs distcp suites on my system.
S3A test failure:
And abfs:
So -1 on the tests there. And ideally, it's time to see if you can sort out some AWS and/or Azure credentials. Even testing against a minio docker image would be a good starting point.
Thanx @steveloughran for trying that out. I will figure out a way to run that UT. I will sort out the cred stuff and try following this doc; let me know if this isn't the best or correct doc to follow. I added an HDFS contract test as well, and that fetched me the same exception as S3A:
I fixed it, so I suppose S3A should work. But the ABFS stuff I need to check; I feel it doesn't have a filesystem check.
Hey, I've realised that my auditing PR #2675 is going to clash on the CallableSupplier changes, as it's taking auditing spans in, activating them before and after. Can you restore the S3A callable stuff, alongside the copy you've made in fs.impl? That will stop the PRs conflicting. Thanks.
Do you mean to revert all the AWS changes and move these classes back to the aws module, to the state before this commit? That shouldn't be a problem. Just to know: will it be OK if I keep a parent class in hadoop-common and a child class in S3A?
I'm wrapping each of the ops in an enter/exit of auditing; the changes are pretty traumatic. Unless I can rework how the invocation happens, we'll need to keep them separate. Here's what I'd like to propose:
Your patch can go into trunk and I can co-exist my dev with it. If I can see a way to move I'll adopt it, but it will allow us to diverge, with the hadoop-common CallableSupplier more broadly used.
The only reason we don't have one of those already is that it slowed down HDFS test runs, and all the other distcp tests used mini HDFS clusters. But we clearly need it to regression test things. Maybe we should add it, but give it a different name from Test* (and comment in the parent class) so that it's only run when explicitly asked for. Some of the S3 tests are like that: test cases you have to run by hand or from the IDE, but which maven skips.
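A hypothetical sketch of such a suite: it extends the shared contract test but is deliberately not named Test*, so maven's default surefire pattern skips it and it only runs when invoked explicitly. The class name and the HDFSContract wiring are assumptions, not the code in this PR.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.contract.AbstractFSContract;
import org.apache.hadoop.fs.contract.hdfs.HDFSContract;
import org.apache.hadoop.tools.contract.AbstractContractDistCpTest;
import org.junit.AfterClass;
import org.junit.BeforeClass;

// Not named Test*, so maven skips it by default; run it by hand or from the IDE.
public class ManualHDFSContractDistCp extends AbstractContractDistCpTest {

  @BeforeClass
  public static void createCluster() throws IOException {
    HDFSContract.createCluster();      // spin up the shared MiniDFSCluster
  }

  @AfterClass
  public static void teardownCluster() throws IOException {
    HDFSContract.destroyCluster();
  }

  @Override
  protected AbstractFSContract createContract(Configuration conf) {
    return new HDFSContract(conf);
  }
}
```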
🎊 +1 overall
This message was automatically generated.
Thanx @steveloughran for the review.
Region: ap-south-1. The newly added HDFS contract test:
@steveloughran any further comments?
💔 -1 overall
This message was automatically generated.
LGTM. I'm going to make one final change, but +1 this PR pending that change anyway. Can you put the CommonCallableSupplier into the package where I'm trying to unify the functional APIs in hadoop with IOE support? This is clearly part of it. Nothing else, just a move of the class. +1 pending that change.
💔 -1 overall
This message was automatically generated.
🎊 +1 overall
This message was automatically generated.
Thanx @steveloughran for the review. I have moved the class.
OK, +1 from me. Merge to trunk and, after a test run, to branch-3.3; let's wait and see what surprises surface there.
HADOOP-17531. DistCp: Reduce memory usage on copying huge directories. (#2808). Contributed by Ayush Saxena.
* HADOOP-17531. DistCp: Reduce memory usage on copying huge directories. (#2732).
* HADOOP-17531. Addendum: DistCp: Reduce memory usage on copying huge directories. (#2820)
Signed-off-by: Steve Loughran <stevel@apache.org>

HADOOP-17531. DistCp: Reduce memory usage on copying huge directories. (apache#2732). Contributed by Ayush Saxena.
Signed-off-by: Steve Loughran <stevel@apache.org>

HADOOP-17531. DistCp: Reduce memory usage on copying huge directories. (apache#2808). Contributed by Ayush Saxena.
* HADOOP-17531. DistCp: Reduce memory usage on copying huge directories. (apache#2732).
* HADOOP-17531. Addendum: DistCp: Reduce memory usage on copying huge directories. (apache#2820)
Signed-off-by: Steve Loughran <stevel@apache.org>
Conflicts:
  hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/GenericTestUtils.java
  hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/contract/AbstractContractDistCpTest.java
(cherry picked from commit d86f94d18bd8b33cfc324b5638f12d9018c95d29)
Signed-off-by: Arpit Agarwal <aagarwal@cloudera.com>
Change-Id: Ieec8dbd96444dead3cd115f076a65444ca212a35
https://issues.apache.org/jira/browse/HADOOP-17531