HADOOP-17531. DistCp: Reduce memory usage on copying huge directories. by ayushtkn · Pull Request #2732 · apache/hadoop

HADOOP-17531. DistCp: Reduce memory usage on copying huge directories. #2732


Merged · 2 commits · Mar 23, 2021

Conversation

@ayushtkn (Member) commented Mar 2, 2021

@@ -362,6 +362,7 @@ Command Line Options
| `-copybuffersize <copybuffersize>` | Size of the copy buffer to use. By default, `<copybuffersize>` is set to 8192B | |
| `-xtrack <path>` | Save information about missing source files to the specified path. | This option is only valid with `-update` option. This is an experimental property and it cannot be used with `-atomic` option. |
| `-direct` | Write directly to destination paths | Useful for avoiding potentially very expensive temporary file rename operations when the destination is an object store |
| `-useIterator` | Uses single threaded listStatusIterator to build listing | Useful for saving memory at the client side. |
@jojochuang (Contributor) commented Mar 2, 2021

Does it implicitly void -numListstatusThreads? Sounds like bad news for running distcp on cloud storage, where latency is high.

ayushtkn (Member, Author):

Thanks @jojochuang for having a look.
Yes, it indeed isn't meant for object stores. I am trying a multi-threaded approach for object stores too, as part of HADOOP-17558; that won't be as memory efficient, but should strike a balance between speed and memory. I have a WIP patch for that as well and will share it on the JIRA.

This is basically for HDFS, or any FS where listing is not slow but there are memory constraints. My scenario is basically DR, where it is in general HDFS->HDFS or HDFS->S3.
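The memory saving comes from streaming directory entries instead of materialising whole listings. As a self-contained illustration only, the single-threaded iterator traversal can be sketched with java.nio in place of Hadoop's FileSystem.listStatusIterator(); the class and method names below are made up, not the DistCp code:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the -useIterator idea: directories are streamed one entry at
// a time, so memory is proportional to the pending-directory queue, not
// to the total number of children in any one directory.
public class IteratorListingSketch {

  /** Count all files under root, listing each directory lazily. */
  public static long countFilesIteratively(Path root) throws IOException {
    Deque<Path> pending = new ArrayDeque<>();
    pending.push(root);
    long files = 0;
    while (!pending.isEmpty()) {
      Path dir = pending.pop();
      // DirectoryStream yields entries lazily, like listStatusIterator.
      try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
        for (Path entry : entries) {
          if (Files.isDirectory(entry)) {
            pending.push(entry);  // visit later, still single threaded
          } else {
            files++;  // DistCp would write this entry to the listing file
          }
        }
      }
    }
    return files;
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempDirectory("iter-sketch");
    Files.createFile(tmp.resolve("a"));
    Path sub = Files.createDirectory(tmp.resolve("sub"));
    Files.createFile(sub.resolve("b"));
    System.out.println(countFilesIteratively(tmp)); // prints 2
  }
}
```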

Contributor:

I was thinking we should update the doc to mention that it will disable -numListstatusThreads. But if we can merge that WIP patch soon, then it's fine.

ayushtkn (Member, Author):

I have updated the document. Let me know if anything more can be improved.

}

@SuppressWarnings("checkstyle:parameternumber")
private void prepareListing(Path path, SequenceFile.Writer fileListWriter,
Contributor:

Sorry for coming back late.
Can we refactor this method a bit to use fewer parameters?

Contributor:

Maybe we should refactor traverseDirectory() into a class, since we pass the parameters around here and there.

ayushtkn (Member, Author):

Done, refactored TraverseDirectory into a class.
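For illustration, the parameter-object refactor the reviewers asked for might look like the following sketch. All names here are hypothetical, not the actual TraverseDirectory class:

```java
// Sketch: instead of a prepareListing(...) with many parameters, hold
// the shared traversal state in one object and pass only the argument
// that varies per call.
public class TraverseSketch {

  /** Shared, immutable traversal state, set once up front. */
  static final class Context {
    final String sourceRoot;
    final boolean preserveAcls;
    final int listStatusThreads;

    Context(String sourceRoot, boolean preserveAcls, int listStatusThreads) {
      this.sourceRoot = sourceRoot;
      this.preserveAcls = preserveAcls;
      this.listStatusThreads = listStatusThreads;
    }

    /** A per-path call now needs only the varying argument. */
    String describe(String path) {
      return sourceRoot + "/" + path + " (acls=" + preserveAcls + ")";
    }
  }

  public static void main(String[] args) {
    Context ctx = new Context("hdfs://nn/src", true, 4);
    System.out.println(ctx.describe("dir1"));
  }
}
```

The checkstyle parameter-count warning disappears because the eight-plus arguments become fields initialised once in the constructor.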


@ayushtkn (Member, Author) commented Mar 9, 2021

Thanks @jojochuang for the review. I have addressed the review comments; please have a look. :-)

@steveloughran (Contributor) left a comment:

Added suggestions about:

  • logging the duration of the list
  • logging the IOStatistics of the iterator
  • moving the test to AbstractContractDistCpTest, so it is tested by the object stores (note: that test isn't executed by HDFS)

+ " target location, avoiding temporary file rename.")),

USE_ITERATOR(DistCpConstants.CONF_LABEL_USE_ITERATOR,
new Option("useIterator", false,
Contributor:

Could we have a non-mixed-case arg? I always get confused here.

@@ -18,6 +18,7 @@

package org.apache.hadoop.tools;

import org.apache.hadoop.fs.RemoteIterator;
Contributor:

Needs to go into the "real hadoop imports" block; your IDE is getting confused. Putting it in the right place makes backporting way easier.


public void traverseDirectoryMultiThreaded() throws IOException {
assert numListstatusThreads > 0;
if (LOG.isDebugEnabled()) {
Contributor:

You can move to slf4j logging here; this is all commons-logging era (distcp lagged).
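The isDebugEnabled() guard in the hunk above is the commons-logging-era pattern this comment refers to. slf4j is not in the JDK, so this self-contained sketch shows the same lazy-evaluation idea with java.util.logging's Supplier overloads; the logger name and messages are made up:

```java
import java.util.logging.Logger;

// Sketch: the message Supplier is only invoked when FINE (debug-level)
// logging is actually enabled, so no explicit isDebugEnabled()-style
// guard is needed. slf4j's LOG.debug("starting {}", arg) achieves the
// same thing via parameterised logging.
public class LazyLoggingSketch {
  private static final Logger LOG = Logger.getLogger("distcp-sketch");

  static String expensiveSummary() {
    return "listing of 1000000 entries"; // imagine this is costly to build
  }

  public static void main(String[] args) {
    LOG.fine(() -> "Starting traversal: " + expensiveSummary());
  }
}
```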

if (workResult.getSuccess()) {
LinkedList<CopyListingFileStatus> childCopyListingStatus =
DistCpUtils.toCopyListingFileStatus(sourceFS, child,
preserveAcls && child.isDirectory(),
Contributor:

child.isDirectory() is called often enough that it could go into a variable.

LOG.error("Could not get item from childQueue. Retrying...");
}
}
workers.shutdown();
Contributor:

Should this be in a finally clause?
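The concern is that an exception while draining the child queue would leak the worker pool. A self-contained sketch of the finally-based teardown (illustrative names, not the DistCp code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Sketch: the pool is shut down in a finally block, so the worker
// threads are stopped even if collecting results throws.
public class ShutdownInFinally {

  public static int runTasks(int tasks) throws Exception {
    ExecutorService workers = Executors.newFixedThreadPool(2);
    int completed = 0;
    try {
      List<Future<Integer>> results = new ArrayList<>();
      for (int i = 0; i < tasks; i++) {
        final int id = i;
        results.add(workers.submit(() -> id));
      }
      for (Future<Integer> f : results) {
        f.get();          // may throw; the finally below still runs
        completed++;
      }
    } finally {
      workers.shutdown();
      if (!workers.awaitTermination(30, TimeUnit.SECONDS)) {
        workers.shutdownNow(); // last resort: interrupt stragglers
      }
    }
    return completed;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(runTasks(5)); // prints 5
  }
}
```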

prepareListing(child.getPath());
}
}
}
Contributor:

Can you add an IOStatisticsLogging.logIOStatisticsAtDebug(LOG, listStatus) call here? That way, at debug level, you get a log from s3a (and soon abfs) of what IO took place for the list, performance etc. Really interesting.

ayushtkn (Member, Author):

Yep, that is something cool. I extracted a part of it to hadoop-common; let me know if you have objections to doing that. I wanted to move the whole of it to common, but left some behind because of the CallableSupplier class: I thought moving that might cause incompatibility problems, as it is used in production code as well.

Contributor:

Happy for you to take what's merged up. That CallableSupplier can be moved if you need to.

fs.mkdirs(source);
// Create 10 dirs inside.
for (int i = 0; i < 10; i++) {
fs.mkdirs(new Path("/src/sub" + i));
Contributor:

Skip this and just delegate to the children; saves 10 calls.

for (int k = 0; k < 10; k++) {
Path parentPath = new Path("/src/sub" + i + "/subsub" + j);
Path filePath = new Path(parentPath, "file" + k);
DFSTestUtil.createFile(fs, filePath, 1024L, (short) 3, 1024L);
Contributor:

Actually, you can go straight to createFile without doing any mkdirs. It is still going to take 10^3 calls on an object store, though. If you do move something of this size there, then:

  1. the file creates should be done in an executor pool (see ITestPartialRenamesDeletes.createDirsAndFiles())
  2. the parameters should be configurable; just a subclass getWidth() would be enough
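Suggestion 1 might be sketched like this; it uses java.nio and a plain fixed thread pool in place of the ITestPartialRenamesDeletes helper, so all names here are illustrative:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: build the test tree with an executor pool rather than
// sequentially, which matters when each create is a remote call on an
// object store (e.g. a PUT per file on s3a).
public class ParallelTreeBuilder {

  public static int createFiles(Path root, int dirs, int filesPerDir)
      throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(8);
    try {
      List<Future<Path>> creates = new ArrayList<>();
      for (int d = 0; d < dirs; d++) {
        Path dir = root.resolve("sub" + d);
        Files.createDirectories(dir);
        for (int f = 0; f < filesPerDir; f++) {
          Path file = dir.resolve("file" + f);
          // each create is an independent task; they proceed in parallel
          creates.add(pool.submit(() -> Files.createFile(file)));
        }
      }
      int created = 0;
      for (Future<Path> f : creates) {
        f.get();          // propagate any creation failure
        created++;
      }
      return created;
    } finally {
      pool.shutdown();
    }
  }
}
```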


// Check that all 1000 files got copied.
RemoteIterator<LocatedFileStatus> destFileItr = fs.listFiles(dest, true);
int numFiles = 0;
Contributor:

Assertions.assertThat(RemoteIterators.toList(fs.listFiles(dest, true)))
 .describedAs("files").hasSize(...)

that way: if the size isn't met, the error includes the list of all files which were found.
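The same idea in plain Java, for readers without AssertJ on the classpath: materialise the listing first, so a size mismatch can report every path that was actually found (names are illustrative):

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the suggested assertion style: the failure message carries
// the full list of files, not just the mismatched count.
public class ListingAssertSketch {

  public static void assertFileCount(List<String> files, int expected) {
    if (files.size() != expected) {
      throw new AssertionError("expected " + expected + " files but found "
          + files.size() + ": " + files);
    }
  }

  public static void main(String[] args) {
    assertFileCount(Arrays.asList("a", "b"), 2); // passes silently
  }
}
```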

@ayushtkn (Member, Author):

Thanks @steveloughran for the review. I have addressed the review comments; please have a look.

@steveloughran (Contributor) left a comment:

LGTM, some minor comments. Have you had a chance to test the s3a distcp client through this yet?

import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.thirdparty.com.google.common.util.concurrent.ListenableFuture;
Contributor:

Recent trunk changes (#2522) will have broken this; just use direct references to BlockingThreadPoolExecutorService.

@@ -64,12 +71,22 @@

import org.apache.hadoop.thirdparty.com.google.common.base.Joiner;
import org.apache.hadoop.thirdparty.com.google.common.collect.Sets;
import org.slf4j.LoggerFactory;
Contributor:

probably needs to go somewhere else in the imports

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.protocol.SnapshotDiffReport;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.thirdparty.com.google.common.annotations.VisibleForTesting;
Contributor:

Now, these imports we are trying to leave where they were, because when cherry-picking we're trying to stay close to the older versions. It's a PITA, I know.



@ayushtkn (Member, Author):

Thanks @steveloughran for the review, I have addressed the comments. Please give it a check.
Regarding S3A, I couldn't check this there; I'm not very experienced in that domain either :-(. And since the start, folks have said this is going to be drastic for object stores, so I have marked myself a new task to coin something different for data migration from cloud to on-prem.

@ayushtkn force-pushed the HADOOP-17531 branch 2 times, most recently from d9779f6 to 9d9a4f7 on March 13, 2021 23:11
@hadoop-yetus commented:

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 34s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 7 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 14m 22s Maven dependency ordering for branch
+1 💚 mvninstall 20m 17s trunk passed
+1 💚 compile 20m 41s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 compile 17m 58s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 checkstyle 3m 48s trunk passed
+1 💚 mvnsite 3m 12s trunk passed
+1 💚 javadoc 2m 23s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 3m 3s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 spotbugs 4m 37s trunk passed
+1 💚 shadedclient 14m 32s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 25s Maven dependency ordering for patch
+1 💚 mvninstall 1m 52s the patch passed
+1 💚 compile 19m 57s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javac 19m 57s the patch passed
+1 💚 compile 17m 56s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 javac 17m 56s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 3m 42s root: The patch generated 0 new + 113 unchanged - 5 fixed = 113 total (was 118)
+1 💚 mvnsite 3m 10s the patch passed
+1 💚 javadoc 2m 20s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 1m 36s hadoop-common in the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08.
+1 💚 javadoc 0m 39s hadoop-distcp in the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08.
+1 💚 javadoc 0m 44s hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_282-8u282-b08-0ubuntu120.04-b08 with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu120.04-b08 generated 0 new + 85 unchanged - 3 fixed = 85 total (was 88)
+1 💚 spotbugs 5m 7s the patch passed
+1 💚 shadedclient 14m 45s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 17m 21s hadoop-common in the patch passed.
-1 ❌ unit 18m 34s /patch-unit-hadoop-tools_hadoop-distcp.txt hadoop-distcp in the patch failed.
+1 💚 unit 2m 18s hadoop-aws in the patch passed.
+1 💚 asflicense 1m 0s The patch does not generate ASF License warnings.
220m 33s
Reason Tests
Failed junit tests hadoop.tools.TestDistCpSystem
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2732/18/artifact/out/Dockerfile
GITHUB PR #2732
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell
uname Linux ce79958dd934 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 9d9a4f7d4cabccb0075026ef175af4e82a1ef682
Default Java Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2732/18/testReport/
Max. process+thread count 3152 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-distcp hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2732/18/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@apache apache deleted a comment from hadoop-yetus Mar 14, 2021
@hadoop-yetus commented:

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 35s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 8 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 14m 13s Maven dependency ordering for branch
+1 💚 mvninstall 20m 13s trunk passed
+1 💚 compile 20m 39s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 compile 17m 50s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 checkstyle 3m 45s trunk passed
+1 💚 mvnsite 3m 14s trunk passed
+1 💚 javadoc 2m 22s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 3m 3s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 spotbugs 4m 36s trunk passed
+1 💚 shadedclient 14m 12s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 27s Maven dependency ordering for patch
+1 💚 mvninstall 1m 52s the patch passed
+1 💚 compile 19m 58s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javac 19m 58s root-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 generated 0 new + 1955 unchanged - 1 fixed = 1955 total (was 1956)
+1 💚 compile 18m 1s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 javac 18m 1s root-jdkPrivateBuild-1.8.0_282-8u282-b08-0ubuntu120.04-b08 with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu120.04-b08 generated 0 new + 1851 unchanged - 1 fixed = 1851 total (was 1852)
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 3m 41s root: The patch generated 0 new + 124 unchanged - 5 fixed = 124 total (was 129)
+1 💚 mvnsite 3m 8s the patch passed
+1 💚 javadoc 2m 20s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 1m 41s hadoop-common in the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08.
+1 💚 javadoc 0m 38s hadoop-distcp in the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08.
+1 💚 javadoc 0m 45s hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_282-8u282-b08-0ubuntu120.04-b08 with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu120.04-b08 generated 0 new + 85 unchanged - 3 fixed = 85 total (was 88)
+1 💚 spotbugs 5m 11s the patch passed
+1 💚 shadedclient 17m 49s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 18m 29s /patch-unit-hadoop-common-project_hadoop-common.txt hadoop-common in the patch failed.
+1 💚 unit 28m 32s hadoop-distcp in the patch passed.
+1 💚 unit 2m 38s hadoop-aws in the patch passed.
+1 💚 asflicense 1m 1s The patch does not generate ASF License warnings.
234m 30s
Reason Tests
Failed junit tests hadoop.metrics2.source.TestJvmMetrics
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2732/19/artifact/out/Dockerfile
GITHUB PR #2732
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell
uname Linux e6eaa913a411 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / d0b8e147d80ab79e999e52813297e43d10b8b1b5
Default Java Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2732/19/testReport/
Max. process+thread count 1391 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-distcp hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2732/19/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@ayushtkn (Member, Author) commented Mar 14, 2021

I have removed the timeout from TestDistCpSystem. It failed with a timeout error in my latest build, where I had only removed an unused import from the aws package; with just the checkstyle change it passed in 15 seconds, which was half the timeout. The test doesn't go through my part of the code, and I am pretty sure DurationInfo can't be that costly. In fact the test timed out before even reaching the distcp stuff. Not sure what happened.

Created a test PR without any changes; TestDistCpSystem still failed:

#2773 (comment)

The recent test failure isn't related.
Please help review!!!

@steveloughran (Contributor) left a comment:

Yes, the last failure is an OOM, so unrelated.

@@ -53,8 +53,10 @@
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.thirdparty.com.google.common.util.concurrent.ListenableFuture;
import org.apache.hadoop.thirdparty.com.google.common.base.Charsets;
Contributor:

these need to go into the "other imports" block. Yes, it's a PITA, but its goal is to manage backporting. And guess what I have to do a lot of?

@steveloughran (Contributor):

Code-wise, all looks good and the test failures seem spurious. I think what I'd like to do now is check out and run the s3a and abfs distcp suites on my system.

@steveloughran (Contributor):

S3A Test failure

[INFO] -------------------------------------------------------
[INFO] Running org.apache.hadoop.fs.contract.s3a.ITestS3AContractDistCp
[ERROR] Tests run: 9, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 519.266 s <<< FAILURE! - in org.apache.hadoop.fs.contract.s3a.ITestS3AContractDistCp
[ERROR] testDistCpWithIterator(org.apache.hadoop.fs.contract.s3a.ITestS3AContractDistCp)  Time elapsed: 376.629 s  <<< ERROR!
java.lang.IllegalArgumentException: Wrong FS file://null//Users/stevel/Hadoop/commit/apache-hadoop/hadoop-tools/hadoop-aws/target/test-dir/ITestS3AContractDistCp/testDistCpWithIterator/local/dest -expected s3a://stevel-london
	at org.apache.hadoop.fs.s3native.S3xLoginHelper.checkPath(S3xLoginHelper.java:224)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.checkPath(S3AFileSystem.java:1153)
	at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:665)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.makeQualified(S3AFileSystem.java:1115)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.qualify(S3AFileSystem.java:1141)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.listFiles(S3AFileSystem.java:4396)
	at org.apache.hadoop.tools.contract.AbstractContractDistCpTest.testDistCpWithIterator(AbstractContractDistCpTest.java:622)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.lang.Thread.run(Thread.java:748)

@steveloughran (Contributor):

And abfs

[ERROR] Failures in org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractSecureDistCp
[ERROR] testDistCpWithIterator(org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractSecureDistCp)  Time elapsed: 504.938 s  <<< ERROR!
java.io.FileNotFoundException: Operation failed: "The specified path does not exist.", 404, GET, https://stevelukwest.dfs.core.windows.net/stevel-testing?upn=false&resource=filesystem&maxResults=5000&directory=Users/stevel/Hadoop/commit/apache-hadoop/hadoop-tools/hadoop-azure/target/test-dir/ITestAbfsFileSystemContractSecureDistCp/testDistCpWithIterator/local/dest&timeout=90&recursive=false, PathNotFound, "The specified path does not exist. RequestId:45fc14b7-f01f-002c-2db9-19d834000000 Time:2021-03-15T16:34:00.9374305Z"
	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.checkException(AzureBlobFileSystem.java:1177)
	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.listStatus(AzureBlobFileSystem.java:407)
	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1971)
	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:2013)
	at org.apache.hadoop.fs.FileSystem$4.<init>(FileSystem.java:2179)
	at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2178)
	at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2161)
	at org.apache.hadoop.fs.FileSystem$5.<init>(FileSystem.java:2287)
	at org.apache.hadoop.fs.FileSystem.listFiles(FileSystem.java:2284)
	at org.apache.hadoop.tools.contract.AbstractContractDistCpTest.testDistCpWithIterator(AbstractContractDistCpTest.java:622)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.lang.Thread.run(Thread.java:748)
Caused by: Operation failed: "The specified path does not exist.", 404, GET, https://stevelukwest.dfs.core.windows.net/stevel-testing?upn=false&resource=filesystem&maxResults=5000&directory=Users/stevel/Hadoop/commit/apache-hadoop/hadoop-tools/hadoop-azure/target/test-dir/ITestAbfsFileSystemContractSecureDistCp/testDistCpWithIterator/local/dest&timeout=90&recursive=false, PathNotFound, "The specified path does not exist. RequestId:45fc14b7-f01f-002c-2db9-19d834000000 Time:2021-03-15T16:34:00.9374305Z"
	at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute(AbfsRestOperation.java:199)
	at org.apache.hadoop.fs.azurebfs.services.AbfsClient.listPath(AbfsClient.java:229)
	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.listStatus(AzureBlobFileSystemStore.java:907)
	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.listStatus(AzureBlobFileSystemStore.java:877)
	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.listStatus(AzureBlobFileSystemStore.java:859)
	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.listStatus(AzureBlobFileSystem.java:404)
	... 24 more

[INFO]
[INFO] Results:

So -1 on the tests there. And ideally, it's time to see if you can sort out some AWS and/or Azure credentials. Even testing against a minio docker image would be a good initial starting point.

@ayushtkn (Member, Author):

Thanks @steveloughran for trying that out. I will figure out a way to run that UT.
I found a doc:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/testing.html#Supporting_FileSystems_with_login_and_authentication_parameters

I will sort out the credentials and try following this doc; let me know if this isn't the best or correct doc to follow.

I added an HDFS contract test as well, and it gave me the same exception as S3A:


java.lang.IllegalArgumentException: Wrong FS: file:/Users/ayushsaxena/code/hadoop-code/osCode/hadoop/hadoop-tools/hadoop-distcp/target/test-dir/TestHDFSContractDistCp/testDistCpWithIterator/local/dest, expected: hdfs://localhost:58099

	at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:806)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:257)
	at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1272)
	at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1259)
	at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1204)
	at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1200)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:1218)
	at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2161)
	at org.apache.hadoop.fs.FileSystem$5.<init>(FileSystem.java:2287)
	at org.apache.hadoop.fs.FileSystem.listFiles(FileSystem.java:2284)
	at org.apache.hadoop.tools.contract.AbstractContractDistCpTest.testDistCpWithIterator(AbstractContractDistCpTest.java:622)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.lang.Thread.run(Thread.java:748)

I fixed it, so I suppose S3A should work. The ABFS stuff I still need to check; I feel it doesn't have a filesystem check (checkPath), but I will figure it out. Thanks!!!

@steveloughran
Contributor

Hey, I've realised that my auditing PR #2675 is going to clash on the CallableSupplier changes, as it takes auditing spans in, activating them before and after.

Can you restore the S3A callable stuff, alongside the copy you've made in fs.impl? That will stop the PRs conflicting. Thanks

@ayushtkn
Member Author

Can you restore the S3A callable stuff, alongside the copy you've made in fs.impl? That will stop the PRs conflicting.

Do you mean revert all the AWS changes and move these classes back to the aws module, to the state before this commit:
1d27dfc

That shouldn't be a problem. Just to check: would it also work for you if I keep a parent class in hadoop-common and a child class named CallableSupplier in the aws module?

@steveloughran
Contributor

Problem is this: https://github.com/steveloughran/hadoop/blob/s3/HADOOP-17511-auditing/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/CallableSupplier.java

I'm wrapping each of the ops in an enter/exit of auditing; the changes are pretty traumatic. Unless I can rework how the invocation happens, we'll need to keep them separate.

Here's what I'd like to propose

  • your patch copies, rather than moves, CallableSupplier. Maybe give it a new name to distinguish it from the one in S3
  • the old one stays in aws/S3, same name etc.

Your patch can go into trunk and my dev branch can co-exist with it. If I see a way to move over I'll adopt it; until then this lets the two diverge, with the hadoop-common CallableSupplier more broadly used.
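For readers following the CallableSupplier discussion: the class adapts an IOException-throwing task into a `java.util.function.Supplier` so it can run under `CompletableFuture.supplyAsync`, which cannot accept checked exceptions. The sketch below is a minimal, independent illustration of that pattern, not the actual Hadoop implementation (the real class ends up as `CommonCallableSupplier` in hadoop-common).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/**
 * Minimal sketch of the CallableSupplier pattern: wrap a Callable that may
 * throw IOException so it can be submitted to CompletableFuture.supplyAsync.
 * Checked IOExceptions are rethrown as UncheckedIOException and can be
 * unwrapped by the caller when the future completes exceptionally.
 */
public final class CallableSupplierSketch<T> implements Supplier<T> {

    private final Callable<T> call;

    public CallableSupplierSketch(Callable<T> call) {
        this.call = call;
    }

    @Override
    public T get() {
        try {
            return call.call();
        } catch (IOException e) {
            throw new UncheckedIOException(e);   // preserve the IO failure
        } catch (RuntimeException e) {
            throw e;                             // pass unchecked through
        } catch (Exception e) {
            throw new RuntimeException(e);       // any other checked exception
        }
    }

    /** Submit an IO-throwing task asynchronously. */
    public static <T> CompletableFuture<T> submit(Callable<T> call) {
        return CompletableFuture.supplyAsync(new CallableSupplierSketch<>(call));
    }
}
```

Keeping one copy of this adapter in hadoop-common and another in hadoop-aws, as proposed above, lets the S3A copy grow auditing-span enter/exit hooks around `call.call()` without touching the common one.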

@steveloughran
Contributor

I added an HDFS contract test as well and that fetched me the same exception as S3A:

The only reason we don't have one of those already is that it slowed down HDFS test runs, and all the other distcp tests used mini HDFS clusters. But we clearly need it to regression-test things.

Maybe we should add it, but give it a different name from Test* (and a comment in the parent class) so that it's only run when explicitly asked for. Some of the S3 tests are like that: test cases you have to run by hand or from the IDE, but which maven skips.
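A note on how that skipping works, for readers unfamiliar with the convention: Maven Surefire only picks up classes matching its default include patterns, so a deliberately non-matching name is left out of normal runs. This is a hedged sketch of the mechanism; the class name below matches the one added later in this thread, but the pom fragment is illustrative.

```xml
<!-- Surefire's default includes (when no <includes> are configured) are:
       **/Test*.java, **/*Test.java, **/*Tests.java, **/*TestCase.java
     A class named OptionalTestHDFSContractDistCp matches none of these,
     so a plain `mvn test` run skips it. It can still be run on demand: -->
<!--   mvn test -Dtest=OptionalTestHDFSContractDistCp -->
```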

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 39s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 9 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 14m 19s Maven dependency ordering for branch
+1 💚 mvninstall 21m 25s trunk passed
+1 💚 compile 23m 5s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 compile 19m 22s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 checkstyle 3m 57s trunk passed
+1 💚 mvnsite 3m 4s trunk passed
+1 💚 javadoc 2m 11s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 2m 50s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 spotbugs 4m 34s trunk passed
+1 💚 shadedclient 15m 23s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 27s Maven dependency ordering for patch
+1 💚 mvninstall 1m 58s the patch passed
+1 💚 compile 21m 22s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javac 21m 22s root-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 generated 0 new + 1955 unchanged - 1 fixed = 1955 total (was 1956)
+1 💚 compile 18m 48s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 javac 18m 48s root-jdkPrivateBuild-1.8.0_282-8u282-b08-0ubuntu120.04-b08 with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu120.04-b08 generated 0 new + 1851 unchanged - 1 fixed = 1851 total (was 1852)
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 4m 16s root: The patch generated 0 new + 93 unchanged - 5 fixed = 93 total (was 98)
+1 💚 mvnsite 3m 14s the patch passed
+1 💚 xml 0m 1s The patch has no ill-formed XML file.
+1 💚 javadoc 2m 20s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 2m 59s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 spotbugs 5m 7s the patch passed
+1 💚 shadedclient 14m 50s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 17m 14s hadoop-common in the patch passed.
+1 💚 unit 19m 4s hadoop-distcp in the patch passed.
+1 💚 unit 2m 14s hadoop-aws in the patch passed.
+1 💚 asflicense 1m 0s The patch does not generate ASF License warnings.
229m 29s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2732/22/artifact/out/Dockerfile
GITHUB PR #2732
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell xml
uname Linux 2425df3a5e0e 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / e97d6afc36cead0f34fb957d630b5141334e7b68
Default Java Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2732/22/testReport/
Max. process+thread count 3152 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-distcp hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2732/22/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@ayushtkn
Member Author

Thanx @steveloughran for the review.
Sorted the S3 stuff; the test passes:

[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.hadoop.fs.contract.s3a.ITestS3AContractDistCp
[INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 679.069 s - in org.apache.hadoop.fs.contract.s3a.ITestS3AContractDistCp
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  11:27 min
[INFO] Finished at: 2021-03-16T11:49:50+05:30
[INFO] ------------------------------------------------------------------------
ayushsaxena@ayushsaxena-MBP16 hadoop-aws % 

Region: ap-south-1

The newly added HDFS contract:

[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.hadoop.tools.contract.OptionalTestHDFSContractDistCp
[INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 257.242 s - in org.apache.hadoop.tools.contract.OptionalTestHDFSContractDistCp
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  04:26 min
[INFO] Finished at: 2021-03-16T11:57:21+05:30
[INFO] ------------------------------------------------------------------------

@ayushtkn
Member Author

@steveloughran any further comments?

@apache apache deleted a comment from hadoop-yetus Mar 18, 2021
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 33s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 9 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 14m 31s Maven dependency ordering for branch
+1 💚 mvninstall 20m 21s trunk passed
+1 💚 compile 20m 50s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 compile 18m 4s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 checkstyle 3m 46s trunk passed
+1 💚 mvnsite 3m 12s trunk passed
-1 ❌ javadoc 1m 1s /branch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04.txt hadoop-common in trunk failed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04.
+1 💚 javadoc 3m 1s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 spotbugs 4m 33s trunk passed
+1 💚 shadedclient 14m 38s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 25s Maven dependency ordering for patch
+1 💚 mvninstall 1m 50s the patch passed
+1 💚 compile 20m 1s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javac 20m 1s root-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 generated 0 new + 1960 unchanged - 1 fixed = 1960 total (was 1961)
+1 💚 compile 17m 55s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 javac 17m 55s root-jdkPrivateBuild-1.8.0_282-8u282-b08-0ubuntu120.04-b08 with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu120.04-b08 generated 0 new + 1851 unchanged - 1 fixed = 1851 total (was 1852)
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 3m 46s root: The patch generated 0 new + 93 unchanged - 5 fixed = 93 total (was 98)
+1 💚 mvnsite 3m 11s the patch passed
+1 💚 xml 0m 1s The patch has no ill-formed XML file.
-1 ❌ javadoc 1m 3s /patch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04.txt hadoop-common in the patch failed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04.
+1 💚 javadoc 3m 1s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 spotbugs 5m 7s the patch passed
+1 💚 shadedclient 14m 37s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 17m 18s hadoop-common in the patch passed.
+1 💚 unit 19m 21s hadoop-distcp in the patch passed.
+1 💚 unit 2m 18s hadoop-aws in the patch passed.
+1 💚 asflicense 1m 0s The patch does not generate ASF License warnings.
221m 35s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2732/26/artifact/out/Dockerfile
GITHUB PR #2732
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell xml
uname Linux bf45287a0e3e 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 5e6fafe60ce6258ff1c16505e3cb627a6ace5e0d
Default Java Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2732/26/testReport/
Max. process+thread count 1388 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-distcp hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2732/26/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@steveloughran
Contributor

LGTM.

I'm going to make one final change, but +1 this PR pending that change anyway.

Can you put the CommonCallableSupplier into org.apache.hadoop.util.functional. Sorry, should have thought of this earlier.

That's where I'm trying to unify the functional APIs in Hadoop with IOE support, and this is clearly part of it.

Nothing else, just a move of the class.

+1 pending that change.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 33s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 9 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 14m 30s Maven dependency ordering for branch
+1 💚 mvninstall 20m 26s trunk passed
+1 💚 compile 20m 54s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 compile 17m 58s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 checkstyle 3m 48s trunk passed
+1 💚 mvnsite 3m 11s trunk passed
-1 ❌ javadoc 1m 4s /branch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04.txt hadoop-common in trunk failed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04.
+1 💚 javadoc 3m 0s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 spotbugs 4m 32s trunk passed
+1 💚 shadedclient 14m 38s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 15m 0s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 57s Maven dependency ordering for patch
+1 💚 mvninstall 2m 0s the patch passed
+1 💚 compile 20m 10s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javac 20m 10s root-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 generated 0 new + 1955 unchanged - 1 fixed = 1955 total (was 1956)
+1 💚 compile 17m 55s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 javac 17m 55s root-jdkPrivateBuild-1.8.0_282-8u282-b08-0ubuntu120.04-b08 with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu120.04-b08 generated 0 new + 1851 unchanged - 1 fixed = 1851 total (was 1852)
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 3m 45s root: The patch generated 0 new + 93 unchanged - 5 fixed = 93 total (was 98)
+1 💚 mvnsite 3m 12s the patch passed
+1 💚 xml 0m 1s The patch has no ill-formed XML file.
-1 ❌ javadoc 1m 3s /patch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04.txt hadoop-common in the patch failed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04.
+1 💚 javadoc 3m 2s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 spotbugs 5m 7s the patch passed
+1 💚 shadedclient 14m 41s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 17m 16s hadoop-common in the patch passed.
+1 💚 unit 18m 52s hadoop-distcp in the patch passed.
+1 💚 unit 2m 18s hadoop-aws in the patch passed.
+1 💚 asflicense 1m 0s The patch does not generate ASF License warnings.
221m 53s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2732/27/artifact/out/Dockerfile
GITHUB PR #2732
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell xml
uname Linux 3e5405531c58 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / c896ae1
Default Java Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2732/27/testReport/
Max. process+thread count 1389 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-distcp hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2732/27/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 39s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 9 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 14m 14s Maven dependency ordering for branch
+1 💚 mvninstall 20m 51s trunk passed
+1 💚 compile 21m 53s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 compile 18m 22s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 checkstyle 3m 41s trunk passed
+1 💚 mvnsite 3m 13s trunk passed
+1 💚 javadoc 2m 21s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 3m 4s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 spotbugs 4m 33s trunk passed
+1 💚 shadedclient 14m 27s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 14m 49s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 27s Maven dependency ordering for patch
+1 💚 mvninstall 1m 52s the patch passed
+1 💚 compile 19m 56s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javac 19m 56s root-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 generated 0 new + 1955 unchanged - 1 fixed = 1955 total (was 1956)
+1 💚 compile 18m 56s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 javac 18m 56s root-jdkPrivateBuild-1.8.0_282-8u282-b08-0ubuntu120.04-b08 with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu120.04-b08 generated 0 new + 1851 unchanged - 1 fixed = 1851 total (was 1852)
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 3m 43s root: The patch generated 0 new + 93 unchanged - 5 fixed = 93 total (was 98)
+1 💚 mvnsite 3m 10s the patch passed
+1 💚 xml 0m 2s The patch has no ill-formed XML file.
+1 💚 javadoc 2m 19s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 2m 53s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 spotbugs 5m 5s the patch passed
+1 💚 shadedclient 14m 43s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 17m 21s hadoop-common in the patch passed.
+1 💚 unit 18m 44s hadoop-distcp in the patch passed.
+1 💚 unit 2m 20s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 58s The patch does not generate ASF License warnings.
223m 37s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2732/28/artifact/out/Dockerfile
GITHUB PR #2732
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell xml
uname Linux 32f6d7e44352 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / c896ae1
Default Java Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2732/28/testReport/
Max. process+thread count 1953 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-distcp hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2732/28/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@ayushtkn
Member Author

Thanx @steveloughran for the review. I have moved CommonCallableSupplier to org.apache.hadoop.util.functional

@steveloughran
Contributor

OK, +1 from me. Merge to trunk and, after a test run, to branch-3.3; let's wait and see what surprises surface there.

@ayushtkn ayushtkn merged commit 03cfc85 into apache:trunk Mar 23, 2021
ayushtkn added a commit to ayushtkn/hadoop that referenced this pull request Mar 23, 2021
ayushtkn added a commit that referenced this pull request Mar 27, 2021
#2808). Contributed by Ayush Saxena.

* HADOOP-17531. DistCp: Reduce memory usage on copying huge directories. (#2732).

* HADOOP-17531.Addendum: DistCp: Reduce memory usage on copying huge directories. (#2820)

Signed-off-by: Steve Loughran <stevel@apache.org>
kiran-maturi pushed a commit to kiran-maturi/hadoop that referenced this pull request Nov 24, 2021
apache#2732). Contributed by Ayush Saxena.

Signed-off-by: Steve Loughran <stevel@apache.org>
jojochuang pushed a commit to jojochuang/hadoop that referenced this pull request May 23, 2023
… directories. (apache#2808). Contributed by Ayush Saxena.

* HADOOP-17531. DistCp: Reduce memory usage on copying huge directories. (apache#2732).

* HADOOP-17531.Addendum: DistCp: Reduce memory usage on copying huge directories. (apache#2820)

Signed-off-by: Steve Loughran <stevel@apache.org>
 Conflicts:
	hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/GenericTestUtils.java
	hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/contract/AbstractContractDistCpTest.java
(cherry picked from commit d86f94d18bd8b33cfc324b5638f12d9018c95d29)
Signed-off-by: Arpit Agarwal <aagarwal@cloudera.com>

Change-Id: Ieec8dbd96444dead3cd115f076a65444ca212a35