8000 HDFS-16911. Distcp with snapshot diff to support Ozone filesystem. by sadanand48 · Pull Request #5364 · apache/hadoop · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

HDFS-16911. Distcp with snapshot diff to support Ozone filesystem. #5364

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Apr 10, 2023

Conversation

sadanand48
Copy link
Contributor
@sadanand48 sadanand48 commented Feb 7, 2023

Description of PR

Currently in DistcpSync i.e the step which applies the diff b/w 2 provided snapshots as arguments to the distcp job with -diff option, only DistributedFilesystem and WebHDFS filesystems are supported.
Now that Ozone supports snapshots, the change here is to make DistcpSync also support OzoneFilesystem (ofs).

The change checks if the filesystem suports fs.hasPathCapability for snapshots and any HCFS that supports snapshots can override hasPathCapability and return true if it does.

Jira :https://issues.apache.org/jira/browse/HDFS-16911

How was this patch tested?

Unit tests
Tested by building hadoop code and replacing distcp jars in Ozone via Unit tests in Ozone.
Also tested on a cluster.

Copy link
Contributor
@steveloughran steveloughran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • all new methods and fields need a javadoc
  • what new tests do you propose? the simple one is "how to show the dynamic binding is used"

@hadoop-yetus

This comment was marked as outdated.

@sadanand48 sadanand48 marked this pull request as ready for review March 14, 2023 12:45
@hadoop-yetus

This comment was marked as outdated.

@hadoop-yetus

This comment was marked as outdated.

@hadoop-yetus

This comment was marked as outdated.

Copy link
Contributor
@steveloughran steveloughran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not going to comment on the snapshot code itself.

I'm worried that removing a constant and the ability to set a new sync class will break something.

Do you know if anyone has actually used this feature?

// can expect other exceptions as source and target paths
// are not created, assert if the exception is not arising
// due to the filesystem not supporting snapshots.
Assert.assertFalse(e.getMessage().contains("does not support snapshot"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you expecting an exception here? if so, intercept() has an overload where you declare text to look for

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I'm checking that a particular exception i.e exception stating that 'fs doesn't support snapshots' is not thrown, The intercept utility method checks whether the caught exception has a particular text, I think there is no util method to check its negation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.
Use GenericTestUtils.assertExceptionContains(does not support snapshot", e) so on a failure, the unexpected exception message and stack is not lost. whoever has to debug a test failure from an html test report will love you for this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After changing it to throw UnsupportedOperationException it became easier, The failure scenario for this test case is only if it gets UnsupportedException. It can expect other exceptions.

catch (UnsupportedOperationException e) {
      Assert.fail("Dummy FS supports snapshots," +
          "did not expect UnsupportedOperationException");
    } catch (Exception e) {
      // can expect other exceptions as source and target paths
      // are not created.
    }

@sadanand48
Copy link
Contributor Author

I'm worried that removing a constant and the ability to set a new sync class will break something.

Thanks @steveloughran for reviewing this PR. This constant is not present in the hadoop code but was added as part of a previous commit of this same PR, I'm no longer using that approach of setting the distcp.sync.class that I proposed earlier , instead just assigning the correct filesystem dynamically at run time did the job for me.

@steveloughran
Copy link
Contributor

thanks for the update; nothing to worry about. I was just concerned if some app was using distcp as a library (hive,...) and this would have broken it.

@hadoop-yetus

This comment was marked as outdated.

@hadoop-yetus

This comment was marked as outdated.

}
try {
getSnapshotDiffReportMethod(srcFs);
} catch (NoSuchMethodException e) {
throw new IllegalArgumentException(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to UnsupportedOperationException

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -161,20 +161,32 @@ private void checkFilesystemSupport(Path sourceDir, Path targetDir,
if (!srcFs.hasPathCapability(sourceDir,
CommonPathCapabilities.FS_SNAPSHOTS)) {
throw new IllegalArgumentException(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UnsupportedOperationException

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -1305,7 +1305,7 @@ public void testSyncSnapshotDiffWithDummyFileSystem() throws Exception {
new DistCp(conf, OptionsParser.parse(args)).execute();
} catch (Exception e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. catch UnsupportedOperationException after tightening the exception raised

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// can expect other exceptions as source and target paths
// are not created, assert if the exception is not arising
// due to the filesystem not supporting snapshots.
Assert.assertFalse(e.getMessage().contains("does not support snapshot"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.
Use GenericTestUtils.assertExceptionContains(does not support snapshot", e) so on a failure, the unexpected exception message and stack is not lost. whoever has to debug a test failure from an html test report will love you for this

@hadoop-yetus

This comment was marked as outdated.

@hadoop-yetus

This comment was marked as outdated.

@hadoop-yetus

This comment was marked as outdated.

@@ -1303,12 +1303,13 @@ public void testSyncSnapshotDiffWithDummyFileSystem() throws Exception {
FileSystem dummyFs = FileSystem.get(URI.create("dummy:///"), conf);
Assert.assertTrue(dummyFs instanceof DummyFs);
new DistCp(conf, OptionsParser.parse(args)).execute();
} catch (UnsupportedOperationException e) {
Assert.fail("Dummy FS supports snapshots," +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you just rethrow e here, then the caller gets a full stack.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

"dummy:///target" };
try {
FileSystem dummyFs = FileSystem.get(URI.create("dummy:///"), conf);
Assert.assertTrue(dummyFs instanceof DummyFs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not very informative. I'd prefer an assertJ assertion that the class is an instance of,

assertThat(dummyfs).isInstanceOf()...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@hadoop-yetus

This comment was marked as outdated.

fs, snapshotDir, fromSnapshot, toSnapshot);
} catch (InvocationTargetException e) {
throw new IOException(e.getCause());
} catch (NoSuchMethodException|IllegalAccessException e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NoSuchMethod sanity check already covered looks like before calling this API. IllegalAccess should noyt happen ideally. So runtime should be ok.
Tiny Nit: Message: "Failed to ...."

Copy link
Contributor Author
@sadanand48 sadanand48 Mar 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @umamaheswararao for the review.

Done. Yes the exception will likely be detected earlier but these are checked exceptions so it requires either to define in catch block or throws.

@umamaheswararao
Copy link
Contributor
umamaheswararao commented Mar 24, 2023

This patch looks good to me. If @steveloughran agree that his comments addressed, we can proceed to commit. Thanks
Please take care of nit.

@hadoop-yetus

This comment was marked as outdated.

@smengcl smengcl requested a review from steveloughran April 4, 2023 21:02
@umamaheswararao
Copy link
Contributor

If no other comments, i will proceed to merge this. Thanks

@umamaheswararao umamaheswararao merged commit 74ddf69 into apache:trunk Apr 10, 2023
sadanand48 added a commit to sadanand48/hadoop that referenced this pull request Jul 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0