Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DRAFT] Restructure Repo for Table Format specific modules #618

Draft
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

the-other-tim-brown
Copy link
Contributor

Important Read

  • Please ensure the GitHub issue is mentioned at the beginning of the PR

What is the purpose of the pull request

Brief change log

  • Creates a module per table format
  • Creates a module for the integration tests that rely on these table formats

Verify this pull request

TODO

Comment on lines +77 to +79
<groupId>org.apache.hudi</groupId>
<artifactId>hudi-spark${spark.version.prefix}-bundle_${scala.binary.version}</artifactId>
<scope>test</scope>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this be a problem for running tests around HFile since the Hudi Spark bundle shades dependencies, particularly HBase ones, in a different way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should not if the dependencies are properly configured within the bundle, that is the main reason behind the shading. We can have different versions on the same path because they will be completely self contained.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true except for org.apache.hudi.io.storage.HoodieHBaseKVComparator, which is not relocated. In the Hudi Spark bundle, org.apache.hudi.io.storage.HoodieHBaseKVComparator extends from org.apache.hudi.org.apache.hadoop.hbase.CellComparatorImpl after Hudi bundle relocation. I'm wondering if that causes a clash here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm adding a new framework for testing where I essentially execute java -cp ... style command with the bundles on the classpath to validate that the jars are not relying on any dependencies brought onto the test path by our test setup to do the conversion. This test also asserts that the results can be read back properly.

Comment on lines 245 to 247
<excludes>
<exclude>org.apache.hudi.io.storage.HoodieHBaseKVComparator</exclude>
</excludes>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way to validate the conjecture above is to remove this exclusion to see if the test passes or a different error is thrown (by removing the exclusion, there should be no clash any more, and the org.apache.xtable.shade.org.apache.hudi.io.storage.HoodieHBaseKVComparator is written as the comparator in the HFile if the test table is prepared using the XTable bundle). Still, we need to figure out how to properly handle org.apache.hudi.io.storage.HoodieHBaseKVComparator as the comparator class since the fully-qualified class name needs to be preserved for reading HFile format in Hudi.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants