feat: use flechette for Arrow decoding. #365

Merged
merged 3 commits into from
Sep 16, 2024

@jheer (Member) commented Sep 14, 2024

This PR replaces the apache-arrow reference JS implementation with @uwdata/flechette for decoding Arrow IPC binary data. Flechette is smaller, faster, and provides better coverage of Arrow data types. Its row proxy objects handle nested types, which makes them more correct than those in prior versions of this loader and simplifies the code in this package. As flechette's decoders are very lightweight, this package now bundles flechette, so no additional Arrow imports are needed.

If this loader is passed a pre-parsed table, both flechette and apache-arrow tables are supported. Mapping from tables to row objects is now performed by calling the table's toArray method. However, performance with apache-arrow tables will be degraded, as that library uses very slow proxy row objects (sometimes over 10x slower than flechette).
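As a rough illustration of the pre-parsed table path described above, the following sketch maps any table-like object to row objects by calling its toArray method. It is not the loader's actual code: `toRows` and `fakeTable` are hypothetical names, and the stand-in table exists only so the example runs without an Arrow library installed.

```javascript
// Minimal sketch (assumption): convert a pre-parsed table to row objects
// by delegating to its toArray() method, as the PR description outlines.
// `toRows` is a hypothetical helper, not part of vega-loader-arrow.
function toRows(table) {
  if (table && typeof table.toArray === 'function') {
    // flechette and apache-arrow tables both expose toArray()
    return table.toArray();
  }
  throw new TypeError('Expected a table object with a toArray() method');
}

// Hypothetical stand-in for a pre-parsed Arrow table (column-oriented data).
const fakeTable = {
  columns: { x: [1, 2, 3], y: ['a', 'b', 'c'] },
  numRows: 3,
  toArray() {
    const names = Object.keys(this.columns);
    // Build one plain object per row from the column arrays
    return Array.from({ length: this.numRows }, (_, i) =>
      Object.fromEntries(names.map((name) => [name, this.columns[name][i]]))
    );
  }
};

const rows = toRows(fakeTable);
```

With a real flechette or apache-arrow table in place of `fakeTable`, the same call would apply, though row access cost differs between the two libraries as noted above.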

This PR also modernizes some of the build process and testing code.

Close #364.

@domoritz changed the title from "Use flechette for Arrow decoding." to "feat: use flechette for Arrow decoding." on Sep 14, 2024
@domoritz (Member) left a comment

This is great. It's nice that we now bundle the decoder as well.

We decided not to keep the original row proxy but to use the reference implementation's toArray method to convert Arrow Table objects for Vega. I think that's okay, and I accept the performance cost, because I expect that in most cases people will use this library to load Arrow data from a URL, as in https://github.com/vega/vega-loader-arrow?tab=readme-ov-file#vega-specifications.
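For reference, the URL-loading case mentioned above might look roughly like the following Vega specification fragment, written here as a JavaScript object. This is a sketch under assumptions: it presumes the loader has been registered with Vega under the format name `arrow`, and the dataset name and URL are placeholders rather than real resources.

```javascript
// Sketch of a Vega spec that loads Arrow data from a URL.
// Assumption: the loader is registered under the format type 'arrow'.
// The dataset name and URL below are hypothetical placeholders.
const spec = {
  $schema: 'https://vega.github.io/schema/vega/v5.json',
  data: [
    {
      name: 'table',
      url: 'https://example.com/data.arrow',
      format: { type: 'arrow' }
    }
  ]
};
```

In this usage the loader receives raw bytes and decodes them itself, so the slower apache-arrow row proxies never come into play.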

Given the size of this library, would it make sense to bake this functionality directly into Vega? Arrow seems to have become enough of an established standard that we could consider supporting it alongside csv and json in Vega.

@jheer merged commit 3bb32ba into main on Sep 16, 2024
1 check passed
@jheer deleted the jh/flechette branch on September 16, 2024 15:09
@jheer (Member, Author) commented Sep 16, 2024

> Given the size of this library, would it make sense to bake this functionality directly into Vega? Arrow seems to have become enough of an established standard that we could consider supporting it alongside csv and json in Vega.

We should double check the size difference, but I think this makes a lot of sense. I'll go ahead with updating this package for now and we can consider rolling it into Vega proper as a separate task.

Development

Successfully merging this pull request may close these issues.

Replace Apache Arrow JS with Flechette?