Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use u64 range instead of usize, for better wasm32 support #6961

Merged
merged 8 commits into from
Jan 17, 2025

Conversation

XiangpengHao
Copy link
Contributor

@XiangpengHao XiangpengHao commented Jan 10, 2025

Which issue does this PR close?

Rationale for this change

Ranges should be u64 so that 32 bit platforms can read files larger than 4GB. More details can be found in #5351

What changes are included in this PR?

This is a rather simple change. No functionality change for 64 bit platforms. But is a breaking change for trait implementors.
Given that we already break one in #6619, it's seems like a good timing to also include this change.

This PR added and removed a few casting. I have checked the casting is ok, but please help me check again.

My principle of using usize vs u64:

  • Anything already in memory should use usize, e.g., slicing a memory range
  • O.w. use u64

Are there any user-facing changes?

@github-actions github-actions bot added the object-store Object Store Interface label Jan 10, 2025
@alamb alamb added the api-change Changes to the arrow API label Jan 10, 2025
@alamb
Copy link
Contributor

alamb commented Jan 10, 2025

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @XiangpengHao ❤️

Can you help me understand what currently doesn't work with 32 bit builds now?

We have an existing test for WASM32 that seems to work fine: https://github.com/apache/arrow-rs/actions/runs/12680096825/job/35341184364

I also locally tried checking with an i686 target and that worked fine too 🤔

rustup target add i686-unknown-linux-gnu
cd object_store
cargo check --target=i686-unknown-linux-gnu
...

🤔

@alamb
Copy link
Contributor

alamb commented Jan 10, 2025

🤔 maybe it is related to wanting to use the http features?

But it still seems to compile just fine for me 🤔

sudo apt-get install gcc-multilib
rustup target add i686-unknown-linux-gnu
cd object_store
cargo check --target=i686-unknown-linux-gnu --features=http
...

@alamb
Copy link
Contributor

alamb commented Jan 11, 2025

Can you help me understand what currently doesn't work with 32 bit builds now?

For anyone following along, the answer is here: #5351 (comment)

I still think we should try and add some sort of test that would fail on wasm32 before this change and not after the change. If we don't do that I feel like:

  1. We are less sure we fixed all the places that require change
  2. We may inadvertently introduce regressions but no test would fail

I'll see if I can find some time over the next day or two to help, if no on ebeats me to it

@XiangpengHao
Copy link
Contributor Author

add some sort of test that would fail on wasm32

Thanks for the review @alamb , I agree with the rationale but unfortunately we don't yet have the infra to test wasm32 yet. We currently only build on wasm32, but there's no runner that can actually execute the webassembly

@XiangpengHao XiangpengHao mentioned this pull request Jan 13, 2025
3 tasks
@Xuanwo
Copy link
Member

Xuanwo commented Jan 14, 2025

Thanks for the review @alamb , I agree with the rationale but unfortunately we don't yet have the infra to test wasm32 yet. We currently only build on wasm32, but there's no runner that can actually execute the webassembly

OpenDAL has an edge test for this; it's worth taking a look.

https://github.com/apache/opendal/blob/781682cc05b31d1f12f91fb7568a868f038c4d3b/.github/workflows/test_edge.yml#L62-L107

The test code could be found here: https://github.com/apache/opendal/tree/main/core/edge/s3_read_on_wasm

@alamb alamb changed the title Switch to use u64 range instead of usize, for better wasm32 support Use u64 range instead of usize, for better wasm32 support Jan 14, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @XiangpengHao -- I reviewed this PR carefully and I think the API changes are good but there are a few places where going from u64 to usize should be checked (as it could truncate on WASM).

I may be mistaken (my casting knowledge is mostly left over from C/C++ 😅 )

I also have a few other suggestions related to comments, but otherwise this is good to go from my perspectivel

Thank you again for helping push this along

object_store/src/azure/client.rs Show resolved Hide resolved
object_store/src/chunked.rs Outdated Show resolved Hide resolved
object_store/src/lib.rs Outdated Show resolved Hide resolved
object_store/src/lib.rs Show resolved Hide resolved
object_store/src/lib.rs Outdated Show resolved Hide resolved
object_store/src/local.rs Outdated Show resolved Hide resolved
object_store/src/memory.rs Outdated Show resolved Hide resolved
object_store/src/util.rs Show resolved Hide resolved
@@ -332,7 +333,9 @@ mod tests {
&ranges,
|range| {
fetches.push(range.clone());
futures::future::ready(Ok(Bytes::from(src[range].to_vec())))
futures::future::ready(Ok(Bytes::from(
src[range.start as usize..range.end as usize].to_vec(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, I think we need to check the conversion here rather than truncating

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function doesn't return error, I just unwrap here, as it's only a test

@alamb
Copy link
Contributor

alamb commented Jan 14, 2025

BTW I merged up from main. I also think the obejct_store emulator tests are unrelated to this PR. I have filed

To track

@alamb
Copy link
Contributor

alamb commented Jan 14, 2025

Thanks for the review @alamb , I agree with the rationale but unfortunately we don't yet have the infra to test wasm32 yet. We currently only build on wasm32, but there's no runner that can actually execute the webassembly

OpenDAL has an edge test for this; it's worth taking a look.

https://github.com/apache/opendal/blob/781682cc05b31d1f12f91fb7568a868f038c4d3b/.github/workflows/test_edge.yml#L62-L107

The test code could be found here: https://github.com/apache/opendal/tree/main/core/edge/s3_read_on_wasm

Thank you @Xuanwo -- I didn't see this response before posting my update

@alamb
Copy link
Contributor

alamb commented Jan 14, 2025

I restarted the emulator test to see if it passes:
Old: https://github.com/apache/arrow-rs/actions/runs/12770295148/job/35594807418?pr=6961

@alamb
Copy link
Contributor

alamb commented Jan 14, 2025

Now the emulator test is passing after 2 retries: https://github.com/apache/arrow-rs/actions/runs/12770295148/job/35607985446?pr=6961

🤔

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @XiangpengHao -- this looks great to me

@alamb alamb merged commit 9cd7018 into apache:main Jan 17, 2025
14 checks passed
@alamb
Copy link
Contributor

alamb commented Jan 17, 2025

Thanks again

totoroyyb pushed a commit to totoroyyb/arrow-rs that referenced this pull request Jan 20, 2025
…#6961)

* u64 ranges

* more u64

* make clippy happy

* even more u64

* Update object_store/src/lib.rs

Co-authored-by: Andrew Lamb <[email protected]>

* Update object_store/src/lib.rs

Co-authored-by: Andrew Lamb <[email protected]>

* address comments

---------

Co-authored-by: Andrew Lamb <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Changes to the arrow API object-store Object Store Interface
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support 32-bit Architectures / Replace usize with u64
4 participants