MMap support for IPC files #6709
Comments
The docs could definitely be improved, but the way to do this is to use FileDecoder with a buffer constructed from your mmap file: https://docs.rs/arrow-ipc/latest/arrow_ipc/reader/struct.FileDecoder.html
I'll have a look in the next few days; if it works out I'll raise a PR.
Hi, I also need to read IPC files via mmap. However, looks like the
let file = std::fs::File::open("...").unwrap();
let mmap = unsafe { Mmap::map(&file).unwrap() };
let buffer = unsafe {
Buffer::from_custom_allocation(
NonNull::new_unchecked(mmap.as_ptr() as *mut u8),
mmap.len(),
Arc::new(mmap),
)
};
// ... use the buffer as suggested in the document...
Nevertheless, may I still request a more structured way (like the one implemented in the C++ library) of reading IPC files via mmap, which would be more ergonomic to use?
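The ownership pattern behind `Buffer::from_custom_allocation` in the comment above can be sketched with the standard library alone. This is a minimal illustration, not arrow-rs code: `OwnedRegion` and `wrap_vec` are hypothetical names, and a `Vec<u8>` stands in for the `Mmap` so the example has no external dependencies. The idea is that the buffer holds a raw pointer plus a reference-counted owner, so the memory stays mapped for as long as any view of it exists.

```rust
use std::any::Any;
use std::ptr::NonNull;
use std::sync::Arc;

/// Hypothetical stand-in for a zero-copy buffer: a raw pointer into memory
/// that is kept alive by a reference-counted owner.
#[derive(Clone)]
struct OwnedRegion {
    ptr: NonNull<u8>,
    len: usize,
    // Dropping the last clone of this Arc releases the underlying memory.
    _owner: Arc<dyn Any + Send + Sync>,
}

impl OwnedRegion {
    /// # Safety
    /// `ptr..ptr + len` must remain valid and unmodified for as long as
    /// `owner` is alive.
    unsafe fn from_custom_allocation(
        ptr: NonNull<u8>,
        len: usize,
        owner: Arc<dyn Any + Send + Sync>,
    ) -> Self {
        Self { ptr, len, _owner: owner }
    }

    fn as_slice(&self) -> &[u8] {
        // SAFETY: guaranteed by the constructor's contract.
        unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.len) }
    }
}

/// Wraps a Vec (assumption: used here in place of a real memory map).
fn wrap_vec(data: Vec<u8>) -> OwnedRegion {
    let len = data.len();
    let ptr = NonNull::new(data.as_ptr() as *mut u8).expect("Vec pointer is never null");
    // SAFETY: moving the Vec into the Arc does not move its heap allocation,
    // and nothing mutates it afterwards.
    unsafe { OwnedRegion::from_custom_allocation(ptr, len, Arc::new(data)) }
}

fn main() {
    let data = b"ARROW1 payload".to_vec();
    let base = data.as_ptr();

    let region = wrap_vec(data);
    let view = region.clone();
    drop(region); // the clone still keeps the owner alive

    assert_eq!(view.as_slice(), &b"ARROW1 payload"[..]);
    assert_eq!(view.as_slice().as_ptr(), base); // same memory, no copy
    println!("{} bytes still readable at {:p}", view.as_slice().len(), base);
}
```

With a real memory map the owner would be the `Mmap` value itself, which is what `Arc::new(mmap)` does in the snippet above.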
@totoroyyb sounds like a good idea to figure out a better way to do this.
I noticed that Mmap implements AsRef<[u8]>: https://docs.rs/memmap/latest/memmap/struct.Mmap.html#impl-AsRef%3C%5Bu8%5D%3E-for-Mmap
So I think that means you can make a cursor over it like
let cursor = std::io::Cursor::new(mmap.as_ref());
And then you can use it with a FileReader.
This would largely defeat the purpose of using mmap, as it will perform a copy. You want to use FileDecoder as documented here with a Buffer created from the mmap region. This can be done using the recently added
> This would largely defeat the purpose of using mmap, as it will perform a copy.
This code compiles and I don't think it makes a copy 🤔 Maybe I am missing something:
let file = std::fs::File::open("/tmp/foo.txt").unwrap();
let mmap = unsafe { Mmap::map(&file).unwrap() };
let mut cursor = std::io::Cursor::new(&mmap[..]);
let mut reader = arrow::ipc::reader::FileReader::try_new(&mut cursor, None).unwrap();
for b in reader.next().unwrap() {
println!("{:?}", b);
}
It copies data from the mmap region into the
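The difference between the two approaches discussed above can be observed with a small standard-library sketch. It does not use arrow-rs: `read_via_cursor` and `borrow_from_region` are hypothetical helpers, and a `Vec<u8>` stands in for the mapped file. Anything that consumes bytes through the `Read` trait has to place them in a buffer of its own, whereas slicing the region hands out the original memory.

```rust
use std::io::{Cursor, Read};

/// Reads `len` bytes through the `Read` trait, as a reader-based API must do.
/// The bytes end up in a freshly allocated Vec, i.e. they are copied.
fn read_via_cursor(region: &[u8], len: usize) -> std::io::Result<Vec<u8>> {
    let mut cursor = Cursor::new(region);
    let mut out = vec![0u8; len];
    cursor.read_exact(&mut out)?;
    Ok(out)
}

/// Borrows `len` bytes directly from the region: no allocation, no copy.
fn borrow_from_region(region: &[u8], len: usize) -> &[u8] {
    &region[..len]
}

fn main() -> std::io::Result<()> {
    // Stand-in for an mmap'd file (assumption: a Vec instead of a real mapping).
    let region: Vec<u8> = (0u8..64).collect();

    let copied = read_via_cursor(&region, 16)?;
    let borrowed = borrow_from_region(&region, 16);

    // Same contents either way...
    assert_eq!(copied.as_slice(), borrowed);
    // ...but only the borrowed slice points into the original region.
    assert_ne!(copied.as_ptr(), region.as_ptr());
    assert_eq!(borrowed.as_ptr(), region.as_ptr());

    println!(
        "copied lives at {:p}, region at {:p}",
        copied.as_ptr(),
        region.as_ptr()
    );
    Ok(())
}
```

So the cursor snippet compiles and reads correct data, but the resulting arrays would live in newly allocated memory rather than in the mapped pages.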
I verified that this does indeed compile:
let file = std::fs::File::open("/tmp/foo.txt").unwrap();
let mmap = unsafe { Mmap::map(&file).unwrap() };
let bytes = Bytes::from_owner(mmap);
let buffer = Buffer::from_bytes(bytes.into());
(and then you can use the example @tustvold mentions on
For anyone following along, this means the underlying Arrow arrays will then (re)use the mmap region.
I agree an example would make this much easier to understand.
FYI @andygrove this could be another source of improvement for Comet: avoid a copy of the spill files
I think adding
This is very interesting. Thanks for the ping @alamb
I created a PR with a proposed example here:
I think making
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
We are using polars-arrow and would prefer to move to arrow-rs, as we are looking into using DataFusion as well.
Describe the solution you'd like
We are dependent on mmap support for the IPC format.
Describe alternatives you've considered
Keep using polars-arrow.
Additional context
It looks like Buffer and Bytes have the right functions and support for custom deallocation.