Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

finalize_entry uses path and FD that may diverge, could use just FD #1786

Closed
EliahKagan opened this issue Jan 20, 2025 · 4 comments · Fixed by #1803
Closed

finalize_entry uses path and FD that may diverge, could use just FD #1786

EliahKagan opened this issue Jan 20, 2025 · 4 comments · Fixed by #1803
Labels
acknowledged an issue is accepted as shortcoming to be fixed

Comments

@EliahKagan
Copy link
Member

Current behavior 😯

When finishing checking out a file, we do this:

// For possibly existing, overwritten files, we must change the file mode explicitly.
#[cfg(unix)]
if let Some(path) = set_executable_after_creation {
let old_perm = std::fs::symlink_metadata(path)?.permissions();
if let Some(new_perm) = set_mode_executable(old_perm) {
std::fs::set_permissions(path, new_perm)?;
}
}
// NOTE: we don't call `file.sync_all()` here knowing that some filesystems don't handle this well.
// revisit this once there is a bug to fix.
entry.stat = Stat::from_fs(&gix_index::fs::Metadata::from_file(&file)?)?;
file.close()?;

This has both the path and an open file object, the latter of which on a Unix-like system has an associated open file descriptor. The path is used to set permissions on the file when it is being made executable (and when this was not done eagerly when creating the file anew). But the File object is used to access metadata and update the index. The metadata it accesses does not include the permissions, so this does not inherently produce any inconsistencies (and the commented note is not likely related to permissions). But this combination is nonetheless unintuitive, especially if the file is moved or replaced:

  • We set permissions on a path that may, in rare cases, be the path to a different file that has replaced this one. See be0abaf (#1764) and d674a43 (here, #1779).
  • The index gets information from the open file descriptor, which if I understand correctly applies to the original open file even it is no longer where it was originally and even if something else is there now.

As noted in #1783 ("Expected behavior", 5(iii)), we should be able to set the permissions through the open file descriptor (from the File object) rather than using the path:

File permissions on a Unix-like system can be set through an open file descriptor, without using the path. (If an OS is not Unix-like then we are not setting +x anyway.) POSIX standardizes fchmod for this purpose.

I believe gix_index::fs, used there, already depends on rustix where cfg!(unix). One way to call fchmod is through rustix::fs::fchmod. (It does not support file descriptors opened with the O_PATH flag, but I believe that is inapplicable here, because that's only useful when the file contents are not to be accessed in any way, whereas in contrast the file descriptor associated with our File object represents a file that is open in the full traditional sense and that has, presumably, just been written to.)

Expected behavior 🤔

I am not sure what the expected behavior is, because I don't know if there is a specific reason for timestamp and size metadata to be read through the file descriptor that was presumably used for writing, while using the original path to set permissions instead of using fchmod.

  • If this distinction supports valuable semantics, then it should be kept, and the only bug is that nothing explains that this is intentional. A comment would fix it. Depending on the reason and how important it is, a documentation comment on a function might be warranted.
  • If this distinction does not support valuable semantics, then fchmod (possibly through a higher-level library like rustix) should be used. Unless there is a separate reason not to do that.

This may also eventually no longer be relevant: Setting permissions might stop being needed at all if files that don't have the target permissions are replaced instead of having their permissions changed.

Git behavior

I'm not sure, in full. However, as noted in #1783 and #1784, Git replaces files during checkout (deleting/unlinking them and creating a new file of the same), rather than reusing them (keeping them but overwriting their contents), in more circumstances than gitoxide does. So I wouldn't expect the choice between chmod and fchmod to arise as often in the Git checkout implementation.

Steps to reproduce 🕹

For the aspect of this that is about it being unclear why one operation uses the open file and the other uses the path, see above links.

For the aspect that speculates that this could worsen some race conditions, I don't have steps to reproduce that. I don't know how much of a problem it is and, as noted above, it may be that the current code is correct except for being under-documented with respect to why this disparity is actually useful.

@EliahKagan EliahKagan changed the title finalize_entry makes seemingly unnecessary use of path and FD that may diverge finalize_entry uses path and FD that may diverge, could use just FD Jan 20, 2025
@Byron
Copy link
Member

Byron commented Jan 24, 2025

Thanks a lot for pointing this out!

I am not sure what the expected behavior is, because I don't know if there is a specific reason for timestamp and size metadata to be read through the file descriptor that was presumably used for writing, while using the original path to set permissions instead of using fchmod.

I think this is done solely because I wasn't aware of a 'path-less' way of doing this, and agree that rustix could and should be used (a crate which already is a dependency).

A proper reset is a topic I will work on as the next big feature, and resolving the existing issues will be done first.

@Byron Byron added the acknowledged an issue is accepted as shortcoming to be fixed label Jan 24, 2025
@EliahKagan
Copy link
Member Author

I think this is done solely because I wasn't aware of a 'path-less' way of doing this, and agree that rustix could and should be used (a crate which already is a dependency).

I have a fix for this issue almost ready. It uses fstat and fchmod to read and set the mode via the file descriptor instead of the path, and simplifies finalize_entry and its callers (and tests) accordingly.

By "almost ready," I mean that the fix is done and passes tests, and I am already working on the PR description. I expect to have that open in ten minutes or so. It is a fix only for this, not for any other issues. Or, rather, not for any I have opened as of now or have seen, or that I am otherwise certain of. (For a possible further issue my forthcoming PR may fix, see below.)

I think it somewhat improves behavior in race conditions, for the general reasons about race conditions described in this issue. But my main motivations with it are to simplify things, make operations more consistent, and point the way toward some other possible future changes, for example, by making clear which uses of paths are really conceptually operating on paths.

I'm not sure exactly what you have planned, but my guess is that this may be consistent with it. (I mean that I guess it will not get in the way of your plan, not that you should be at all reluctant to change or replace any of it at any point. For example, if you eventually eliminate changing permissions on existing files, then that would replace most of the changes. But I think the changes in my PR would either leave that as easy to do before, or make it easier.)

One area that is important to consider--important enough that I'm mentioning it here and not only in my PR description--is that if we open a file through a symlink and then use fchmod to change its mode, then I believe we have always changed the mode on the file the symlink (or chain of symlinks) has dereferenced to. But I think, based on looking at the code, my recollections of your explanations of it, and recent (though admittedly nonexhaustive) manual testing, that the File object we are working with will have been opened in a way that does not open a file through a symlink.

Nonetheless, I think this is something that should be examined in review before my forthcoming PR (or anything changing this) is merged. Assuming I my belief that the File object is known not to represent a file that was reached by dereferencing a symlink is correct, the change should actually improve resistance to symlink-related bugs, because if we know we have opened the right file, then we know we are operating on that file through its file descriptor (i.e., even if the file has moved, it is affected wherever it is, and whatever else is where it was is not affected).

More specifically, the reason I suspect an improvement in that area is that the current code (both before and after #1764) uses std::fs::set_permissions to write the mode. But if I understand correctly, std::fs::set_permissions calls chmod and not lchmod. So it seems like there would currently be TOCTOU scenarios where, due to a race of the checkout against directory modifications performed by code from outside gitoxide, we would set permissions on a symlink that has replaced the file, in a way that actually sets permissions on whatever the symlink points to. (Though the user would still have to have access to modify it.)

@EliahKagan
Copy link
Member Author

I've opened #1803 for this.

@Byron
Copy link
Member

Byron commented Jan 25, 2025

Thanks for sharing!

I'm not sure exactly what you have planned, but my guess is that this may be consistent with it. (I mean that I guess it will not get in the way of your plan, not that you should be at all reluctant to change or replace any of it at any point. For example, if you eventually eliminate changing permissions on existing files, then that would replace most of the changes. But I think the changes in my PR would either leave that as easy to do before, or make it easier.)

Please feel free to make these fixes as you think they should be done, my plans for reset don't go further than "want to get this done this year".

But I think, based on looking at the code, my recollections of your explanations of it, and recent (though admittedly nonexhaustive) manual testing, that the File object we are working with will have been opened in a way that does not open a file through a symlink.

This is correct, Git and gix generally don't open files through symlinks.
And I agree, this change will then make the code considerably more robust as well.

More specifically, the reason I suspect an improvement in that area is that the current code (both before and after #1764) uses std::fs::set_permissions to write the mode. But if I understand correctly, std::fs::set_permissions calls chmod and not lchmod. So it seems like there would currently be TOCTOU scenarios where, due to a race of the checkout against directory modifications performed by code from outside gitoxide, we would set permissions on a symlink that has replaced the file, in a way that actually sets permissions on whatever the symlink points to. (Though the user would still have to have access to modify it.)

Yes, I think this would be another bug and/or oversight!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
acknowledged an issue is accepted as shortcoming to be fixed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants