Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't revalidate Python interpreter cache entry with --upgrade #10361

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

konstin
Copy link
Member

@konstin konstin commented Jan 7, 2025

We were always revalidating the Python interpreter cache entry when using --upgrade, since the cache timestamp is too recent. Instead, we ignore the usual caching semantics for packages and only compare the timestamp of the interpreter with the recorded timestamp. This avoids the cost for querying the Python interpreter, at the expense of being slightly inconsistent with Cache behaving different for python alone (which is imho acceptable since interpreter metadata is already different from package metadata).

Most of the diff is an indentation change.

We were always revalidating the Python cache entry when using `--upgrade`, since the cache timestamp is too recent. Instead, we ignore the usual caching semantics for packages and only compare the timestamp of the interpreter with the recorded timestamp. This avoids the cost for querying the Python interpreter, and the expense of being slightly inconsistent with `Cache` behaving different for python alone (which is imho acceptable since interpreter metadata is already different from package metadata).
@konstin konstin added the performance Potential performance improvement label Jan 7, 2025
@konstin konstin requested a review from charliermarsh January 7, 2025 13:52
}

// We ignore the freshness of the cache entry and use only our own logic, otherwise
// `--upgrade` always implies querying Python.
Copy link
Member

@charliermarsh charliermarsh Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need to respect --refresh though?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is --refresh intended to also revalidate the python interpreter querying? Querying Python is kinda costly, so I'd like to find a way to not do it unless we have, e.g. not query python with just --upgrade.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I think it's the only way to revalidate it, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have the internal timestamp check, so when the file changes we revalidate. It's different from packages because python is local, so we're always performing the "revalidation request" by checking the timestamp on the file. Are there scenarios where the python interpreter metadata changes while the timestamp would indicate it's fresh?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I... guess not. I mean, for symlinks yes, but we already don't cache for symlinks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not avoid revalidation with --refresh, since we're also revalidating the python interpreter by checking the timestamp. It does however not update if the python interpreter changes without the timestamp changing (i don't know how that would be possible except for the user manually changing timestamps, breaking our caching assumptions). This is mirroring the behavior for packages, where we're also trusting the remote with If-Modified-Since and only reload the full information with --refresh when either the revalidation request fails or the user clears the cache.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused. On main, --refresh will force us to re-query here always. On this branch, --refresh will no longer re-query here unless the timestamp has changed. So it does avoid refetching the interpreter data.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't the interpreter depend on something else that changes?

This is mirroring the behavior for packages, where we're also trusting the remote with If-Modified-Since and only reload the full information with --refresh when either the revalidation request fails or the user clears the cache.

I think this is a reasonable argument.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't the interpreter depend on something else that changes?

If something like that exists, it would break this PR and i'd change strategies to more aggressively invalidating the python interpreter cache instead. I'm not aware of any such cases, if you have any let me know.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a great example #10832

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Potential performance improvement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants