-
Notifications
You must be signed in to change notification settings - Fork 227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Alternative method to set PySRRegressor parameters #216
base: master
Are you sure you want to change the base?
Conversation
This sounds like a good idea to me. Another option here is to define intermediate helper objects: @dataclass
class Sizes:
niterations: int
populations: int
@dataclass
class Space:
binary_operations: Sequence[Callable]
unary_operations: Sequence[Callable]
maxsize: int
class PySRRegressor:
def __init__(self, sizes: Sizes, space: Space, ...) Which can likewise help to split things up by grouping less-frequently-used parameters. This is what I do for |
Thanks for the ideas! I am on the fence about which way to go: passing a hierarchy of objects, or declaratively setting parameters. One one hand, I think that if a user wants to figure out the API just by using hover-help and tab completion in their IDE, then the functional method is better, and the typing method would require they keep flipping back to the docs page, and importing various objects. But on the other hand, the typing method seems more readable. I can definitely see the advantage of a hierarchy of types when only certain parameter combinations are compatible with eachother. And it's also more modular in case a user wants to add their own configuration, or share optimal parameter configs. (Though perhaps this would be better suited for multiple-dispatch) Maybe you could rule out certain param combos with the declarative method too, and have the functions return different subclasses of the main regressor object. Sort of like how |
I don't have any other ideas, but personally, I think the hierarchy of types is more straightforward. It avoids the duplicated docstrings, and I think is easier to understand/follow if the user hasn't read the docs beforehand. Whichever option you decide, I think it will be a nice improvement! |
Thanks for the feedback @tttc3! I’m also wondering if it might be possible to get tab completion to work by exposing a “config” module that acts to store references to each type. Then you could write, for example, Maybe I should just mock up a few designs and conduct a Twitter poll… |
I've not tried using a config namespace to hold them - but otherwise yup, tab completion works well. |
Thanks! How do you use tab completion? i.e., what string of text are you actually hitting |
Typically:
As I have it set to prompt after two characters. (FWIW my current editors are Helix and NvChad.) If you're thinking in terms of discoverability then that should work just the same, I think. Probably easier if you configure prompting to happen after zero characters though. |
Thanks. I guess if I do I wonder if it would be enough to have a prefix on all config parameters, like |
PySRRegressor currently has 74 parameters. That is way too many for a single method. It can be hard to figure out what a single parameter from the long list actually does, especially for new users - who are likely overwhelmed by the large number of options.
What this work-in-progress PR does is create an alternative way to initialize a
PySRRegressor
object. Rather than passing all parameters at initialization (which would still be allowed), you could instead call a series of different functions to configure various parts of the regressor.Here's an example of what this might look like:
Each of these functions would define only a subset of parameters according to the groups shown in the new API page: https://astroautomata.com/PySR/api/.
This makes for much more readable code, and hopefully a model that is easier for new users to explore, rather than having all parameters dumped in a flat list. It also presents an opportunity for better documentation: these
set_...
functions can have more specific explanations of, e.g., what an "equation space" is (forset_space
), rather than giving a general overview in the__init__
function.There are a couple issues with this:
__init__
and at theset_...
functions. Or, we could remove the docstring from__init__
altogether, and simply point the user to the variousset_...
functions they can use.None
. How do we also make parameters optional in theset_...
functions? My solution to this was to define a globalNONE="none"
constant that is sort of like an alternative definition ofNone
, so we can differentiate between the user setting a variable to None or not.Thanks @dfm for the idea! Would be interested in hearing your comments when you have time.
@patrick-kidger @tttc3 @JayWadekar I am eager to hear what you think as well.