v1.0.0

Released by MilesCranmer on 01 Dec, 00:07 (commit 66ea60c)

PySR v1.0.0 Release Notes

PySR 1.0.0 introduces new features for imposing specific functional forms and finding parametric expressions. It also includes TensorBoard support, along with significant updates to the core algorithm, including some important bug fixes. The default hyperparameters have also been updated based on extensive tuning, with a maxsize of 30 rather than 20.

Major New Features

Expression Specifications

PySR 1.0.0 introduces new ways to specify the structure of equations through "Expression Specifications", which expose the backend's new AbstractExpression feature:

Template Expressions

TemplateExpressionSpec allows you to define a specific structure for your equations. For example:

from pysr import TemplateExpressionSpec
expression_spec = TemplateExpressionSpec(["f", "g"], "((; f, g), (x1, x2, x3)) -> sin(f(x1, x2)) + g(x3)")
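To make the template string concrete, here is a minimal pure-Python sketch of what the combining function computes once the search has found candidate subexpressions. The bodies of f and g below are hypothetical stand-ins for whatever expressions PySR evolves; only the sin(f(x1, x2)) + g(x3) structure comes from the template above.

```python
import math

# Hypothetical learned subexpressions; in PySR these are evolved by the search.
def f(a, b):
    return a * b + 1.0

def g(c):
    return c ** 2

def combine(x1, x2, x3):
    # Mirrors the template: ((; f, g), (x1, x2, x3)) -> sin(f(x1, x2)) + g(x3)
    # f only ever sees x1 and x2; g only ever sees x3.
    return math.sin(f(x1, x2)) + g(x3)

value = combine(0.0, 2.0, 3.0)  # sin(f(0, 2)) + g(3) = sin(1.0) + 9.0
```

The point of the template is the restriction it encodes: the search is free to choose what f and g compute, but it cannot, for example, let g depend on x1.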

Parametric Expressions

ParametricExpressionSpec enables fitting expressions that can adapt to different categories of data with per-category parameters:

from pysr import PySRRegressor, ParametricExpressionSpec

expression_spec = ParametricExpressionSpec(max_parameters=2)
model = PySRRegressor(
    expression_spec=expression_spec,
    binary_operators=["+", "*", "-", "/"],
)
model.fit(X, y, category=category)  # Pass category labels
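For intuition, the sketch below generates the kind of data ParametricExpressionSpec targets: a single shared functional form whose two parameters (matching max_parameters=2) change with the category label. The form p1 * sin(x) + p2, the parameter values, and the make_dataset helper are made up for illustration; the three returned lists line up with the X, y, and category arguments of the fit call above.

```python
import math
import random

# Hypothetical ground truth: one shared form y = p1 * sin(x) + p2,
# where (p1, p2) depend on the category label (cf. max_parameters=2).
PARAMS_BY_CATEGORY = {0: (1.0, 0.0), 1: (2.5, -1.0), 2: (0.5, 3.0)}

def make_dataset(n, seed=0):
    """Return (X, y, category) lists; X holds one feature per row."""
    rng = random.Random(seed)
    labels = sorted(PARAMS_BY_CATEGORY)
    X, y, category = [], [], []
    for _ in range(n):
        c = rng.choice(labels)
        x = rng.uniform(-3.0, 3.0)
        p1, p2 = PARAMS_BY_CATEGORY[c]  # per-category parameters
        X.append([x])
        y.append(p1 * math.sin(x) + p2)  # same expression for every category
        category.append(c)
    return X, y, category

X, y, category = make_dataset(200)
```

A search that recovers sin(x) with two fitted parameters per category explains all three groups with one expression. In practice you would likely convert these lists to NumPy arrays before calling fit.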

Improved Logging with TensorBoard

The new TensorBoardLoggerSpec enables logging of the search process and records hyperparameters, exposing the backend's AbstractSRLogger feature:

from pysr import PySRRegressor, TensorBoardLoggerSpec

logger_spec = TensorBoardLoggerSpec(
    log_dir="logs/run",
    log_interval=10,  # Log every 10 iterations
)
model = PySRRegressor(logger_spec=logger_spec)

Features logged include:

  • Loss curves over time at each complexity level
  • Population statistics
  • Pareto "volume" logging (measures performance over all complexities with a single scalar)
  • The min loss over time
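The notes don't spell out how the Pareto "volume" is computed, so the following is a generic two-dimensional hypervolume over (complexity, loss) pairs, shown only to illustrate how a whole front collapses into one scalar. PySR's actual metric may differ, for example by working in log space or normalizing differently; pareto_front, pareto_volume, and the reference point are assumptions of this sketch, and points are assumed to lie within the reference bounds.

```python
def pareto_front(points):
    """Keep (complexity, loss) pairs not beaten by a simpler, lower-loss pair."""
    front = []
    best_loss = float("inf")
    for complexity, loss in sorted(points):
        if loss < best_loss:  # strictly better than every simpler expression
            front.append((complexity, loss))
            best_loss = loss
    return front

def pareto_volume(points, ref_complexity, ref_loss):
    """Area dominated by the front, bounded by a reference point (larger is better)."""
    front = pareto_front(points)
    volume = 0.0
    for i, (complexity, loss) in enumerate(front):
        # Each front point dominates a strip up to the next simpler-to-complex step.
        next_complexity = front[i + 1][0] if i + 1 < len(front) else ref_complexity
        volume += (next_complexity - complexity) * max(ref_loss - loss, 0.0)
    return volume

points = [(1, 10.0), (3, 4.0), (5, 5.0), (7, 1.0)]  # (5, 5.0) is dominated by (3, 4.0)
volume = pareto_volume(points, ref_complexity=10, ref_loss=12.0)
```

Because the volume grows whenever any complexity level improves, it is a convenient single number for comparing runs or hyperparameter settings.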

Algorithm Improvements

Updated Default Parameters

The default hyperparameters have been significantly revised based on testing:

  • Increased default maxsize from 20 to 30, as I noticed that many people use the defaults, and a larger maxsize allows for more accurate expressions.
  • New mutation operator weights optimized for better performance, along with a new "rotate tree" mutation.
  • Improved search parameters tuned using Pareto front volume calculations.
  • Default niterations increased from 40 to 100, also to support better accuracy (at the expense of slightly longer default search times).
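The notes don't describe how the "rotate tree" mutation works. Assuming it behaves like a classical binary-tree rotation applied to an expression tree, a minimal sketch looks like this; the Node class and rotate_right helper are hypothetical and only handle binary operators, so this is not SymbolicRegression.jl's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    op: str                        # operator symbol, or a leaf name such as "x1"
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def to_str(node):
    if node.left is None and node.right is None:
        return node.op
    return f"({to_str(node.left)} {node.op} {to_str(node.right)})"

def rotate_right(root):
    """Classical right rotation: the left child becomes the new root.

    ((a op1 b) op2 c)  ->  (a op1 (b op2 c))
    """
    pivot = root.left
    if pivot is None or pivot.left is None:  # nothing to rotate
        return root
    root.left = pivot.right
    pivot.right = root
    return pivot

tree = Node("*", Node("+", Node("x1"), Node("x2")), Node("x3"))
before = to_str(tree)               # "((x1 + x2) * x3)"
after = to_str(rotate_right(tree))  # "(x1 + (x2 * x3))"
```

Unlike algebraic simplification, a rotation generally changes the expression's value: (x1 + x2) * x3 is not x1 + x2 * x3. It rearranges the existing operators and leaves without adding or deleting nodes, which is what makes it a purely structural mutation.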

Core Changes

  • New output organization: Results are now stored in outputs/<run_id>/ rather than in the directory of execution.
  • Improved performance with better parallelism handling
  • Support for Python 3.10+
  • Updated Julia backend to version 1.10+
  • Fix for aliasing issues in crossover operations
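As a rough illustration of the new layout, the sketch below builds the per-run paths under outputs/<run_id>/ without touching the disk. The run-id format and the hall_of_fame.csv filename are assumptions made for the example rather than guaranteed PySR behavior; in PySR itself, the output_directory and run_id parameters control these locations.

```python
import secrets
from datetime import datetime
from pathlib import Path

def make_run_id():
    # Hypothetical format: timestamp plus a short random hex suffix, so two runs
    # started in the same second still get distinct directories.
    return datetime.now().strftime("%Y%m%d_%H%M%S") + "_" + secrets.token_hex(3)

def run_paths(output_directory="outputs", run_id=None):
    """Return (run directory, results file) for one search."""
    run_dir = Path(output_directory) / (run_id or make_run_id())
    # Assumed filename for the saved equations; check your PySR version.
    return run_dir, run_dir / "hall_of_fame.csv"

run_dir, results_file = run_paths(run_id="demo")
```

Giving each run its own directory means repeated searches no longer overwrite each other's results in the working directory.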

Breaking Changes

  • Minimum Python version is now 3.10, and minimum Julia version is 1.10
  • Output file structure has changed to use directories
  • Parameter name updates:
    • equation_file → output_directory + run_id
    • Added clearer naming for parallelism options, such as parallelism="serial" rather than the unclear old combination multithreading=False, procs=0
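For migrating older scripts, the following hypothetical helper (not part of PySR) maps the legacy multithreading/procs pair onto the new parallelism option. It assumes the pre-1.0 behavior, where procs=0 disabled parallelism entirely and multithreading=False with procs > 0 selected multiprocessing.

```python
def legacy_parallelism(multithreading, procs):
    """Translate pre-1.0 (multithreading, procs) settings into 1.0 keyword arguments.

    Hypothetical migration helper; it only returns a dict of keyword arguments.
    """
    if procs == 0:
        # procs=0 used to switch off both parallel backends.
        return {"parallelism": "serial"}
    if multithreading:
        return {"parallelism": "multithreading"}
    # multithreading=False with procs > 0 meant multiple processes.
    return {"parallelism": "multiprocessing", "procs": procs}

kwargs = legacy_parallelism(multithreading=False, procs=0)  # {"parallelism": "serial"}
```

The returned dict could then be splatted into the regressor, e.g. PySRRegressor(**kwargs).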

Documentation

The documentation has a new home at https://ai.damtp.cam.ac.uk/pysr/