# PySR: A Modern Symbolic Regression Method
Member: YuanLong, Chen
Previously, we introduced the basics of symbolic regression (SR) with simple examples. In recent decades, many SR-based methods have been developed and applied across various fields. This note discusses a recent method called PySR, published in 2023 [1].
## Brief Introduction of PySR
PySR is an open-source library for practical symbolic regression, a machine learning method that discovers human-interpretable symbolic models. It was created to popularize symbolic regression in scientific research. PySR employs a multi-population evolutionary algorithm with a unique evolve-simplify-optimize loop, designed to optimize unknown scalar constants in newly discovered empirical expressions.
### The Progression History of SR
Symbolic regression (SR) emerged as a scientific tool in the 1970s and 1980s, with early efforts like Bacon [2] and its successors, including Fahrenheit [3], which used heuristic strategies to discover simple physics laws from idealized data. **Genetic algorithms**, offering flexible search spaces with fewer assumptions, gained popularity through works like [4]. Tools such as Eureqa [5-7] and HeuristicLab [8] later made SR practical for real-world, noisy scientific data and empirical equations.
### Discovering An Empirical Equation in Science
The requirements for “scientific symbolic regression (SR)” are significantly more challenging than for SR applied to synthetic datasets often used for benchmarking. Key challenges include:
* **Insightful Equations**: The discovered equation must provide insight into the problem. Scientists prefer equations that balance simplicity and accuracy rather than prioritizing accuracy alone.
* **Noisy Data**: Scientific data often contains noise with a heteroscedastic structure, making discovery more complex.
* **Non-Differentiable Expressions**: Empirical equations may include piecewise expressions (e.g., $\exp(x)$ for $x < 0$ and $x^2 + 1$ for $x \ge 0$).
* **Physical Constraints**: Discovered equations must satisfy known scientific constraints, such as conservation laws.
* **Custom Operators**: Many fields use domain-specific operators, requiring flexibility in the equation search strategy.
* **High-Dimensional Data**: Observational datasets are often high-dimensional, necessitating either internal handling or integration with a feature selection algorithm.
## Algorithm of PySR
PySR uses a **multi-population evolutionary algorithm** where populations evolve asynchronously. Each population follows a classic evolutionary algorithm based on tournament selection for choosing individuals and employs mutations and crossovers to generate new ones.
Evolutionary algorithms for symbolic regression (SR), first popularized in [4], have since become the standard for flexible SR methods. PySR introduces several enhancements to this approach, incorporating insights from recent research.
A simple evolutionary algorithm proceeds as follows:
1. Start with a population of individuals, a fitness function, and a set of mutation operators.
2. Randomly select an $n_s$-sized subset of individuals from the population (e.g., classical tournament selection uses $n_s = 2$, but larger tournaments are also allowed).
3. Run the tournament by evaluating the fitness of every individual in that subset.
4. Select the fittest individual of the subset as the winner with probability $p$. Otherwise, remove it from the subset and repeat this step. If only one individual remains, select it. Thus, the selection probabilities are roughly $p$, $p(1 - p)$, $p(1 - p)^2$, ... for the first, second, third fittest, and so on.
5. Create a copy of this selected individual, and apply a randomly-selected mutation from a set of possible mutations.
6. Replace a member of the population with the mutated individual. Typically, one replaces the weakest member of the population (or of the subset).
This algorithm is computationally efficient and highly scalable, enabling massive parallelization: the population can be split into multiple groups of individuals that **evolve independently**, with asynchronous "migration" of individuals between groups to enhance diversity and maintain global optimization.
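To make these steps concrete, here is a minimal Python sketch of one such tournament step. The `fitness` and `mutate` callables and the population representation are placeholders for illustration, not PySR's internals.
```python=
import random

def tournament_step(population, fitness, mutate, n_s=10, p=0.9):
    """One iteration of steps 2-6 of the evolutionary loop above."""
    # Step 2: sample a random subset of the population.
    subset = random.sample(range(len(population)), n_s)
    # Steps 3-4: rank the subset by fitness; the winner is chosen with
    # probability p, p(1 - p), p(1 - p)^2, ... from fittest to least fit.
    ranked = sorted(subset, key=lambda i: fitness(population[i]), reverse=True)
    winner = ranked[-1]
    for i in ranked:
        if random.random() < p:
            winner = i
            break
    # Step 5: copy the winner and apply a randomly chosen mutation.
    child = mutate(population[winner])
    # Step 6: replace the weakest member of the population.
    # (PySR's age-regularized variant, described below, instead replaces
    # the oldest member, regardless of its fitness.)
    weakest = min(range(len(population)), key=lambda i: fitness(population[i]))
    population[weakest] = child
```
In the multi-population setting, several populations run this loop independently, with occasional migration of individuals between them.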
### Modifications
1. **Age-regularized**
The first modification involves replacing the eldest member of the population in Step 6 instead of the one with the lowest fitness. This strategy, known as age-regularized evolution [9], draws inspiration from the remarkable Neural Architecture Search results presented in [9, 10].
By prioritizing age over fitness, this approach mitigates the risks of early convergence and premature specialization, reducing the likelihood of the population getting stuck in local minima.
The implementation in [10] uses $p = 1$ and explores $4 \le n_s \le 64$, though PySR also allows $p$ to be less than 1, among several other differences.
2. **Simulated Annealing**
PySR applies simulated annealing [11] to Step 4: given a temperature $T \in [0, 1]$, a mutation is accepted with a probability that depends on the change in fitness, $$p = \exp\left(\frac{L_F - L_E}{\alpha T}\right),$$ where $L_F$ and $L_E$ are the fitness of the mutated and original individual, respectively, and $\alpha$ is a hyperparameter (a minimal sketch of this test appears after this list).
This allows the evolution to alternate between high temperature and low temperature phases.
The **high temperature phases increase the diversity of individuals**, while the **low temperature phases narrow in on the fittest individuals**.
$\alpha$ controls the scale of temperature, with $\alpha \to \infty$ being equivalent to regular tournament selection, and $\alpha \to 0$ rejecting all mutations which lower fitness.
3. They embed the genetic algorithm inside an evolve-simplify-optimize loop.
* **Evolve**: This step is a **repeated application** of tournament selection-based evolution for a set number of mutations.
* **Simplify**: This refers to the simplification of equations during the loop using algebraic equivalencies, transforming equations into an equivalent but simpler form.
* **Optimize**: This step uses a classical optimization algorithm (defaulting to BFGS [12]) for a few iterations to optimize constants in the equations.
The reason for performing several mutations (and crossovers) before simplifying and optimizing is that some equations can only be accessed through a redundant intermediate state.
For example, $x \ast x − x \ast x$ would normally simplify to $0$, but if the simplification is avoided, the next mutation might change one of the $x$ to $y$, resulting in $x \ast x − x \ast y$.
Simplifying only occasionally is a way of allowing redundant but necessary intermediate expressions while at the same time reducing the size of the search space.
The “Optimize” phase greatly improves the discovery of equations with real constants (a sketch of this phase also appears after this list).
4. The final modification involves a novel **adaptive parsimony metric** for managing complexity in expressions.
**Complexity**: By default, complexity in PySR is equal to the number of nodes in an expression tree.
Traditionally, complexity is penalized using a constant parsimony, which adds a loss term proportional to the complexity of an expression: $$l(E) = l_{\text{pred}}(E) + (\text{parsimony}) \cdot C(E),$$ where $C(E)$ is the complexity of an expression $E$ and $l_{\text{pred}}$ is the predictive loss.
However, PySR adapts the complexity penalty by tuning it based on the frequency of expressions at each level of complexity. This adaptive penalty is expressed as: $$l(E) = l_{\text{pred}}(E) \cdot \exp(\text{frequency}[C(E)]),$$ where the “frequency” of $C(E)$ is a combined measure of the frequency and recency of expressions occurring at complexity $C(E)$ in the population.
This adaptive approach punishes complex expressions based on their frequency in the population, encouraging the evolutionary algorithm to explore both **simpler, less accurate expressions** and **more complex, accurate ones**.
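Below are minimal, self-contained sketches of modifications 2 to 4. They are illustrative stand-ins written for this note, not PySR's Julia implementation.

First, the annealed acceptance test, reading $p$ as the acceptance probability and treating higher fitness as better:
```python=
import math
import random

def accept_mutation(fitness_new, fitness_old, T, alpha):
    """Accept a mutation with probability exp((L_F - L_E) / (alpha * T)).

    Fitness-improving mutations are always kept; fitness-lowering ones are
    kept with a probability that shrinks as alpha * T -> 0 and approaches 1
    as alpha -> infinity (plain tournament selection).
    """
    delta = fitness_new - fitness_old
    p = math.exp(min(0.0, delta / (alpha * max(T, 1e-12))))
    return random.random() < p
```
Second, the “Simplify” and “Optimize” phases for a single hand-written candidate expression. The toy data, the candidate form, and the constant names `c0`, `c1` are made up for illustration:
```python=
import numpy as np
import sympy
from scipy.optimize import minimize

# Toy data generated from y = 2.5 * x0**2 - 0.7.
rng = np.random.default_rng(0)
x0_data = rng.uniform(-1, 1, 100)
y_data = 2.5 * x0_data**2 - 0.7

# Suppose evolution has produced the candidate form c0 * x0**2 + c1.
c0, c1, x0 = sympy.symbols("c0 c1 x0")
candidate = sympy.simplify(c0 * x0**2 + c1)   # "Simplify" phase

# "Optimize" phase: tune the scalar constants with a few BFGS iterations.
f = sympy.lambdify((c0, c1, x0), candidate, "numpy")
mse = lambda c: np.mean((f(c[0], c[1], x0_data) - y_data) ** 2)
result = minimize(mse, x0=[1.0, 0.0], method="BFGS", options={"maxiter": 20})
print(result.x)  # roughly [2.5, -0.7]
```
Third, the adaptive parsimony penalty. PySR's “frequency” statistic combines frequency and recency, whereas this sketch just uses a per-complexity count:
```python=
import math
from collections import Counter

def adaptive_loss(pred_loss, complexity, complexity_counts, scale=0.05):
    """l(E) = l_pred(E) * exp(frequency[C(E)]), with a count as frequency."""
    return pred_loss * math.exp(scale * complexity_counts[complexity])

# Complexity 9 is crowded in the population, complexity 3 is rare, so the
# same predictive loss is penalized much more heavily at complexity 9.
counts = Counter({3: 2, 9: 40})
print(adaptive_loss(0.10, 3, counts))   # ~0.11
print(adaptive_loss(0.10, 9, counts))   # ~0.74
```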
### Pseudocode of PySR
* Pseudocode for the outer loop of PySR is given in algorithm 1.

It can be seen that the loop in lines 10-30 of this algorithm is executed synchronously. The loop in lines 12-17 illustrates the processes of evolution, simplification, and optimization. The loop in lines 20-29 demonstrates the process of migration.
* Pseudocode for evolution of PySR is given in algorithm 2.

The loop in lines 2-19 illustrates the general process of evolution that we have described.
Line 8 implements our age regularization.
It is worth noting that in line 6, we introduce the simulated annealing temperature $T$. It can be observed that with each iteration, as the evolutionary process progresses, $T$ gradually decreases.
* Pseudocode for mutation of PySR is given in algorithm 3.

This part demonstrates the process of our mutation operation.
It is worth noting that line 8 of the code demonstrates our operation of mutating constants. The magnitude of the disturbance we add to the constant terms is related to our temperature $T$.
When the temperature is higher, the disturbance we add can be larger, which in turn increases the magnitude of the mutation.
Lines 33-40 of the code demonstrate our validation process for mutations. As can be seen, when the temperature is high, we are more likely to accept unfavorable mutations, while when the temperature approaches $0$, we can only accept favorable mutations.
This allows us to explore a larger search space in the early stages of the mutation process, while focusing on finding better results in the later stages.
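As a rough illustration of this temperature-scaled constant mutation (the exact perturbation law below is an assumption made for this note, not the one in algorithm 3):
```python=
import numpy as np

def perturb_constant(value, T, rng=None):
    """Perturb a scalar constant by an amount that grows with temperature T.

    Near T = 1 the constant can jump substantially; near T = 0 only small
    refinements around the current value are proposed.
    """
    rng = rng or np.random.default_rng()
    return value * (1.0 + T * rng.normal(scale=0.5))
```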
* Pseudocode for tournament of PySR is given in algorithm 4.

This part of the algorithm demonstrates the process of conducting a tournament.
Lines 12-24 show the process of selecting the highest fitness individuals, which are the ones chosen for the tournament.
It is worth noting that in line 18, during the fitness evaluation, we have added a penalty term.
As the number of individuals with the same complexity increases in the population, our penalty becomes larger, leading to an increase in $l$ and a corresponding decrease in the likelihood of selection.
By adding this adaptive penalty term, we ensure that the number of individuals with different complexities in the population tends to remain balanced.
This allows evolution to proceed across individuals of various complexities, rather than simply punishing more complex individuals.
### Additional Features
PySR includes a variety of additional features for various types of data and scenarios:
* **Noisy data (denoising)**
PySR supports multiple strategies for working with noisy data.
* **Custom losses**
PySR supports arbitrary user-defined loss functions.
* **Custom operators**
PySR allows custom user-defined unary or binary functions. So long as a function can be defined as either $f : \mathbb{R} \to \mathbb{R}$ or $f : \mathbb{R}^2 \to \mathbb{R}$, it can be used as a user-defined operator.
* **Feature selection**
PySR uses a simple dimensionality-reduction preprocessing step. Given a user-defined number of features to select, PySR first fits the dataset with a gradient-boosting tree algorithm and then selects the most important features.
* **Constraints**
PySR allows various hard constraints to be specified for discovered expressions. These are enforced at every mutation: if a constraint is violated, the mutation is rejected.
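A sketch of how these options are typically passed to `PySRRegressor` is shown below. The parameter names follow the PySR documentation at the time of writing (e.g., `denoise`, `select_k_features`, `constraints`), but the loss-related argument has been renamed across releases (`loss` vs. `elementwise_loss`), so check the docs for your installed version:
```python=
from pysr import PySRRegressor

model = PySRRegressor(
    niterations=40,
    binary_operators=["+", "-", "*", "/"],
    # Custom unary operator, written as Julia code, plus its SymPy mapping:
    unary_operators=["cube(x) = x^3"],
    extra_sympy_mappings={"cube": lambda x: x**3},
    # Custom element-wise loss, also written as Julia code
    # (older PySR releases use `loss=` instead):
    elementwise_loss="loss(prediction, target) = (prediction - target)^2",
    # Denoising: smooth the targets with a Gaussian process before searching.
    denoise=True,
    # Feature selection: keep only the 3 most important input features,
    # chosen by a gradient-boosting tree fit to the data.
    select_k_features=3,
    # Hard constraint example: limit the complexity allowed inside "/"
    # (first entry = numerator, second = denominator; -1 means unconstrained).
    constraints={"/": (-1, 9)},
)
```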
## EmpiricalBench
Existing SR benchmarks often simplify the datasets compared to the real-world data used to discover well-known empirical equations. To address this, the authors introduce a new benchmark, EmpiricalBench, that aims to more accurately reflect the empirical equation discovery process in science. Every equation in this dataset is a real empirical expression that scientists derived from experimental, noisy, and imperfect data. For an algorithm to be considered useful for equation discovery in the sciences, it should be able to identify such relationships. The data for this benchmark is shown in the figure.

The authors evaluated the algorithms PySR, Operon, DSR, EQL, QLattice, and SR-Transformer on EmpiricalBench.

Interestingly, the two pure deep learning approaches, EQL and SR-Transformer, recovered the fewest expressions of all tested algorithms, while the “untrained” algorithms built on classic heuristics (PySR, Operon, DSR, and QLattice) all outperformed these modern deep learning models.
## Implementing Symbolic Regression with `PySR`
### Installation and Setup
Install PySR with pip; Julia and its dependencies are installed automatically on first import:
```bash=
!pip install -U pysr
```
Then, import the necessary modules:
```python=
import pysr
import sympy
import numpy as np
from matplotlib import pyplot as plt
from pysr import PySRRegressor
from sklearn.model_selection import train_test_split
from sklearn.utils import check_random_state
```
### Data Generation
Create a synthetic dataset based on the function $x_0^2 - x_1^2 + x_1 - 1$:
```python=
x0 = np.arange(-1, 1, 1/10.)
x1 = np.arange(-1, 1, 1/10.)
x0, x1 = np.meshgrid(x0, x1)
y_truth = x0**2 - x1**2 + x1 - 1
# Visualize the data (optional)
ax = plt.figure().add_subplot(projection='3d')
ax.set_xlim(-1, 1)
ax.set_ylim(-1, 1)
surf = ax.plot_surface(x0, x1, y_truth, rstride=1, cstride=1,
                       color='green', alpha=0.5)
plt.show()
```

Generate random training and testing sets:
```python=
rng = check_random_state(0)
x_train = rng.uniform(-1, 1, 100).reshape(50, 2)
y_train = x_train[:, 0]**2 - x_train[:, 1]**2 + x_train[:, 1] - 1
x_test = rng.uniform(-1, 1, 100).reshape(50, 2)
y_test = x_test[:, 0]**2 - x_test[:, 1]**2 + x_test[:, 1] - 1
```
### Training the Symbolic Regressor
Now, let's train a `PySRRegressor`:
We will set up 30 populations of expressions, and use "best" for our model selection strategy:
```python=
default_pysr_params = dict(
    populations=30,
    model_selection="best",
)
```
We can set the total number of cycles of evolution with `niterations`
```python=
model = PySRRegressor(
    niterations=30,
    binary_operators=["+", "-", "*", "/"],
    **default_pysr_params,
)
model.fit(x_train, y_train)
```
### Visualization and Evaluation
We can print the model, which will print out the discovered expressions:
```python=
model
```
```
Hall of Fame:
---------------------------------------------------------------------------------------------------
Complexity Loss Score Equation
1 6.824e-01 1.594e+01 y = -1.1413
3 1.954e-01 6.254e-01 y = x₁ - 1.0322
5 1.354e-01 1.832e-01 y = -1.0856 / (1.3452 + x₁)
7 8.644e-02 2.245e-01 y = x₁ + ((x₀ * x₀) + -1.3515)
9 5.552e-02 2.214e-01 y = ((x₀ * x₀) + -1.4978) / (1.401 + x₁)
11 4.917e-15 7.971e+00 y = x₁ + (((x₀ + x₁) * (x₀ - x₁)) + -1)
13 2.689e-15 3.017e-01 y = (-1 - (-1.0477e-09 - x₁)) + ((x₀ + x₁) * (x₀ - x₁))
------------------------------------------------------------------------------------------------
PySRRegressor.equations_ = [
pick score equation
0 0.000000 -1.1413488
1 0.625373 x1 - 1.0322464
2 0.183162 -1.0855637 / (1.3451785 + x1)
3 0.224539 x1 + ((x0 * x0) + -1.351507)
4 0.221365 ((x0 * x0) + -1.4977801) / (1.4009737 + x1)
5 15.027546 x1 + (((x0 + x1) * (x0 - x1)) + -1.0)
6 >>>> 0.301669 (-1.0 - (-1.0477379e-9 - x1)) + ((x0 + x1) * (...
loss complexity
0 6.823915e-01 1
1 1.953626e-01 3
2 1.354406e-01 5
3 8.644047e-02 7
4 5.551901e-02 9
5 4.916780e-15 11
6 2.689395e-15 13
]
```
We can also print the SymPy format of the best expression:
```python=
model.sympy()
```
$\displaystyle \left(x_{0} - x_{1}\right) \left(x_{0} + x_{1}\right) - \left(- x_{1} - 1.0477379 \cdot 10^{-9}\right) - 1.0$
Using `model_selection="best"` selects the equation with the highest score and prints it, but in practice it is best to look through all the equations manually.
Here, the expression at index 5 is the original function, but `model.sympy()` does not return it by default.
```python=
model.sympy(5)  # print the expression at index 5
```
$\displaystyle x_{1} + \left(x_{0} - x_{1}\right) \left(x_{0} + x_{1}\right) - 1.0$
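All candidates can also be browsed at once through the fitted regressor's `equations_` attribute, a pandas DataFrame. The column names below are inferred from the output printed above; check `model.equations_.columns` on your installed version:
```python=
# Show every discovered expression with its complexity, loss, and score.
print(model.equations_[["complexity", "loss", "score", "equation"]])
```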
We can also use `model.predict` for arbitrary equations, with the default equation being the one chosen by `model_selection`:
```python=
ypredict = model.predict(x_test)
ypredict_simpler = model.predict(x_test, 5)
print("Default selection MSE:", np.power(ypredict - y_test, 2).mean())
print("Manual selection MSE for index 5:", np.power(ypredict_simpler - y_test, 2).mean())
```
```
Default selection MSE: 1.097754699641577e-18
Manual selection MSE for index 5: 3.944304526105059e-33
```
We can also define custom operators:
```python=
model1 = PySRRegressor(
    niterations=5,
    populations=40,
    binary_operators=["+", "-", "*", "/"],
    unary_operators=["square(x) = x^2"],
    extra_sympy_mappings={"square": lambda x: x**2},
)
model1.fit(x_train, y_train)
print(model1)
model1.sympy()
```
```
PySRRegressor.equations_ = [
pick score equation
0 0.000000 square(x0)
1 2.619559 x1 - 1.0322441
2 0.042683 (x1 * 1.21694) - 1.0085877
3 0.730035 -1.3515522 + (square(x0) + x1)
4 0.051751 -1.2353518 + (square(square(x0)) + x1)
5 0.060041 -1.3810112 + ((square(x0) + x1) / 0.87710917)
6 10.251028 (square(x0) + -0.99834764) - (square(x1) - x1)
7 >>>> 19.840985 ((x0 * x0) + -1.0) - (square(x1) - x1)
8 0.131010 (((x0 * x0) + -1.0222284) - -0.022228397) - (s...
loss complexity
0 2.682264e+00 2
1 1.953626e-01 3
2 1.793772e-01 5
3 8.644046e-02 6
4 8.208089e-02 7
5 7.729772e-02 8
6 2.730246e-06 9
7 6.597380e-15 10
8 5.076652e-15 12
]
```
$\displaystyle x_{0} x_{0} - \left(x_{1}^{2} - x_{1}\right) - 1.0$
By effectively utilizing these features, we can leverage `PySR` to uncover meaningful mathematical relationships hidden within data.
## Conclusion
In this note, we introduced PySR, an open-source library for practical symbolic regression. By combining a powerful multi-population evolutionary algorithm, a unique evolve-simplify-optimize loop, and improvements to the classic genetic programming algorithm, PySR accelerates the discovery of interpretable symbolic models from data. Furthermore, through the introduction of a new benchmark, EmpiricalBench, the paper provides a means to quantitatively evaluate the performance of symbolic regression algorithms in scientific applications. Results indicate that, despite advances in deep learning, classic algorithms built on evolution and other untrained symbolic techniques, such as PySR, Operon, DSR, and QLattice, still outperform pure deep-learning-based approaches (EQL and SR-Transformer) in discovering historical empirical equations from original and synthetic datasets. Finally, we showed how to use the PySR library in practice.
## References
[1] Cranmer, M. (2023). Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl. arXiv preprint arXiv:2305.01582.
[2] Langley, P. (1977). BACON: A production system that discovers empirical laws. In Proceedings of the 5th International Joint Conference on Artificial Intelligence (IJCAI 1977).
[3] Langley, P., & Zytkow, J. M. (1989). Data-driven approaches to empirical discovery. Artificial Intelligence, 40(1), 283–312.
[4] Koza, J. R. (1994). Genetic programming as a means for programming computers by natural selection. Statistics and Computing, 4(2), 87–112.
[5] Bongard, J., & Lipson, H. (2007). From the Cover: Automated reverse engineering of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, 104(24), 9943–9948.
[6] Schmidt, M., & Lipson, H. (2009). Distilling Free-Form Natural Laws from Experimental Data. Science, 324(5923), 81–85.
[7] Schmidt, M., & Lipson, H. (2010). Symbolic regression of implicit equations. In Genetic Programming Theory and Practice VII (pp. 73–85).
[8] Wagner, S., & Affenzeller, M. (2005). HeuristicLab: A generic and extensible optimization environment. In B. Ribeiro, R. F. Albrecht, A. Dobnikar, D. W. Pearson, & N. C. Steele (Eds.), Adaptive and Natural Computing Algorithms (pp. 538–541). Springer.
[9] Real, E., Aggarwal, A., Huang, Y., & Le, Q. V. (2019). Regularized Evolution for Image Classifier Architecture Search. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), 4780–4789.
[10] Real, E., Liang, C., So, D., & Le, Q. (2020). AutoML-Zero: Evolving Machine Learning Algorithms From Scratch. In International Conference on Machine Learning (pp. 8007–8019). PMLR.
[11] Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). Optimization by Simulated Annealing. Science, 220(4598), 671–680.
[12] Broyden, C. G. (1970). The convergence of a class of double-rank minimization algorithms 1. General considerations. IMA Journal of Applied Mathematics, 6(1), 76–90.