feat: add complexity measure on expressions #2389
Thanks for starting this proposal Glen, well explained! We can start with a simple complexity rule (like counting the number of terms and operations or so), and refine it afterwards where needed. As you state, what is most difficult to me is determining how many steps in a certain direction we should try out (very computationally heavy), since it is possible that an intermediate step increases the complexity but a simplification step afterwards simplifies very well. Just thinking aloud here: maybe we could label specific rules as allowed to temporarily increase the complexity, but not all rules, to prevent an explosion of possibilities that need to be tried out. It would be nice to collect a set of real-world examples to get a better understanding of complexity and temporary increases in complexity. @ericman314 do you have any thoughts on this matter?
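One way to label rules as described might look like the following sketch. The object shape and the `allowGrowth` flag are invented for illustration; this is not an existing mathjs option:

```javascript
// Hypothetical rule list where individual rules are flagged as allowed to
// temporarily increase complexity. `allowGrowth` is an invented flag,
// not part of mathjs's actual simplify() rule format.
const rules = [
  { rule: 'n1*n2 + n1*n3 -> n1*(n2+n3)' },                    // always shrinks
  { rule: 'n1*(n2+n3) -> n1*n2 + n1*n3', allowGrowth: true }  // may grow first
];

// A simplifier could then restrict complexity-increasing steps to
// flagged rules only:
const growthRules = rules.filter((r) => r.allowGrowth);
console.log(growthRules.length); // 1
```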
Interesting: the complexity metric could be used in an A*-type algorithm that searches for a simplified expression while still allowing intermediate expressions which are more complex, but preferentially searches toward lower complexity. Also, regarding simplification itself, operations like factoring are hard (it's what makes modern cryptography work, after all). Berlekamp's algorithm is widely used in computer algebra systems, but it looks complicated (to me at least). The pattern matching framework might not be able to factor every expression, and sometimes an alternating mixture of expanding and then factoring would be needed. @gwhitney is right that sometimes factoring results in a simpler expression and sometimes expanding does. Sometimes it's subjective, which is why some CAS packages have separate functions for both. It could be more performant to have explicit "expand" and "collect" functions which, rather than using the generic pattern matching routine, operate on a simpler set of rules and perform just one kind of simplification. The simplify routine could then try these functions when the pattern matching gets stuck.
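The search idea can be sketched as a best-first loop: expand the lowest-complexity candidate first, but keep more-complex neighbors in the frontier so a temporary increase can still pay off later. The string-rewriting "expressions" and rules below are toys chosen so the code is self-contained, not mathjs's rule format or simplify() implementation:

```javascript
// Hedged sketch: best-first search over rewrites, allowing intermediate
// states that are temporarily MORE complex than the current best.
function bestFirstSimplify(start, neighbors, complexity, maxSteps = 100) {
  const seen = new Set([start]);
  const frontier = [start]; // candidates still to expand
  let best = start;
  for (let step = 0; step < maxSteps && frontier.length > 0; step++) {
    frontier.sort((a, b) => complexity(a) - complexity(b)); // cheapest first
    const current = frontier.shift();
    if (complexity(current) < complexity(best)) best = current;
    for (const next of neighbors(current)) {
      if (!seen.has(next)) {
        seen.add(next);
        frontier.push(next); // kept even if more complex than `current`
      }
    }
  }
  return best;
}

// Toy rewrite system: 'aa'->'a' and 'bb'->'b' shrink the string, while
// 'ab'->'ba' only reorders (same complexity) but can unlock later shrinks.
const rules = [['aa', 'a'], ['bb', 'b'], ['ab', 'ba']];
function neighbors(s) {
  const out = [];
  for (const [from, to] of rules) {
    for (let i = s.indexOf(from); i !== -1; i = s.indexOf(from, i + 1)) {
      out.push(s.slice(0, i) + to + s.slice(i + from.length));
    }
  }
  return out;
}

const result = bestFirstSimplify('aabbab', neighbors, (s) => s.length);
console.log(result); // a shortest reachable string (length 2)
```

The `seen` set is what keeps the reorder-only rule from causing cycles; a real implementation would also need a smarter frontier (a priority queue rather than re-sorting an array) and a cost model closer to A*'s `g + h`.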
🤔 Yes, that makes sense indeed. When doing these things manually, you know when an expanding step will eventually allow further simplification, so I think we could add that information to the rules.
I agree polynomial factorization is beyond the scope of the current simplify() and that it's not clear it belongs there; perhaps factor(), expand(), and collect() should be separate functions. (I don't have the bandwidth at the moment to attack any of these three; if you want (a) placeholder issue(s) for any/all of them, I could file it/them if you like.) On the specific topic of this issue, sounds good: I will start with a PR for a leafComplexity function in simplify.utils (once it's exposed per #2282) that just counts the number of leaf nodes of an expression tree. I do think that generally, but not always, people prefer the expression with fewer leaves as "simpler." That leads to the question: with simplify() as it currently stands, if the output would be of higher complexity, should it change behavior and just return the original input? At the moment I don't think so; it seems to me that simplify() is canonicalizing reasonably well.
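A leaf count like the one described could be sketched as below. Note the plain `{ op, args }` / `{ name }` object shape is a simplified stand-in for mathjs's actual Node classes (OperatorNode, SymbolNode, ConstantNode), not the real API:

```javascript
// Minimal sketch of a leaf-counting complexity measure on an
// expression tree. A node with no children counts as one leaf.
function leafComplexity(node) {
  if (!node.args || node.args.length === 0) return 1; // constant or symbol
  return node.args.reduce((sum, child) => sum + leafComplexity(child), 0);
}

// x*(y+z): leaves are x, y, z
const factored = {
  op: '*',
  args: [
    { name: 'x' },
    { op: '+', args: [{ name: 'y' }, { name: 'z' }] }
  ]
};

// x*y + x*z: leaves are x, y, x, z (x is counted twice)
const expanded = {
  op: '+',
  args: [
    { op: '*', args: [{ name: 'x' }, { name: 'y' }] },
    { op: '*', args: [{ name: 'x' }, { name: 'z' }] }
  ]
};

console.log(leafComplexity(factored)); // 3
console.log(leafComplexity(expanded)); // 4
```

By this measure the factored form wins, matching the intuition that `x*y + x*z` should simplify to `x*(y+z)`; with real mathjs Nodes the same count could be computed by traversing the tree and counting SymbolNode and ConstantNode instances.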
Sounds good, starting with the simplest possible implementation 👍💪
I was thinking about the naming here. "leafComplexity" seems a bit cumbersome for simply the number of leaves of an expression tree. One possibility would be to overload |
Interesting ideas. Using |
This function returns the number of leaves in the parse tree of an expression. The motivation is to provide an initial complexity measure that could be used to decide whether or not to apply some simplification rules. Docs, embedded docs, and test cases are provided. Resolves josdejong#2389.
As you can see in the PR, I opted for a regular
Yes, I like it, thanks Glen! Looking forward to the next step, actually using the function to do smart things in
As suggested by @josdejong in #1913, it would be useful to have a function measuring the complexity of mathematical expressions (i.e., Node objects). Besides its possible use for the purposes of simplification, it could potentially also be of benefit in ordering subexpressions, or for clients wanting to process multiple expressions in order of complexity.
In particular for simplification, we could imagine marking a rule as applying only if the eventual result produced by the application is of lower complexity than the eventual result produced by continuing without the rule application. There could even be "bidirectional" rules. A conceivable example of such: `n1*(n2 + n3) <-> n1*n2 + n1*n3`. If either side (or both sides!) of the rule matches, then the matching side is replaced, and simplification proceeds both with the original node and the replacement node(s), with the eventually least complex winning. (I think we must be cautious about combinatorial explosion here in the search space, but it's not possible to experiment until there is a complexity measure.)

Some real-ish examples of wanting to go either way with this rule: `x*(3x^3 + 5x^2)` would generally be written out as `3x^4 + 5x^3`, whereas I think `x*z + x*y` would usually be simplified to `x*(y+z)`. (Note that even the very simplistic complexity measure of "operator count" guides which way to go with this distributive law "correctly" in these two examples.)

It seems as though https://www.mdpi.com/2073-8994/13/2/188 provides something of an overview of possible complexity measures, with references to a number of papers giving specific details of certain measures. Comments as to which measure(s) of complexity would be most useful for mathjs would be very welcome.
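The operator-count claim for the two examples can be checked mechanically. A toy sketch (implicit multiplications written out explicitly; this is a string-level count, not the mathjs API):

```javascript
// Toy "operator count" complexity measure: count the binary operators
// +, -, *, /, ^ in an explicitly written expression string.
function opCount(expr) {
  const ops = expr.match(/[+\-*/^]/g);
  return ops ? ops.length : 0;
}

// Distributive law, direction 1: expanding lowers the count here.
console.log(opCount('x*(3*x^3+5*x^2)')); // 6
console.log(opCount('3*x^4+5*x^3'));     // 5

// Direction 2: factoring lowers the count here.
console.log(opCount('x*z+x*y'));         // 3
console.log(opCount('x*(y+z)'));         // 2
```

In both cases the lower operator count picks the direction a human would choose, which is what makes even this crude measure a plausible starting point.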