Performance of unified extension algorithm for "no invalidation" patches #232

Open
skef opened this issue Nov 8, 2024 · 2 comments
skef commented Nov 8, 2024

It still seems to me that the patch extension algorithm is designed around the needs of what we're now calling "table-keyed"/invalidating or partially invalidating patches, with non-invalidating patches added on. This issue (to the extent it is one) is related to my questions about the unified algorithm but not equivalent to it -- perhaps a different unified algorithm would have a different balance of focus.

In general we expect to have a smaller choice of table-keyed patches at a given level, although strategies to provide a "one round trip" solution with patches of different grains might increase the number. There could be quite a few glyph-keyed patches depending on the degree to which the encoding is trying to squeeze the transfer size to a minimum. And if you have a large file you could be loading quite a few glyph-keyed patches.

With that in mind, this is the selection description for no invalidation patches:

> Otherwise select exactly one of the No Invalidation entries in entry list. The criteria for selecting the single entry is left up to the implementation to decide.

Later language clarifies that patch loads for such patches can be done concurrently.

As stated, even if the selection is "left up to the implementation" (something we'll probably need to change), the process is specified in relation to the current patch subset definition, which is updated on each round. So, as specified, the update algorithm performs $\frac{(m+1)(2n-m)}{2}$ comparisons, where $n$ is the number of patches in the list and $m$ is the number of patches that need to be loaded (fourth edit's the charm?). That's not quite $O(n^2)$ but it's in the ballpark.
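For the record, here's the arithmetic behind that count, assuming round $i$ re-checks the $n - i$ remaining patches against the updated subset definition:

$$\sum_{i=0}^{m} (n - i) = (m+1)\,n - \frac{m(m+1)}{2} = \frac{(m+1)(2n-m)}{2}$$

which grows like $\Theta(n^2)$ when $m$ is proportional to $n$.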

There may be some corner cases with glyph-keyed patches where it would make sense to pick one over another. The main case I can think of is if you have a glyph duplicated among several patches that is itself directly associated with a codepoint, and you've thrown that codepoint into the list for each patch. In such cases you might be able to avoid loading a patch by being careful about the analysis. However, those cases will be quite rare and aren't important to optimize for. So I would be much more inclined to just treat non-invalidating patches as truly independent during selection.

garretrieger commented Nov 9, 2024

I'm currently working through a client-side implementation of the extension algorithm, so I can provide some perspective on how that looks in practice. Here's what my current implementation does:

1. Locate all intersecting patches as per the procedure in the spec.
2. In one linear scan, select a maximal set of patches which don't invalidate each other (sketched below). That could be:
   - one fully invalidating patch,
   - one partially invalidating patch plus zero or more non-invalidating patches with a different compat id than the partially invalidating one,
   - or up to two partially invalidating patches if they have different compat ids.
3. Initiate loads for all selected patches.
4. Apply them in the order required by the specification. Note: there's a corner case here where one of the partially invalidating patches may change the set of non-invalidating patches which are listed, or introduce new partial invalidation patches. So you do need to recheck the intersection list once more after partial invalidation patch application and drop any of the non-invalidating patches that are no longer in the intersection.
5. Repeat as needed until nothing outstanding remains.
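For illustration only, a minimal sketch of the step 2 selection; the field names `invalidation` and `compat_id` are my own, not from the spec:

```python
def select_batch(intersecting_patches):
    """One linear scan: pick a maximal set of patches that don't invalidate
    each other. Field names here are hypothetical, not from the spec."""
    partials = []          # at most two, with distinct compat ids
    non_invalidating = []

    for patch in intersecting_patches:
        if patch.invalidation == "full":
            # A fully invalidating patch is selected on its own.
            return [patch]
        elif patch.invalidation == "partial":
            if len(partials) < 2 and all(p.compat_id != patch.compat_id for p in partials):
                partials.append(patch)
        else:  # no invalidation
            non_invalidating.append(patch)

    if len(partials) == 2:
        # Up to two partially invalidating patches with different compat ids.
        return partials
    if len(partials) == 1:
        # One partial + non-invalidating patches keyed by a different compat id.
        return partials + [p for p in non_invalidating
                           if p.compat_id != partials[0].compat_id]
    return non_invalidating
```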

Done this way, it avoids the $O(n^2)$ performance issue you mentioned. At minimum we should update the note about concurrent loading to also explicitly call out the $O(n^2)$ issue and encourage concurrent selection of patches as well.

The other option I see would be to rewrite the extension algorithm closer to the procedure I described above, where on each iteration you select a maximal set of patches, which are all applied before rechecking for new intersections. The drawback is that in the corner case I described above it would result in different behaviour: any non-invalidating patches which are no longer listed would still get applied. In theory this behaviour should still be valid, since we wouldn't be violating any compat id constraints at any point.
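Roughly, that alternate shape of the loop would look something like this (a sketch only; `intersecting_patches`, `fetch_concurrently`, and `apply_in_spec_order` are hypothetical helpers, and `select_batch` is the sketch above):

```python
def extend(font, target_subset):
    """Alternate extension loop: select a maximal batch, fetch and apply all
    of it, then recompute intersections. Helper names are hypothetical."""
    while True:
        candidates = intersecting_patches(font, target_subset)
        if not candidates:
            break
        batch = select_batch(candidates)       # single-pass selection, as above
        patches = fetch_concurrently(batch)    # loads can happen in parallel
        # Everything selected gets applied, even if a partially invalidating
        # patch would have dropped some entries from the intersection list.
        font = apply_in_spec_order(font, patches)
    return font
```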

There are a couple of benefits to going down this route:

1. The client-side implementation gets a bit simpler since it doesn't have to deal with the corner case.
2. By explicitly requiring clients to pre-fetch valid-looking non-invalidating patches, encoders will expect this to happen and encode things appropriately. Whereas under the current text, encoders need to assume reasonable clients will do this even though it's not actually required.

garretrieger commented Nov 9, 2024

Actually, thinking about it a bit more, we would still need to handle the corner case in the alternate algorithm I proposed, since you can also have the situation where multiple partial invalidation patches need to be applied prior to applying any of the non-invalidating ones. For example, these might be necessary to resize glyf/loca before the non-invalidating patches are applied. That would make the algorithm more complex to describe in this fashion, but it might still be worthwhile to try drafting the alternate version to get a sense of what it looks like in practice and then decide which approach is more understandable.
