Potential additions to the proposed TextCluster API #11141

Open
AndresRPerez12 opened this issue Mar 18, 2025 · 9 comments

Comments

@AndresRPerez12

What is the issue with the HTML Standard?

The possibility to create and render TextCluster objects has been discussed in this issue, and the spec PR is currently being reviewed.

After some discussions around potential use cases for these new capabilities, a couple of interesting points came up around how to make efficient use of the new API and what would be needed to improve its usefulness:

  • Adding some kind of unique id or hash to each TextCluster, indicating that the underlying glyph (or glyphs) are the same. This would enable the creation of an atlas to reuse the clusters as textures.
  • Exposing bidi level information for each cluster.

We wanted to see what opinions people here have on adding these as read-only attributes to the TextCluster interface. We're not sure what the best way to spec a unique id like that would be: it doesn't need to be consistent across platforms, it just has to be unique within each user agent. I looked around the spec for other examples of hashing or generating unique ids, but I wasn't able to find a reference.

cc: @whatwg/canvas

@fserb
Contributor

fserb commented Mar 18, 2025

Just to clarify. The use case is: how can the developer know that a particular TextCluster matches (rendering-wise) another TextCluster, for the purposes of caching (either calculations or a rendered version of the cluster, for example).

This would be impossible with the current API, as one would need to use the font, the text, and the character interval as the cache key, i.e., you could only cache a TextCluster for the same position in the same word. Ideally, users should be able to know "this TextCluster will render identically to this other TextCluster - of the same or another word", and ideally they could do this in a way that also works with Maps.

It's not very clear from the use cases we collected whether matching across font sizes matters. I'd assume not, and that we want to match TextClusters when they have the same "resolved font" (including size) and the same list of glyphs. As Andres mentioned, one way to do this would be to expose a local id (or hash) that is unique per combination of resolved font (font + size) and glyph list.
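To make the difference between the two kinds of key concrete, here is a minimal sketch. Both function names, the string form of the "resolved font", and the numeric glyph ids are assumptions made for illustration; nothing here is part of the proposed API, and glyph ids are not exposed to authors today.

```typescript
// Key an author can build with the current API: it is tied to one text and one
// character interval, so it cannot be shared across words or positions.
function positionalKey(font: string, text: string, start: number, end: number): string {
  return JSON.stringify([font, text, start, end]);
}

// Hypothetical key a user agent could derive internally: the resolved font
// (including size) plus the list of glyph ids that make up the cluster.
function renderKey(resolvedFont: string, glyphIds: readonly number[]): string {
  return JSON.stringify([resolvedFont, glyphIds]);
}
```

With the positional key, two occurrences of the same letter in one word get different keys because their intervals differ; with the render key they would collapse to one entry, while the same glyph at a different size would not.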

But other ideas would be appreciated too.

@annevk
Member

annevk commented Mar 18, 2025

I wonder if a more natural API would be some kind of matches() or equals() method.

@AndresRPerez12
Author

> I wonder if a more natural API would be some kind of matches() or equals() method.

In the scenario of building a cache, I think this would require going over all existing clusters that match the substring. I feel it would still be preferable to expose some value that can directly be used as a map key for efficiency.

But I do feel that methods like matches() or equals() can be useful too.
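The efficiency argument can be sketched as follows. The `ClusterLike` interface, its `renderKey` field, and both lookup functions are hypothetical stand-ins, not part of the proposal; the `matches` predicate is passed in as a parameter because no such method exists yet.

```typescript
// Hypothetical stand-in for a cluster; `renderKey` models an exposed id.
interface ClusterLike {
  readonly renderKey: string;
}

// With only a pairwise matches(), a lookup may have to test every cached entry.
function lookupWithMatches<V>(
  cache: ReadonlyArray<[ClusterLike, V]>,
  probe: ClusterLike,
  matches: (a: ClusterLike, b: ClusterLike) => boolean,
): { value: V | undefined; comparisons: number } {
  let comparisons = 0;
  for (const [cluster, value] of cache) {
    comparisons++;
    if (matches(cluster, probe)) return { value, comparisons };
  }
  return { value: undefined, comparisons };
}

// With an exposed key, the same lookup is a single Map access.
function lookupWithKey<V>(cache: ReadonlyMap<string, V>, probe: ClusterLike): V | undefined {
  return cache.get(probe.renderKey);
}
```

The number of comparisons in the first function grows with the size of the cache, which is the cost a directly usable map key would avoid.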

@fserb
Contributor

fserb commented Mar 20, 2025

@annevk would something like this work for Maps or Sets?

@domenic
Member

domenic commented Mar 21, 2025

Sadly no; it is not currently possible to customize the equality test used by Maps and Sets. (And relevant discussions in TC39 seem stalled.)
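A small illustration of this limitation: Map compares object keys by identity, so two distinct objects with the same contents are different keys, whereas a primitive key derived from the contents is compared by value. The object shape and the function name below are made up for the example.

```typescript
// Two distinct objects with identical contents, standing in for two clusters
// that would render identically.
function demoMapKeys(): { objectKeyHit: boolean; stringKeyHit: boolean } {
  const first = { font: "16px serif", glyphs: "e" };
  const second = { font: "16px serif", glyphs: "e" };

  // Object keys are compared by identity, so looking up `second` misses.
  const byObject = new Map<object, string>([[first, "cached bitmap"]]);

  // A string key derived from the contents is compared by value, so it hits.
  const byString = new Map<string, string>([[JSON.stringify(first), "cached bitmap"]]);

  return {
    objectKeyHit: byObject.has(second),
    stringKeyHit: byString.has(JSON.stringify(second)),
  };
}
```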

Another possibility would be to always return the same TextCluster object, basically making the user agent do memoization to hide the hash key. This would be somewhat unusual, but doesn't seem impossible? Have a map<hash_key, weak_ptr<TextCluster>> internally, and before returning a new TextCluster, compute the hash key and look it up in the map. If an entry exists, then return that TextCluster instead of creating a new one.

Is that feasible in real-world implementations?

I guess maybe it would require making TextClusters immutable, whereas right now they have mutable x and y properties.
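The memoization idea above can be approximated in script with WeakRef standing in for weak_ptr. This is only a sketch of the pattern, not how any user agent implements it; the `Interner` class and the string hash key are assumptions, and dead entries are not pruned here.

```typescript
// Returns the same object for the same key for as long as that object is
// still alive, without the table itself keeping the object alive.
class Interner<T extends object> {
  private table = new Map<string, WeakRef<T>>();

  intern(key: string, create: () => T): T {
    // deref() yields undefined once the previous object has been collected.
    const cached = this.table.get(key)?.deref();
    if (cached !== undefined) return cached;

    const fresh = create();
    this.table.set(key, new WeakRef(fresh));
    return fresh;
  }
}
```

Because callers would receive one shared object per key, the object has to be safe to share, which is why mutability of its properties matters.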

@annevk
Member

annevk commented Mar 25, 2025

That kind of internal map is certainly something we have precedent for, e.g., with live collections.

cc @smaug---- @vitorroriz

@AndresRPerez12
Author

> Another possibility would be to always return the same TextCluster object, basically making the user agent do memoization to hide the hash key. This would be somewhat unusual, but doesn't seem impossible? Have a map<hash_key, weak_ptr<TextCluster>> internally, and before returning a new TextCluster, compute the hash key and look it up in the map. If an entry exists, then return that TextCluster instead of creating a new one.

The use cases we have in mind are more towards developers knowing that the TextCluster will render the same, so that they can reuse the rendered result itself (i.e. not having to call fillTextCluster() or strokeTextCluster() again), or reuse calculations they've done based on the cluster before. So I think the hash key would have to be exposed for developers to realize this.

Maybe the idea of memoization could be used to generate a simple unique identifier when the cluster doesn't match an existing one? It could be a simple counter that increases if the cluster is new.
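A minimal sketch of the counter idea, assuming the user agent already has some internal key describing how a cluster renders. The class name, the string form of that key, and the choice to start at 1 are all illustrative assumptions.

```typescript
// Hands out a small integer per distinct internal render key. The id is only
// meaningful within this allocator, mirroring "unique within each user agent".
class ClusterIdAllocator {
  private ids = new Map<string, number>();
  private next = 1;

  idFor(internalRenderKey: string): number {
    let id = this.ids.get(internalRenderKey);
    if (id === undefined) {
      // First time this key is seen: take the next counter value.
      id = this.next++;
      this.ids.set(internalRenderKey, id);
    }
    return id;
  }
}
```

The resulting number is a primitive, so authors could use it directly as a Map or Set key.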

> I guess maybe it would require making TextClusters immutable, whereas right now they have mutable x and y properties.

The properties are immutable as currently proposed, but two clusters can be "equivalent" for this scenario and have different x and y values.

@domenic
Member

domenic commented Apr 1, 2025

> So I think the hash key would have to be exposed for developers to realize this.

I don't think I follow. If you always return the same TextCluster object, then they could realize this using the === operator.

> The properties are immutable as currently proposed

#10677 lists attribute double x, not readonly attribute double x.

@AndresRPerez12
Author

> I don't think I follow. If you always return the same TextCluster object, then they could realize this using the === operator.

The objects might be technically different but equivalent for the use cases we have in mind. For example, for the text "elephant", both instances of the letter e (each of which is a cluster in this case) most likely use the exact same glyph. Their positions are different, so the two TextCluster objects will compare as different with ===, but we want developers to know with confidence that rendering these two clusters produces exactly the same result. That way, if they are caching the bitmap for each cluster to use somewhere else, they know they don't need to call fillTextCluster() and export the canvas content for the second e, since they can reuse the result from the first one.
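The caching scenario described here could look roughly like this. The `ClusterLike` interface and its `id` field model the proposed attribute and are assumptions; `rasterize` is a placeholder for drawing a cluster and exporting the pixels, so no canvas API is called in this sketch.

```typescript
// Hypothetical stand-in: `id` models the proposed attribute; equal ids are
// assumed to mean "renders identically", regardless of position.
interface ClusterLike {
  readonly x: number;
  readonly id: string;
}

// Rasterizes each distinct id once and reuses the result for repeats.
function renderAll<B>(
  clusters: readonly ClusterLike[],
  rasterize: (cluster: ClusterLike) => B,
): { bitmaps: B[]; rasterizeCalls: number } {
  const cache = new Map<string, B>();
  let rasterizeCalls = 0;

  const bitmaps = clusters.map((cluster) => {
    const cached = cache.get(cluster.id);
    if (cached !== undefined) return cached;

    const fresh = rasterize(cluster);
    rasterizeCalls++;
    cache.set(cluster.id, fresh);
    return fresh;
  });

  return { bitmaps, rasterizeCalls };
}
```

For a word with a repeated letter, the repeated cluster has a different x but the same id, so it would be served from the cache instead of being rasterized again.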

> #10677 lists attribute double x, not readonly attribute double x.

We updated the IDL to make it readonly in a reply on the issue, but the main body of the issue wasn't edited, sorry about that. This explainer is fully up to date and synced with the latest version of the spec PR: https://github.com/fserb/canvas2D/blob/master/spec/enhanced-textmetrics.md
