Potential additions to the proposed TextCluster API #11141

AndresRPerez12 · 2025-03-18T15:41:49Z

What is the issue with the HTML Standard?

The possibility to create and render TextCluster objects has been discussed in this issue, and the spec PR is currently being reviewed.

After some discussions around potential use cases for these new capabilities, a couple of interesting points came up around how to make efficient use of the new API and what would be needed to improve its usefulness:

- Adding some kind of unique id or hash to each TextCluster, such that clusters backed by the same underlying glyph (or glyphs) share the same value. This would enable the creation of an atlas to reuse the clusters as textures.
- Exposing bidi level information for each cluster.

We wanted to see what opinions people here have on adding these as read-only attributes to the TextCluster interface. We're not sure what would be the best way to spec a unique id like that: it doesn't really need to be consistent across platforms, it just has to be unique within each user agent. I tried to look around in the spec for other examples of hashing or generating unique ids but I wasn't able to find a reference.

cc: @whatwg/canvas


fserb · 2025-03-18T21:38:39Z

Just to clarify. The use case is: how can the developer know that a particular TextCluster matches (rendering-wise) another TextCluster, for the purposes of caching (either calculations or a rendered version of the cluster, for example).

This would be impossible with the current API, as one would need to use the font, the text and the character interval as the hash key; that is, you could only cache a TextCluster for the same position and the same word. Ideally, users should be able to know "this TextCluster will render identically to this other TextCluster - of the same or another word" and ideally they could do this in a way that can be extended to Maps.
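To make the limitation concrete, here is a minimal sketch of the positional key described above. The `ClusterSlot` shape, its field names, and the sample string "banana" are assumptions made for illustration; they are not part of the proposed API.

```typescript
// Hypothetical shape: only the fields this sketch needs. Names are assumptions.
interface ClusterSlot {
  font: string;  // resolved font, e.g. "16px serif"
  text: string;  // the full string that was measured
  start: number; // first code unit index covered by the cluster
  end: number;   // one past the last code unit index covered
}

// The only key available today: font + text + character interval.
function positionalKey(slot: ClusterSlot): string {
  return JSON.stringify([slot.font, slot.text, slot.start, slot.end]);
}

// Two clusters that both cover the letter "a" in "banana".
const firstA: ClusterSlot = { font: "16px serif", text: "banana", start: 1, end: 2 };
const secondA: ClusterSlot = { font: "16px serif", text: "banana", start: 3, end: 4 };

// The keys differ even though the two clusters would likely render identically,
// so a cache keyed this way cannot share an entry between them.
const keysDiffer = positionalKey(firstA) !== positionalKey(secondA);
```

A key of this kind only ever hits for the same word at the same position, which is the gap the proposed id or hash is meant to close.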

It's not very clear from the use cases we collected if matching those across font-size matters. I'd assume no and assume that we want to match TextClusters if they are the same "resolved font" (with size) and list of glyphs. As Andres mentioned, one way to do this would be to expose a local id (or hash) that is unique for resolved-fonts(font + size):glyph list.

But other ideas would be appreciated too.

annevk · 2025-03-18T21:53:15Z

I wonder if a more natural API would be some kind of matches() or equals() method.

AndresRPerez12 · 2025-03-20T16:15:23Z

> I wonder if a more natural API would be some kind of matches() or equals() method.

In the scenario of building a cache, I think this would require going over all existing clusters that match the substring. I feel it would still be preferable to expose some value that can directly be used as a map key for efficiency.
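A rough sketch of the difference in lookup cost, under stated assumptions: `GlyphRun`, `sameRendering()` (standing in for a hypothetical `equals()`), and `runKey()` (standing in for a hypothetical exposed id) are all invented here, and the proposal does not expose glyph ids in this form.

```typescript
// Hypothetical stand-in for the data that determines how a cluster renders.
interface GlyphRun {
  font: string;
  glyphIds: number[];
}

// Stand-in for an equals()/matches() style method.
function sameRendering(a: GlyphRun, b: GlyphRun): boolean {
  return a.font === b.font
    && a.glyphIds.length === b.glyphIds.length
    && a.glyphIds.every((g, i) => g === b.glyphIds[i]);
}

interface CacheEntry {
  run: GlyphRun;
  bitmap: string; // placeholder for a rendered result
}

// With only a predicate, a cache lookup has to visit entries one by one.
function lookupByScan(entries: CacheEntry[], run: GlyphRun): string | undefined {
  for (const entry of entries) {
    if (sameRendering(entry.run, run)) return entry.bitmap;
  }
  return undefined;
}

// Stand-in for an exposed id: any primitive value usable as a Map key.
function runKey(run: GlyphRun): string {
  return JSON.stringify([run.font, run.glyphIds]);
}

// With a key, the lookup is a single Map access.
function lookupByKey(index: Map<string, string>, run: GlyphRun): string | undefined {
  return index.get(runKey(run));
}

const cachedRun: GlyphRun = { font: "16px serif", glyphIds: [72] };
const scanCache: CacheEntry[] = [{ run: cachedRun, bitmap: "bitmap-72" }];
const keyedCache = new Map<string, string>([[runKey(cachedRun), "bitmap-72"]]);
const probe: GlyphRun = { font: "16px serif", glyphIds: [72] };
```

Both lookups find the same entry for `probe`; the scan grows with the number of cached clusters, while the keyed lookup does not.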

But I do feel methods like matches() or equals() can be useful too.

fserb · 2025-03-20T17:04:06Z

@annevk would something like this work for Maps or Sets?

domenic · 2025-03-21T03:54:30Z

Sadly no; it is not currently possible to customize the equality test used by Maps and Sets. (And relevant discussions in TC39 seem stalled.)
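A small illustration of why an equals() method does not help with Maps: object keys are compared by identity, so a structurally identical object is a different key, while primitive keys such as strings are compared by value. The object shape and values below are placeholders.

```typescript
// Object keys: compared by identity.
const byObject = new Map<{ glyph: string }, string>();
const keyA = { glyph: "e" };
const keyB = { glyph: "e" }; // same contents, different object

byObject.set(keyA, "bitmap-for-e");
const hitSameObject = byObject.get(keyA);  // found: same object
const hitEqualObject = byObject.get(keyB); // undefined: different object

// Primitive keys: compared by value.
const byString = new Map<string, string>();
byString.set("e", "bitmap-for-e");
const hitEqualString = byString.get("e");  // found: equal string
```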

Another possibility would be to always return the same TextCluster object, basically making the user agent do memoization to hide the hash key. This would be somewhat unusual, but doesn't seem impossible? Have a map<hash_key, weak_ptr<TextCluster>> internally, and before returning a new TextCluster, compute the hash key and look it up in the map. If an entry exists, then return that TextCluster instead of creating a new one.

Is that feasible in real-world implementations?

I guess maybe it would require making TextClusters immutable, whereas right now they have mutable x and y properties.
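The memoization idea could look roughly like the following. This is only a sketch: `InternedCluster`, `internCluster()`, and the glyph-id key are assumptions, and the table holds strong references for simplicity, whereas the suggestion above describes weak ones so that unused clusters can be collected.

```typescript
// Hypothetical immutable cluster: no position, only what determines rendering.
interface InternedCluster {
  readonly font: string;
  readonly glyphIds: readonly number[];
}

// Internal table from hash key to the one canonical object for that key.
const internTable = new Map<string, InternedCluster>();

function internCluster(font: string, glyphIds: number[]): InternedCluster {
  const hashKey = JSON.stringify([font, glyphIds]);
  const existing = internTable.get(hashKey);
  if (existing !== undefined) {
    return existing; // hand back the same object instead of creating a new one
  }
  // Freeze the object so shared instances cannot be mutated by one holder.
  const created: InternedCluster = Object.freeze({
    font,
    glyphIds: Object.freeze(glyphIds.slice()),
  });
  internTable.set(hashKey, created);
  return created;
}

const internedFirst = internCluster("16px serif", [72]);
const internedSecond = internCluster("16px serif", [72]);
const internedOther = internCluster("16px serif", [73]);
```

Here `internedFirst === internedSecond` holds, so the object itself can serve as a Map or Set key. The sketch only works because the interned object carries no per-position state, which is why mutability of x and y matters for this approach.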

annevk · 2025-03-25T13:03:48Z

That kind of internal map is certainly something we have precedent for, e.g., with live collections.

cc @smaug---- @vitorroriz

AndresRPerez12 · 2025-03-31T20:55:28Z

> Another possibility would be to always return the same TextCluster object, basically making the user agent do memoization to hide the hash key. This would be somewhat unusual, but doesn't seem impossible? Have a map<hash_key, weak_ptr<TextCluster>> internally, and before returning a new TextCluster, compute the hash key and look it up in the map. If an entry exists, then return that TextCluster instead of creating a new one.

The use cases we have in mind are more towards developers knowing that the TextCluster will render the same, so that they can reuse the rendered result itself (i.e. not having to call fillTextCluster() or strokeTextCluster() again), or reuse calculations they've done based on the cluster before. So I think the hash key would have to be exposed for developers to realize this.

Maybe the idea of memoization could be used to generate a simple unique identifier when the cluster doesn't match an existing one? It could be a simple counter that increases if the cluster is new.
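As a sketch of that counter idea: the identifier is assigned the first time a given resolved font and glyph list is seen, and reused afterwards. `clusterIdFor()` and the way the lookup key is built are assumptions for illustration, not a proposed algorithm.

```typescript
// Ids already handed out, indexed by an internal (never exposed) lookup key.
const assignedIds = new Map<string, number>();
let nextClusterId = 1;

// Returns a stable id per (font, glyph list); unique only within this session.
function clusterIdFor(font: string, glyphIds: number[]): number {
  const lookupKey = JSON.stringify([font, glyphIds]);
  let id = assignedIds.get(lookupKey);
  if (id === undefined) {
    id = nextClusterId++; // new cluster: bump the counter
    assignedIds.set(lookupKey, id);
  }
  return id;
}

const idFirst = clusterIdFor("16px serif", [72]);
const idRepeat = clusterIdFor("16px serif", [72]);
const idOtherGlyph = clusterIdFor("16px serif", [73]);
const idOtherSize = clusterIdFor("32px serif", [72]);
```

In this sketch the same glyph at a different font size gets a different id, matching the earlier assumption that the resolved font includes size.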

> I guess maybe it would require making TextClusters immutable, whereas right now they have mutable x and y properties.

The properties are immutable as currently proposed, but two clusters can be "equivalent" for this scenario and have different x and y values.

domenic · 2025-04-01T02:37:07Z

> So I think the hash key would have to be exposed for developers to realize this.

I don't think I follow. If you always return the same TextCluster object, then they could realize this using the === operator.

> The properties are immutable as currently proposed

#10677 lists attribute double x, not readonly attribute double x.

AndresRPerez12 · 2025-04-01T18:39:01Z

> I don't think I follow. If you always return the same TextCluster object, then they could realize this using the === operator.

The objects might be technically different but equivalent for the use cases we have in mind. For example, for the text "elefant", both instances of the letter e (which are each a cluster in this case) are most likely using the exact same glyph. Their positions are different, so the TextCluster objects will be different when evaluated with ===, but we want to enable developers to know with confidence that the result of rendering these two clusters will be the exact same. So that, for instance, if they are caching the bitmap for each cluster to be used somewhere else, they know it's not needed to call fillTextCluster() and export the content of the canvas for the second e, since they can reuse the result of the first one.
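Roughly, the caching flow would be as follows. Everything here is a stand-in: the `id` attribute is the hypothetical addition under discussion, the cluster list for "elefant" is hand-written (assuming one cluster per letter, with both e clusters sharing an id), the x values are invented, and `renderOnce()` replaces the real fillTextCluster() plus canvas export.

```typescript
// Hypothetical cluster: position plus the proposed (not yet specified) id.
interface PlacedCluster {
  id: number;
  x: number;
  y: number;
}

// Hand-written clusters for "elefant"; both "e" clusters share id 1.
const elefantClusters: PlacedCluster[] = [
  { id: 1, x: 0, y: 0 },  // e
  { id: 2, x: 8, y: 0 },  // l
  { id: 1, x: 12, y: 0 }, // e
  { id: 3, x: 20, y: 0 }, // f
  { id: 4, x: 25, y: 0 }, // a
  { id: 5, x: 33, y: 0 }, // n
  { id: 6, x: 41, y: 0 }, // t
];

let renderCalls = 0;

// Stand-in for rendering a cluster and exporting the resulting bitmap.
function renderOnce(cluster: PlacedCluster): string {
  renderCalls++;
  return "bitmap#" + cluster.id;
}

const bitmapCache = new Map<number, string>();

// Render each distinct id once; reuse the cached bitmap at every position.
const placements = elefantClusters.map((cluster) => {
  let bitmap = bitmapCache.get(cluster.id);
  if (bitmap === undefined) {
    bitmap = renderOnce(cluster);
    bitmapCache.set(cluster.id, bitmap);
  }
  return { bitmap, x: cluster.x, y: cluster.y };
});
```

Seven clusters are placed but only six renders happen, because the second e reuses the bitmap of the first while keeping its own position.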

> #10677 lists attribute double x, not readonly attribute double x.

We updated the IDL to make it readonly in a reply on the issue, but the main body of the issue wasn't edited, sorry about that. This explainer is fully up to date and synced with the latest version of the spec PR: https://github.com/fserb/canvas2D/blob/master/spec/enhanced-textmetrics.md

domenic added the topic: canvas label Apr 2, 2025
