Opened 5 years ago

Closed 3 years ago

Last modified 3 years ago

#16221 closed bug (fixed)

Web+: Garbled text on Github

Reported by: bitigchi Owned by: pulkomandy
Priority: normal Milestone: R1/beta3
Component: Kits/Web Kit Version: R1/Development
Keywords: Cc:
Blocked By: #16213 Blocking:
Platform: All

Description (last modified by bitigchi)

Haiku WebKit 1.7.0, latest Beta2 build.

See screenshot.

I have encountered the same issue on Wikipedia as well. Probably it happens on other websites too.

Attachments (5)

web+.png (22.1 KB ) - added by bitigchi 5 years ago.
screenshot
settings.png (53.4 KB ) - added by bitigchi 5 years ago.
fonts
Notifications-min.html (282 bytes ) - added by madmax 4 years ago.
Simple test case
notifications.txt (36.5 KB ) - added by madmax 4 years ago.
Tracing from LayoutGlyphs
32BitGlyph.diff (3.5 KB ) - added by pulkomandy 3 years ago.

Download all attachments as: .zip

Change History (40)

by bitigchi, 5 years ago

Attachment: web+.png added

screenshot

comment:1 by bitigchi, 5 years ago

Description: modified (diff)

comment:2 by nephele, 5 years ago

What font do you have set in WebPositive?

I think this is probably the same issue as https://dev.haiku-os.org/ticket/16213

by bitigchi, 5 years ago

Attachment: settings.png added

fonts

in reply to:  2 comment:3 by bitigchi, 5 years ago

Replying to nephele:

What font do you have set in WebPositive?

Please see screenshot.

comment:4 by pulkomandy, 5 years ago

similar problems have been around for a long time. WebKit has two codepath for "simple" and "complex" text, the latter being used in various cases: actual "complex" scripts needing ligatures, but also depending on css rules.

Normally it is implemented by using freetype and harfbuzz directly, but in our case we do the rendering on app_server side, so we try to fake it by calling back into the "simple" rendering codepath, and it fails.

in reply to:  4 comment:5 by X512, 5 years ago

Replying to pulkomandy:

Normally it is implemented by using freetype and harfbuzz directly, but in our case we do the rendering on app_server side, so we try to fake it by calling back into the "simple" rendering codepath, and it fails.

Is it possible to introduce BView::DrawString that will accept glyph and position array produced by harfbuzz?

comment:6 by pulkomandy, 5 years ago

we need complex text rendering in app_server in other places anyway, so I think it makes sense to integrate Harfbuzz on app_server side. I don't know if a drawstring-style API would work, because the glyphs can be very loosely related to the initial utf8 input. For example some "icon fonts" abuse ligatures to convert whole words (such as "keyboard_arrow_up") into pictograms.

comment:7 by Pete, 4 years ago

I also hit this, in Wikipedia, e.g. https://en.wikipedia.org/wiki/List_of_brightest_stars. With the default Noto fonts, al the negative magnitudes are garbled. Changing the Sans Serif font to Bitstream Charter clears up the text. Never saw this in earlier revisions. Trying fontdemo, I don't see the problem mentioned in #16213.

comment:8 by nephele, 4 years ago

What are negative magnitudes? In any case, i copied a sample string that was garbled from that site and reproduced it with the method in the app_server ticket, just be sure to set the font to noto sans (regular), and play around with the size slider and you should be able to see it render wrong there too.

comment:9 by Pete, 4 years ago

Replying to nephele:

What are negative magnitudes?

The page is about stellar magnitudes (:-/) Bright stars are < 0.

I still don't see any garbling in fontdemo (in hrev54154). Interestingly, if I try to copy a piece of text from the page, the highlighted section un-garbles!

comment:10 by nephele, 4 years ago

web+ renders the text in sections, if you highlight a part that then is a different section (or it is split into a different ammount of sections) since it has to render differently, so that works around the underlying bug somewhat, but it isnt a fix :)

comment:11 by nephele, 4 years ago

I can't reproduce this with https://en.wikipedia.org/wiki/List_of_brightest_stars anymore. with Noto Sans as font.

Still an issue?

comment:12 by pulkomandy, 4 years ago

Blocked By: 16213 added
Milestone: UnscheduledR1/beta3
Resolution: duplicate
Status: newclosed

Probably fixed by the rework of the font overlay.

in reply to:  12 comment:13 by madmax, 4 years ago

Replying to pulkomandy:

Probably fixed by the rework of the font overlay.

Not unless we have a font with the emojis used in the github notifications page. Or something, I don't know exactly what the issue is, but WebKit needs something more than what solved the issue in FontDemo.

We seem to have all glyphs in one font or another for the stars page in wikipedia, not so for the github one in the first screenshot. If you get just that part and change one of the emojis for any that we have, that line is rendered correctly, but the others are still garbled. Some time ago I even added logging everywhere I could and all I got was that requests were being made in chunks of consecutive codepoint blocks. Didn't find anything in Web+ and didn't have enough memory to compile WebKit rebased to see if it worked there (not that I would have had what it takes to hack on it or even find places to add a few printfs, anyway).

comment:14 by nephele, 4 years ago

If this is related to emoji this is probably because we don't load that font ever then (e.g noto color emoji) Wikipedia could have been fixed then by the inclusion of noto symbols 2 or by the fallback fixes, hard to tell. for context the font fallback system tries some specific fonts atm, which we defined in a hardcoded list, but this doesn't include a) a path to specify a own list (like css would want) or b) a path that uses all system fonts

comment:15 by pulkomandy, 4 years ago

This ticket is not about missing characters in our fonts but about garbled/incorrectly spaced text. That was a bug in the font fallback code and it hasbeen fixed. You may still get missing characters (there's another ticket for that) but you shouldn't get characters overlapping each other anymore.

There were also changes on webkit side that led to rework of the font rendering and fixed bugs there (the ones I mentionned in earlier comments about having "simple" and "complex" text, the two codepath are now a lot more similar than they used to be)

by madmax, 4 years ago

Attachment: Notifications-min.html added

Simple test case

by madmax, 4 years ago

Attachment: notifications.txt added

Tracing from LayoutGlyphs

comment:16 by nephele, 4 years ago

I reckon this would be fixed if we fixed the "unknown glyph" fallback character not getting selected

comment:17 by madmax, 4 years ago

Wikipedia could have been fixed then by the inclusion of noto symbols 2 or by the fallback fixes, hard to tell.

We have a history of changes, so not really that hard :-) Not that it matters, though. Both are useful are we have them, yay!

I used the list of brightest stars page specifically (among others) while working in the fallback code. There were some issues there, among which some ways to confuse the caller and produce garbled text. At first I thought they were the root of all garbled text issues, but I was wrong. The fix corrected the rendering of that and other pages, but not the github one. I knowingly removed this ticket from the "fixes" line in that patch.

I've now logged some data from LayoutGlyphs for the calls with "this" (included in all text blocks in the test) or a codepoint higher than 0x10000 in the string. I think these are the interesting bits:

  • Strings with blocks of sequential codepoints for emojis. Given the next points, it probably happens also for the rest of characters, but I haven't captured those with my filter. I (wildly) guess that's to get info for the app to do its own positioning.
  • The strings with text don't contain the symbols we don't have. Whether this is because we don't have them or because of their unicode block I may check later, the one we have in the test is below 0x10000.
  • The strings with text already come with offset info, and that includes the items for the symbols we don't get. I've tested adding 1 to the offset index after the place were the symbol would have been and the text is rendered correctly (without the symbol, of course, but not garbled).

comment:18 by pulkomandy, 4 years ago

Resolution: duplicate
Status: closedreopened

comment:19 by madmax, 4 years ago

Tried again with a dingbat without glyph (0x278e) and a symbol with glyph (0x1f30d). And now we have two separate problems.

For the dingbat the text is garbled. Again we get a string without the character but offsets for all of them.

For the symbol the line is not garbled, but the symbol is not shown. We are getting it in the string but with the plane nixed: the UTF8 for 0xf30d instead of the one for 0x1f30d.

comment:20 by madmax, 4 years ago

We are getting it in the string but with the plane nixed: the UTF8 for 0xf30d instead of the one for 0x1f30d.

This one looks like a confusion between codepoints and glyph indexes. Glyph is unsigned short, I guess it's more for a glyph index inside a font and not a codepoint. In that case code like this or this would be wrong. Maybe widening Glyph and considering it equal to a codepoint would solve that or is it too bold a hack and would need the review of lots of webkit core code? (That is, if we don't have something in BFont or wherever that would give the required data)

comment:21 by bruno, 4 years ago

Is it solved now? I have no problems with standard Haiku fonts to get the page (GitHub) rendered ok!

comment:22 by humdinger, 4 years ago

Still there. For example at https://github.com/notifications (also with a current HaikuLauncher from the "haiku" webkit branch).

comment:23 by pulkomandy, 4 years ago

Component: Applications/WebPositiveKits/Web Kit

comment:24 by pulkomandy, 3 years ago

So, I'm confused by this font fallback code (again).

I added "Noto Emoji" at the end of the fallback list. When I did this, the garbled text stopped happening. But the emojis didn't show up. It does not matter if I install the emoji font or not.

This is not how this code should be working, but it is not the first time I misunderstand something about it.

I also still have no idea why the "missing glyph" char is not working, as mentioned in one of the previous comments, that would likely fix the problem.

in reply to:  24 comment:25 by madmax, 3 years ago

Replying to pulkomandy:

But the emojis didn't show up. It does not matter if I install the emoji font or not.

The problem of not showing the emojis (when we have the glyph) is not in the fallback code, but in WebKit. The way I read it (keep in mind I was not able to compile WebKit when I last looked into this to test what I'm saying, and it was the old, not rebased branch), it would seem WebKit's Glyph is "font indexed glyph", not character codepoint. It is 16 bits wide. We identity-pair them with the value of the character, the 32 bit codepoint, as if they were the same (see GlyphPageTreeNodeHaiku or FontHaiku, for example). So the hand in the "Mentioned" line works, as it is 270b. The others: 1f3af, 1f4ac, 1f64c and 1f440, don't work because they are in plane 1 and that bit is lost before giving the string to Haiku.

In fact, I'm now (yeah, a year later) thinking the garbling problem may also be there. We get offsets for all the characters, but the string doesn't contain the ones for which a glyph is not available, which looks very strange as we seem to be building both at the same time. But when WebKit asks for a glyph we don't have we set it to 0 in GlyphPageTreeNodeHaiku. Then in FontHaiku instead of drawing the glyphs with the given advances, as it seems what the WebKit infrastructure expects, we rebuild the string to use Haiku API to draw it, and in doing so we are not adding those characters to the BString, as they are char 0 and Append see a 0-length string.

I also still have no idea why the "missing glyph" char is not working, as mentioned in one of the previous comments, that would likely fix the problem.

If I'm right in my previous wall of text, getting the missing glyph char back would indeed solve those issues by making BFont::GetHasGlyphs always return true, but that's exactly why it was removed: https://git.haiku-os.org/haiku/commit/?id=15325401ceeffbac7348f8b7ac518d0d47a2efc2.

It is also possible that always calling setGlyphForIndex(i, character); in GlyphPageTreeNodeHaiku.cpp, whether we have a glyph or not, also solves it for good. And that patching the definition of WebKit's Glyph to a 32 bit int, dirty hack as it may be, would even show the emojis if we have them. But I don't know how bad an idea that is or the consequences it may have in the rest of the code.

comment:26 by pulkomandy, 3 years ago

We identity-pair them with the value of the character, the 32 bit codepoint, as if they were the same (see ​GlyphPageTreeNodeHaiku or ​FontHaiku, for example). So the hand in the "Mentioned" line works, as it is 270b. The others: 1f3af, 1f4ac, 1f64c and 1f440, don't work because they are in plane 1 and that bit is lost before giving the string to Haiku.

What else could we do? There is no notion of "planes" anywhere in WebKit, right? So where should that come from? At which point does webkit remove that information, and where does it store it instead?

Then in FontHaiku instead of drawing the glyphs with the given advances, as it seems what the WebKit infrastructure expects, we rebuild the string to use Haiku API to draw it

This is not an "instead". We do use the advances, but we put the glyphs back into a string. This is because we want to send them to app_server in one single call. Drawing character-by-character would be extremely slow. In fact, the call to DrawString with offsets/advances was added specifically to improve performance in WebKit text drawing.

If I'm right in my previous wall of text, getting the missing glyph char back would indeed solve those issues by making BFont::GetHasGlyphs always return true, but that's exactly why it was removed

We want the missing glyph to be drawn if you put a non-existing character in a string passed to DrawString. Yet, we want GetHasGlyphs to return false in that case, so that applications can check for it and avoid calling DrawString with non-existing glyphs if they care.

And that patching the definition of WebKit's Glyph to a 32 bit int, dirty hack as it may be, would even show the emojis if we have them. But I don't know how bad an idea that is or the consequences it may have in the rest of the code.

I see that in MacOS, GlyphBufferGlyph is defined to be a CGGlyph instead of WebKit internal Glyph type. That's also an uint16, however, but customizing this type is not unheard of.

in reply to:  26 comment:27 by madmax, 3 years ago

Replying to pulkomandy:

Please bear with me through another wall of text, I'll try to err on the "too much" side this time.

We identity-pair them with the value of the character, the 32 bit codepoint, as if they were the same (see ​GlyphPageTreeNodeHaiku or ​FontHaiku, for example). So [emojis] don't work because they are in plane 1 and that bit is lost before giving the string to Haiku.

What else could we do? There is no notion of "planes" anywhere in WebKit, right? So where should that come from? At which point does webkit remove that information, and where does it store it instead?

That'd be in the characters, and it'd be our code that removes it when we put unicode codepoint values into 16 bit glyphs. Details below.

Then in FontHaiku instead of drawing the glyphs with the given advances, as it seems what the WebKit infrastructure expects, we rebuild the string to use Haiku API to draw it

This is not an "instead". We do use the advances, but we put the glyphs back into a string.

I was not very clear with that one, sorry. We do use the advances, yes, but we don't put the glyph back when we have signaled that we don't have a glyph for that character. That's the cause of the garbled text. Details below.

I think WebKit expects the font thing to be more font file level, not BFont level. When working through some text, it would call the GlyphPage code when needed (and it does so with ranges of characters, not strides of real text) to have a character <-> internal-glyph map. That's quite a guess, but please assume it's correct enough for the moment. When we setGlyphForIndex(i, character) we do have say 0x1f3af in character, but it's casted to the 16 bit Glyph of the second parameter for setGlyphForIndex, so the map becomes "codepoint 0x1f3af is glyph 0xf3af", and we can't recover 0x1f3af when we are given glyphs to draw. When we don't have a glyph, setGlyphForIndex(i, 0) would be right for an implementation as WebKit expects, but it creates problems for us, as we won't be able to put the original character back and, as we'll see, we don't put *any* character back for that glyph, mismatching the string with the advances.

When it finally wants the text to be drawn, drawGlyphs is called with the needed glyphs, not characters. The typical lower-level-than-BFont implementation would go through the glyphs array, retrieve each one from the font and draw it in the given positions. That would include glyph 0, the "missing glyph" glyph. We instead rebuild the text string and use the advances to call Haiku's DrawString. But we have aliased characters > 65535, we don't really know what character produced a glyph 0 and have yet another problem with that. Let me run an example.

Say we have text with characters 1, 2, 3, 4 and 5. We don't have a glyph for char 2. Say glyphs' advances are equal to their codepoint (just to have something easy with different widths). drawGlyphs would be called with glyphs [1, 0, 3, 4, 5] and the same advances widths. The offsets we calculate are then [0, 1, 1, 4, 8].

To rebuild the string we go through each of the glyphs/characters, turn each one of them to an utf8 string with BUnicodeChar::ToUTF8 and append that to BString. So we get 1, the utf8 string is [1, 0], and our BString becomes [1, 0]. We get 0, the utf8 string for that is [0], and our BString stays [1, 0]. And there is our problem. We then get 3, the string becomes [1, 3, 0], etc. In the end we have the string [1, 3, 4, 5, 0]. So we call DrawString with the correct advances for the full string, but only with the characters for which we have glyphs. Character 1 is OK. Character 3 is OK too if you expect unavailable glyphs to not take any space. Character 4 goes over 3 and also overflows the allotted space, as it was supposed to be for 3. Character 5 goes where 4 should have been, and is a bit over the drawn 4.

Notice how that, as long as GetHasGlyphs returns false and nothing is changed in WebKit side, the garbling is independent of whether we return the "missing glyph" glyph or not, as DrawString does not even get the character.

So what can we do?

Well, to "play by the rules" I guess one would use FreeType-level stuff, ask it to render bitmaps instead of returning vector glyphs (if possible, I haven't checked, and it would probably give a subtly different rendering than Haiku's) or duplicate Haiku's rendering stuff, put each glyph in a buffer image and finally send that image to be drawn. So I think the identity mapping, string recovering and call to DrawString is a really good idea, it just has those two problems that were probably very rare to notice until this fashion of having emojis everywhere.

One problem should be solved with a 32bit Glyph type, if WebKit does not assume it's 16 bits anywhere else.

For the other one there are at least three possibilities:

  • Discard advances for glyph 0 in drawGlyphs don't increment the new index we'd have to use to access the offsets array, and decrement numGlyphs at the end with the number of glyphs we have skipped (or just ask BString what its length is).
  • Still in drawGlyphs, just append a space to the BString when get glyph 0.
  • In GlyphPage::fill, always setGlyphForIndex(i, character), whether we have the glyph or not.

I think the third one would show the "missing glyph" glyph with no further changes in WebKit when we make the fallback code return it, and nothing (but not garble the rest of the text) until then.

comment:28 by pulkomandy, 3 years ago

Well, to "play by the rules" I guess one would use FreeType-level stuff, ask it to render bitmaps instead of returning vector glyphs (if possible, I haven't checked, and it would probably give a subtly different rendering than Haiku's) or duplicate Haiku's rendering stuff, put each glyph in a buffer image and finally send that image to be drawn.

That is not "by the rules" on Haiku side. Drawing strings is the job of app_server.

We could add an API to BFont to allow querying for glyph numbers and requesting to draw specific glyphs. This would be useful for example if we had ligatures or other things like that. I expect with the current code, these are also going to cause problems with selection in WebKit (which needs a text cursor that moves glyph per glyph, and not end up in the middle of a glyph, and not be properly renderable).

For the other one there are at least three possibilities:

We could add the "missing glyph" character explicitly at WebKit level, right? Instead of adding glyph 0, we would add U+25A1 □ WHITE SQUARE. That would avoid the issue on WebKit side. It sounds better than using a space.

in reply to:  28 comment:29 by madmax, 3 years ago

Replying to pulkomandy:

Well, to "play by the rules" [...]

That is not "by the rules" on Haiku side. Drawing strings is the job of app_server.

Agreed, those were WebKit rules, and in fact just what I suppose them to be.

For the other one there are at least three possibilities:

We could add the "missing glyph" character explicitly at WebKit level, right? Instead of adding glyph 0, we would add U+25A1 □ WHITE SQUARE. That would avoid the issue on WebKit side. It sounds better than using a space.

But we would not have the advance for it. I proposed the space because the next character would be drawn over it. Any blank glyph would do. Except if a font does have a glyph that draws a small line or dot for a space it would be a bit dirty.

Now, if on top of using the white square box we can have a good guess for its advancement, that would be OK.

comment:30 by pulkomandy, 3 years ago

Attached an attempt of a patch. It does not make a difference (but I didn't update my Haiku install to have Noto Emoji in the font overlay).

As you can see in the patch, doing it this way breaks MathML rendering (I just hacked things to get them to compile, but it won't work).

So I have to see if changing the whole "Glyph" type is possible instead.

by pulkomandy, 3 years ago

Attachment: 32BitGlyph.diff added

comment:31 by pulkomandy, 3 years ago

Resolution: fixed
Status: reopenedclosed

I have switched Glyph and CGGlyph to 32bit types in HaikuWebKit and now it's working fine.

This will be included in the next release of haikuwebkit 1.8.1.

Thanks for your help investigating this!

in reply to:  31 comment:32 by madmax, 3 years ago

Replying to pulkomandy:

I have switched Glyph and CGGlyph to 32bit types in HaikuWebKit and now it's working fine.

Cool.

But that should just avoid aliasing emojis (and other stuff) to different characters. There would still be the garbling issue that needs to either keep the identity map in GlyphPageTreeNodeHaiku.cpp (by always emitting the character instead of glyph 0) or detect the 0 in FontHaiku.cpp and do something about it. If it works without that it may be just because with Noto Emoji we have all the glyphs necessary for that specific page. Can you try without that font?

comment:33 by nephele, 3 years ago

The minimal testcase above is still broken for me with the latest (1.8.0+2) haikuwebkit code, This is on the beta3 branch however without noto emoji in the fallback list. If we do add it to the list this will likely still be an issue anytime we don't have a font, so I don't know if this is a good fix. And likewise I think we should only add important fonts to the fallback and fix the code to build the list on package mounting or unmounting instead of fully hardcoding it, but that is more for after beta3.

comment:34 by pulkomandy, 3 years ago

See the comment above yours:

There would still be the garbling issue that needs to either keep the identity map in GlyphPageTreeNodeHaiku.cpp (by always emitting the character instead of glyph 0) or detect the 0 in FontHaiku.cpp and do something about it.

This part still needs to be done.

Note: See TracTickets for help on using tickets.