Opened 4 years ago

Closed 2 years ago

Last modified 2 years ago

#11518 closed enhancement (fixed)

Implement BFont::Blocks

Reported by: dsizzle Owned by: nobody
Priority: normal Milestone: R1
Component: Kits/Interface Kit Version: R1/Development
Keywords: bfont, blocks, font, unicode Cc: dcieslak@…
Blocked By: Blocking: #3651
Has a Patch: yes Platform: All

Description

BFont::Blocks needs to be implemented. Currently it's just a stub.

Expected results: BFont::Blocks returns a unicode_blocks object containing information about which Unicode blocks are supported for a given font.

Actual results: BFont::Blocks returns an empty unicode_blocks object.

Attachments (5)

11518-BFont_Blocks.patch (45.6 KB) - added by dsizzle 3 years ago.
Implementation of BFont::Blocks
11518-BFont_Blocks-0309.patch (31.9 KB) - added by dsizzle 3 years ago.
Fixed patch to address comments
0001-Implemented-BFont-Blocks.patch (33.4 KB) - added by dsizzle 3 years ago.
More error-checking and style-fixes.
0001-Implemented-BFont-Blocks-added-build-feature-for-fon.patch (30.7 KB) - added by digib0y 2 years ago.
First step, new blocks in next patch.
0001-Implementation-of-BFont-Blocks.patch (20.3 KB) - added by dsizzle 2 years ago.
New, working patch

Download all attachments as: .zip

Change History (48)

comment:1 Changed 4 years ago by dsizzle

Type: bugenhancement

comment:2 Changed 4 years ago by dsizzle

Cc: dcieslak@… added

Blocks Enhancement 4 of #3651

comment:3 Changed 4 years ago by pulkomandy

Blocking: 3651 added

comment:4 Changed 4 years ago by dsizzle

I've been trying to figure out how to get fontconfig installed/included in the Haiku build process so I can use it, but not having luck. I tried mirroring what some other tools are using, like freetype, but both Jam and the haiku layout are foreign to me...can someone either give me some hints, or just add fontconfig to the build if that's easier? Thanks!!

comment:5 Changed 4 years ago by pulkomandy

Hi, Right, this part is a bit more complex.

There seem to be only a gcc4 version of fontconfig available in the repository, so first we need to get a gcc2 version of it if we want to use it. I will look into this since it is not possible for you to upload packages without commit access.

Then, you need a build feature (in build/jam/BuildFeatures to make it possible for jam targets to depend on the package.

Finally, either the interface kit or the app_server (depending on how the feature is implemented) will need to use that build feature. This is done for example in src/servers/app/Jamfile:

UseBuildFeatureHeaders freetype ;

Once this is done you can reference parts of the package, for example the library:

[ BuildFeatureAttribute freetype : library ]

(this expands to libfreetype.so with the correct path). BuildFeatureAttribute can be used the same way to get the header directories, or any other file you need from the package.

comment:6 Changed 3 years ago by dsizzle

I've been away for a while...did anyone ever get a chance to add a gcc2 version of fontconfig to the repo?

comment:7 Changed 3 years ago by pulkomandy

Yes, fonctonfig is now available for both architectures.

comment:8 Changed 3 years ago by dsizzle

Cool, thanks! I'll see about tackling this issue once again.

Changed 3 years ago by dsizzle

Attachment: 11518-BFont_Blocks.patch added

Implementation of BFont::Blocks

comment:9 Changed 3 years ago by dsizzle

Has a Patch: set

comment:10 Changed 3 years ago by dsizzle

Added a patch that implements BFont::Blocks.

  • added build features for fontconfig
  • made app server use build feature
  • added app server message for getting unicode blocks
  • implemented function for parsing unicode block information from fontconfig; binary search to find block containing a given code point/range
  • doubleed size of unicode_block bitfield to support more block definitions
  • rewrote block definitions to use larger bitfield; added more blocks that weren't previously defined (but were used by CharacterMap); added definition of block ranges

comment:11 Changed 3 years ago by korli

Thanks for the patch.

1/ you shouldn't change build/jam/DefaultBuildProfiles, there is no fontconfig in haikuports.cross. 2/ dependency on fontconfig should be optional, when not present, the ServerFont.cpp code can be stubbed (as now). 3/ i understand you needed to change "class unicode_block", but it is a public API. Maybe make it dependent of GCC2 I don't know.

comment:12 Changed 3 years ago by pulkomandy

For 2 you can do things similarly to the way OpenSSL is handled.

Changed 3 years ago by dsizzle

Fixed patch to address comments

comment:13 Changed 3 years ago by dsizzle

Thanks for the comments, korli and pulkomandy. There were several new items to learn so I didn't expect to get them right the first time! :)

I've attached a new patch to address these items:

1- removed change to DefaultBuildprofiles 2- added code to mirror what OpenSSl does; if fontconfig isn't available it's not used at all. 3- I had changed the public API initially because it wasn't actually used by anything in the Haiku codebase, and I didn't suspect any 3rd-party dev would've used it. unicode_block is basically just an arbitrary BeOS bitfield that would be useless without something like BFont::Blocks.

In any case, I think my new solution is safer: I added an overloading constructor to match the old API. I also reverted all of the old BeOS unicode_block constants so they are identical to what they were before. This should provide backwards compatibility if anyone actually did use it.

*CharacterMap does "use" it but doesn't actually do anything with it.

Let me know if that is a good solution. Thanks!

Last edited 3 years ago by dsizzle (previous) (diff)

comment:14 Changed 3 years ago by korli

Sounds better.

1/ Where comes kUnicodeBlockMap's content from? Why not keep that private in ServerFont.cpp? 2/ in ParseFcMap() you can replace the while loop with a for loop. If there are 32 bits, why do you check for 0x8000 instead of 0x80000000 ? 2/ Error handling seems to be missing in GetUnicodeBlocks(). FcFreeTypeCharSet can return null, FcCharSetFirstPage can return FC_CHARSET_DONE 3/ A few style issues:

  • in Font.h and UnicodeBlockObjects.h, don't delete blank lines.
  • in ServerFont.cpp, a new line is missing after the return type of functions. 'for' doesn't need a new line before the brace block. "(rangeStart-1)" should be "(rangeStart - 1)". "int i=0" should be "int i = 0", etc..

4/ a 'git format-patch' would be needed with a commit message, explaining the changes in API (why it is harmless) plus the implementation.

comment:15 Changed 3 years ago by dsizzle

Has a Patch: unset

Changed 3 years ago by dsizzle

More error-checking and style-fixes.

comment:16 Changed 3 years ago by dsizzle

Has a Patch: set

comment:17 Changed 3 years ago by dsizzle

Added a new patch to address your comments.

1/ The content of kUnicodeBlockMap is drawn from the omments of the unicode_block definitions, and the ranges defined in CharacterMap. I didn't make them private because it seemed like other code (like CharacterMap) could find those ranges useful. 2/Implemented all the changes you pointed out. 3/Fixed the style issues as best as I could. is there a better linter than the python check_styles script? 4/ latest patch is in 'git format-patch' format, though it does seem to mush my commit message into one huge blob, removing the newlines. Is that ok?

comment:18 Changed 3 years ago by pulkomandy

Unfortunately, changing unicode_blocks this way still breaks binary compatibility (the size of the class is changed). Moreover, there are currently 273 blocks in Unicode version 9, so it is still not large enough.

I would keep the unicode_blocks struct as it is (with only support for the blocks as defined in BeOS), and introduce a new API (maybe BFont::SupportsBlock(int blockNumber) which makes it easier to manage things in the long term.

comment:19 Changed 3 years ago by dsizzle

I'm sorry, but I lost interest...I know everyone in the Haiku organization is very busy and mostly volunteers, but waiting 7 months for a comment on a patch is likely why it's hard to attract developers.

comment:20 Changed 3 years ago by pulkomandy

I know, and that's why I've been trying to merge more of the old patches lately, so new ones are handled more quickly. Well, the comments are still there for anyone who want to take your patch and continue the work from there.

Thanks for your work on this, in any case!

comment:21 Changed 2 years ago by pulkomandy

You have a dual ";" in unicode_block::operator=

The new "Haiku" blocks won't work as is, because the unicode_block struct does not have enough bits. I think we already discussed this above. The idea is that this struct should be used only in BFont to return the blocks to BeOS apps using the legacy API. The internals of how it is implemented (querying the app_server, etc) should be designed using a different solution, that can accomodate more blocks.

This will allow to add a new API to BFont, as an addition to the legacy one, that supports all the blocks.

comment:22 Changed 2 years ago by digib0y

I will fix the dual ; in the next patch. Apologies.

Yes, they won't work bit what about the BeOS legacy blocks? If I am not wrong they should be working after this patch. We can look into the new "Haiku" blocks after the first patch.

I guess the feature should be working now.

comment:23 Changed 2 years ago by pulkomandy

Yes, that part should work, but we will have to rewrite a lot of it to get the complete set of blocks working. Which means we can't really close this ticket then, even if the patch is merged.

Changed 2 years ago by digib0y

First step, new blocks in next patch.

comment:24 Changed 2 years ago by axeld

I'm not quite sure how this can compile, since unicode_block only has a constructor that takes two arguments, and also doesn't have the space to store more.

The rest of the patch looks quite nice, however; you also do a great job at not letting your code stick out.

I have a couple of coding style issues, though (which might be issues found in the surrounding, code, too):

  • Font.cpp, line 848: the if clause should fit 80 characters, and should thus be on a single line.
  • ServerFont.cpp, line 464ff, 487: single line blocks don't get {}.
  • ServerFont.cpp, line 474: Should be a doxygen style comment. Sentences start with upper case.
  • ServerFont.cpp, line 486: We require boolean expressions in if's, ie. that would be if ((curVal & 0x80000000) != 0). Also, curVal is not a good name.

Thanks!

comment:25 Changed 2 years ago by dsizzle

As the author of 99.5% of that patch, I just wanted to jump in and say thanks for the compliment from axeld. Maybe one day I'll have time to get back into it!

re: "I'm not quite sure how this can compile, since unicode_block only has a constructor that takes two arguments, and also doesn't have the space to store more," the patch originally had a 4-argument constructor but that was part of the "breaking binary compatibility." It looks like digib0y removed the constructor but didn't update the blocks that use it.

comment:26 Changed 2 years ago by axeld

Sorry, dsizzle, I did not read the original patch, so I didn't know what part could be attributed to you or digib0y. That probably doesn't make you feel any better... in any case, that compliment is well earned! I'm sorry we dropped the ball with commenting on your work earlier!

comment:27 Changed 2 years ago by axeld

Owner: changed from axeld to nobody
Status: newassigned

comment:28 Changed 2 years ago by dsizzle

I started working on this again. The patch doesn't work. Not sure how I tested it before but as written, Haiku won't even boot. Once I fixed that, it crashed the app server. I fixed both of those issues and it hangs CharacterMap....but I'm working on it.

Anyway I'm implementing what pulkomandy suggested: BFont::Blocks will only handle legacy BeOS Unicode block info, and a new, separate method will provide more generic query access to block information.

comment:29 Changed 2 years ago by dsizzle

Ok, I just added a new patch that I believe is both working, and addresses the previous issues. It shouldn't break binary compatibility as the structs are back to what they were before. So, BFont::Blocks will only work for existing BeOS blocks and nothing else. I added a new API to BFont, called IncludesBlock(), that checks for an arbitrary unicode block, specified via a starting and ending codepoint. It can't use a single block number because Unicode blocks aren't really numbered.

Two caveats:

  1. Earlier in the thread it was mentioned that I shouldn't have to edit DefaultBuildProfiles. It seemed like I did to get Haiku to actually build and run properly, but I left that change out of the patch.
  1. In testing I found that this can be extremely slow if you need to check for lots of blocks in a font. It's not enough to test the start and end codepoints for a block because a single character can be defined in the middle of a block, so I check every single character in a block until a hit is found (in the BFont::IncludesBlock function). For fonts with large character sets it takes forever to check every block.

comment:30 Changed 2 years ago by pulkomandy

Style review:

  • Always exactly two blank lines between struct declarations, functions, etc (Font.h lines 130, 348)

Functional review:

  • ServerProtocol.h: I don't know if it is allowed to change the protocol this way. Do we guarantee any compatibility accross versions here? (for remote_app_server and the like)

Other than this, it looks like a good start. We will have to see if the performance is a problem in actual uses cases and if it can be improved.

comment:31 Changed 2 years ago by dsizzle

For ServerProtocol.h, are you saying that I should move the new definitions down to the bottom of the list so that I don't affect existing enum values? Or define them elsewhere?

Thanks!

comment:32 Changed 2 years ago by pulkomandy

Yes, moving them at the end of the list is what I meant. I don't know if it is required to do this, however - maybe it is more interesting to keep the list sorted logically, than keeping the app_server protocol compatible. Let's wait for comments from other developers on this?

comment:33 Changed 2 years ago by axeld

The constants should be added where they fit best, so the change is okay in this regard. The remote app_server works on the DrawingInterface level, it doesn't know anything about windows and such.

Besides this, if we wanted a stable interface, we should set constants manually instead; an enumeration with automatic values is not really suited for this.

The patch looks quite nice overall, but I have a few questions/remarks:

  1. What's the purpose of the unicode_block_range array? I'm wondering a bit as it uses the old unicode_block, but I don't quite see why. If there is something new, it should contain all the new unicode block ranges, too (those in the Supplementary Multilingual Plane and Supplementary Ideographic Plane), but I'm not sure this would have to be part of the public API, as it would be rarely needed by applications.
  2. ServerFont.cpp, line 616: it should break out of the loop.

And finally, there are a few more minor coding style violations I'd like to point out:

  • Font.h line 129: stray leading space.
  • Font.cpp line 848, and 870: why not put the if clause on one line?
  • ServerFont.cpp: FindBlockForCodepoint() and ParseFcMap() should be static.
  • ServerFont.cpp line 496, and other places: use "& " style, not " &".
  • line 514: Operators go to the next line. Multi-line if statements are always surrounded by {}.
  • line 521: If an argument doesn't fit into the 80 character column limit, it's only indented a single tab relative to its origin line.
  • line 532: Operators go to the next line.

comment:34 Changed 2 years ago by dsizzle

The purpose of the unicode_block_range array is to provide a mapping of the BeOS constants to the actual Unicode codepoint ranges that they describe. The existing unicode_block array is simply defining a bitfield for the blocks, but nowhere is it ever defined what the blocks actually are. Fontconfig deals in codepoints, so without some sort of mapping there's no way to know what Unicode block any arbitrary codepoint belongs to so that BFont::Blocks can set the appropriate bit. For example, B_LATIN1_SUPPLEMENT_BLOCK is bit 2 in the bitfield, but the only place the range 0080 - 00FF is mentioned is in the comment...so it has to be defined somewhere.

The reason the unicode_block_range array only contains the existing blocks is, as described above, its only purpose is to map the bitfield to ranges. There is space for more bits in the existing bitfield, so some *could* be added, but there are more ranges missing from the list than available bits so it'd be an arbitrary subset (as it is now).

This is why the second addition to the API is a function that allows you to look up any range of codepoints, so that no other knowledge of Unicode blocks needs to be hardcoded. It's not as efficient but is future-proof. I'm not aware of any canonical block list for Unicode, so "block 23" doesn't necessarily mean anything. If I'm wrong I'd be happy to refactor to support numerical block lookups.

I hope that all makes sense. If not please let me know.

I made the rest of the stylistic changes described above, except for:

Font.cpp line 848, and 870: why not put the if clause on one line?

  • I copied the code (and style) from surrounding functions, so there are several functions in Font.cpp that use 2 lines for the if clause. Should I fix all of them? I generally think it's not the best practice to mix in stylistic changes from code you didn't touch...but I could submit that as a "code cleanup" patch.

Also for ServerFont.cpp line 514, do you consider it a "multi-line if statement" when it's one line of code broken up to stay under 80 characters? I left the braces out originally because it's just one logical line.

comment:35 Changed 2 years ago by axeld

Thanks for the detailed explanations! Is there any reason for the unicode_block_range to be public, then?

About the coding style; we usually try to gradually improve the codebase when we touch a file (as the coding style unfortunately changed over time). For example, when I intend to work on a file, I look over it, and fix any style issues first in a separate commit (as the style issues won't then distract from my actual interest, and I know it's hard for me to resist to fix them :-)).

A separate style cleanup patch is always welcome, but it's also okay if only the new code follows the coding style, if the file should actually follow our coding style (there is obviously no point in changing third party code this way). In any case, there is no need for a further update of your patch because of this.

And about ServerFont.cpp, line 514: yes, we're not "logical" about multi-line statements; anything that visually occupies more than one line (even if one is only a comment) is considered a multi-line statement, and deserves its own block.

In general, our coding style tries to make reading and grasping the code as easy as possible, without any cruft that could distract from the actual code, but a lot of it is, of course, caused by habituation. Descriptive naming is probably the most important part of it, though.

Anyway, I'd say the patch is good to go. If no one speaks up, I'll try to commit it later today. Thanks a lot, also for your endurance! :-)

comment:36 Changed 2 years ago by pulkomandy

The unicode blocks are defined in the standard, Wikipedia has a list: https://en.wikipedia.org/wiki/Unicode_block

Source of the block list at unicode: http://www.unicode.org/Public/UNIDATA/Blocks.txt

I think the API could use that, but being able to check a more specific range is great too (in some cases, I want to check that a font has a few special characters I'm going to use - not a whole block).

So, what I would do is: 1) Keep your patch as it is, with support for checking characters in a given range 2) Add an API to convert an unicode block constant to the matching range (we can use unicode_block_range for this) 3) Update our block constant list to match Unicode 9, and make the API to match them with unicode_block_range handle all these blocks.

comment:37 Changed 2 years ago by axeld

Okay, let's declare the unicode block range/block info a construction site still; we should just sort out at least the public/private issues before the release.

I'll commit the patch as is later today (if no one beats me to it, that is).

comment:38 Changed 2 years ago by axeld

I get a compilation error:

cc1plus: warnings being treated as errors
/boot/home/develop/haiku/haiku/src/servers/app/ServerFont.cpp: In function `void ParseFcMap(unsigned int *, unsigned int, unicode_block &)':
/boot/home/develop/haiku/haiku/src/servers/app/ServerFont.cpp:507: warning: `int32 foundStartBlock' might be used uninitialized in this function

And it looks complicated enough that it might be an actual issue.

Also, I noticed some more style issues you could fix on the way: :-)

  • There are extraneous parenthesis in the if-term in ServerFont.cpp line 520.
  • Please change your editor settings to remove whitespace at the end of a line; your patch introduces quite a number of such cases.

Changed 2 years ago by dsizzle

New, working patch

comment:39 Changed 2 years ago by dsizzle

Ok, I set Pe to strip trailing blanks so that should be better for future stuff, and I fixed the extraneous parens in ServerFont.cpp line 520. Resaving the file also got rid of the trailing blanks.

I also initialized the vars to avoid the warning. I use jam with -q which I *thought* uses -Wall and -Werror...do you also use -Wextra?

A possible reason to have the unicode_block_ranges public could be for a tool such as CharacterMap, which currently defines its own data structure with similar information: unicode ranges, a mapping to a block constant (if available, otherwise kNoBlock), and whether the block is private or not. I can see where it might be good to have that as part of the API, but I can also argue that it is specialized to a tool like CharacterMap, so not generally useful compiled into Haiku itself (like Pulkomandy's comment about mostly wanting to look up a few chars here and there but almost never a whole block).

comment:40 Changed 2 years ago by pulkomandy

-Wall is always active, but -Werror is only enabled for directories that have been cleaned of all warnings for both gcc2 and gcc5. And each compiler emits somewhat different sets of warnings, so sometimes you need to check with both (the buildbots can be used for that but only after the patch is merged).

comment:41 Changed 2 years ago by axeld

I don't know why you didn't get the error; it's kinda odd. Were you compiling with GCC2 or 5? (I used GCC2)

I agree that a unicode block list similar to what is implemented in CharacterMap can be useful as part of the public API; but that also adds substantially more information to that list, too, to make it actually useful. At least the only reason I would expect to tangle with unicode blocks is that you want to show them to the user, with their name.

Anyway, I'll try to commit the patch later, thanks!

comment:42 Changed 2 years ago by axeld

Resolution: fixed
Status: assignedclosed

Later as in today; finally applied in hrev51155, thanks a lot!

I still think that the unicode_block_range should be private as is, although I agree that a similar API/info table could be a worthwhile API extension.

comment:43 Changed 2 years ago by dsizzle

Glad to help! Sorry I broke the package tho! I guess I didn't have the dependency on fontconfig correct.

Note: See TracTickets for help on using tickets.