Opened 8 years ago

Closed 8 years ago

#7697 closed bug (fixed)

[AboutSystem] Link to MIT licence looks strange when localized

Reported by: taos Owned by: nobody
Priority: normal Milestone: R1
Component: Applications/AboutSystem Version: R1/Development
Keywords: localization Cc:
Blocked By: Blocking:
Has a Patch: yes Platform: All

Description

Using r1a3-rc-hrev42159.

When using the german localization, the link "MIT licence" (string: ... %MIT licence% ...) in AboutSystem is displayed as "%IT Lizenz%" instead of "MIT Lizenz".

Link to MIT licence in AboutSystem (german localization).

Attachments (13)

MIT_link.png (46.9 KB) - added by taos 8 years ago.
Link to MIT licence in AboutSystem (german localization).
es.png (1.5 KB) - added by taos 8 years ago.
Spanish localization.
it.png (1.4 KB) - added by taos 8 years ago.
Italian localization.
fr.png (1.5 KB) - added by taos 8 years ago.
French localization.
de.png (764 bytes) - added by taos 8 years ago.
German localization.
en.png (653 bytes) - added by taos 8 years ago.
No localization.
fi.png (1.1 KB) - added by taos 8 years ago.
Finnish localization.
nl.png (668 bytes) - added by taos 8 years ago.
Dutch localization.
AboutSystem_no_Chars.png (44.5 KB) - added by taos 8 years ago.
Screenshot of patched AboutSystem (German localization).
AboutSystem.cpp.patch (977 bytes) - added by taos 8 years ago.
Patch to fix display of MIT license link (compatible with current catkeys in alpha3-rc).
AboutSystem.cpp.consistent_spelling_of_license.patch (1.7 KB) - added by taos 8 years ago.
Patch to fix display of MIT license link and introduces consistent spelling of license (needs new catkeys).
AboutSystem_updated_r42387.patch (1.7 KB) - added by taos 8 years ago.
Updated patch to fix display of MIT license link and to introduce consistent spelling of "license".
AboutSystem_updated_r42512.patch (1.7 KB) - added by taos 8 years ago.
Updated patch to fix display of MIT license link and to introduce consistent spelling of "license".

Download all attachments as: .zip

Change History (31)

Changed 8 years ago by taos

Attachment: MIT_link.png added

Link to MIT licence in AboutSystem (german localization).

comment:1 Changed 8 years ago by richienyhus

This looks like a translation error from overzealous translation to me.

comment:2 Changed 8 years ago by taos

Here are the corresponding excerpts from AboutSystem.cpp:

BPath mitPath;
_GetLicensePath("MIT", mitPath);
// Haiku license
BString haikuLicence = B_TRANSLATE("The code that is unique to Haiku, "
	"especially the kernel and all code that applications may link "
	"against, is distributed under the terms of the %MIT licence%. "
	"Some system libraries contain third party code distributed under the "
	"LGPL license. You can find the copyrights to third party code below.\n"
	"\n");
int32 licencePart1 = haikuLicence.FindFirst("%");
int32 licencePart2 = haikuLicence.FindLast("%");
BString part;
haikuLicence.CopyCharsInto(part, 0, licencePart1 );
fCreditsView->Insert(part);
	part.Truncate(0);
haikuLicence.CopyCharsInto(part, licencePart1 + 1, licencePart2 - 1
	- licencePart1);
fCreditsView->InsertHyperText(part, new OpenFileAction(mitPath.Path()));
	part.Truncate(0);
haikuLicence.CopyCharsInto(part, licencePart2 + 1, haikuLicence.CountChars()
	- licencePart2);
fCreditsView->Insert(part);

Any ideas? The French version displays "%licnce MIT%" (catkeys: %licence MIT%), the Russian "%MIT licence%", the Finnish "%MT licence%-isenssin" (catkeys: %MIT licence%-lisenssin). It doesn't even show the correct phrase without %s if this part of the string isn't translated at all. Only the non-localized English variant works.

BTW, apart from this sentence, the spelling "license" is used everywhere else in AboutSystem.

Last edited 8 years ago by taos (previous) (diff)

comment:3 Changed 8 years ago by richienyhus

It should be license which is American English, whilst the spelling licence is British English.

Last edited 8 years ago by richienyhus (previous) (diff)

comment:4 Changed 8 years ago by siarzhuk

IMO this is an excessive localization - the names of licenses must not be localized at all, because it breaks the "click and read" functionality in the list of installed software.

comment:5 in reply to:  4 Changed 8 years ago by taos

Replying to siarzhuk:

IMO this is an excessive localization - the names of licenses must not be localized at all, because it breaks the "click and read" functionality in the list of installed software.

I've also noticed this (see #7491 AboutSystem: clicking on localized license name doesn't open license file). But in this case, the "click and read" functionality works fine in a localized system. It's only the graphical representation of the licence name that is a little odd - even if the licence name itself isn't translated. At the moment, if you don't translate this part of the string you'll still get "%MIT licence%" instead of "MIT licence". Only in an English system, the %s are cut as intended.

comment:6 Changed 8 years ago by taos

Okay, call me completely mad. I think I found the problem: Characters with diacritics (e.g. é, ö, ñ) are counted twice (but why?). Therefore, the string is cut into parts at the wrong places. Instead of cutting out the %s (... |%|MIT...|%| ...), the cut position is shifted (e.g. ... %|M|IT...%| |...).

There is a linear correlation between the number of characters with diacritics in the translated string before %MIT...% and the shifting of the cut position.

language number of diacritics shift of cutting position cutting positions screenshot
English 0 0 |%|MIT licence|%|. No localization.
Dutch 0 0 |%|MIT-licence|%|. Dutch localization.
German 1 (ö) +1 %|M|IT Lizenz%|.| German localization.
Italian 2 (è, è) +2 %l|i|cenza MIT%.| |Alcune Italian localization.
Finnish 2 (ä, ä) +2 %M|I|T licence%-|l|isenssin Finnish localization.
French 4 (é, à, é, é) +4 %lic|e|nce MIT%. Q|u|elques French localization.
Spanish 5 (ó, ú, ú, ó, é) +5 %lice|n|cia MIT%. Al|g|unas Spanish localization.

EDIT: As soon as the translation uses non-ASCII characters, the cutting is off. With a language written in non-latin script (cyrillic, CJK, etc) it is so bad that you have to click a few lines below the license name to activate the link and open the license file.

Any ideas how to solve this?

Last edited 8 years ago by taos (previous) (diff)

Changed 8 years ago by taos

Attachment: es.png added

Spanish localization.

Changed 8 years ago by taos

Attachment: it.png added

Italian localization.

Changed 8 years ago by taos

Attachment: fr.png added

French localization.

Changed 8 years ago by taos

Attachment: de.png added

German localization.

Changed 8 years ago by taos

Attachment: en.png added

No localization.

Changed 8 years ago by taos

Attachment: fi.png added

Finnish localization.

Changed 8 years ago by taos

Attachment: nl.png added

Dutch localization.

comment:7 Changed 8 years ago by axeld

Hey, you're getting close! :-)

The problem you're describing happens because not each character uses up the same number of bytes (we're using UTF-8 encoding everywhere). While not apparent, the code in question mixes byte offsets/lengths with character offsets/lengths, and is therefore causing the problems. For example, FindFirst() is byte based, while CountChars() is character based.

comment:8 in reply to:  7 Changed 8 years ago by taos

Replying to axeld:

For example, FindFirst() is byte based, while CountChars() is character based.

Good to know. I've seen mentioned in The Haiku Book that CountChars() is aware of UTF8 and therefore counts the actual number of characters but I was not sure about FindFirst(), FindLast(), and Truncate().

So, is there already an easy way to make byte offsets compatible with character offsets? I guess something similar might have happened somewhere else in the haiku source code.

comment:9 Changed 8 years ago by taos

Hmmm, okay. If I replace CopyCharsInto() with CopyInto() and CountChars() with Length() the string is cut at the correct place - only both %s are removed as intended.

The graphical representation is okay (see screenshot below), the "click and read" functionality is still there (and at the correct place even in a Japanese or Russian system).

Screenshot of patched AboutSystem (German localization).

But, I'm not sure if removing the awareness of UTF8 in the code is the right way. However, if you're interested in a quick and dirty fix I can attach a patch file.

Changed 8 years ago by taos

Attachment: AboutSystem_no_Chars.png added

Screenshot of patched AboutSystem (German localization).

comment:10 in reply to:  9 ; Changed 8 years ago by mmlr

Replying to taos:

Hmmm, okay. If I replace CopyCharsInto() with CopyInto() and CountChars() with Length() the string is cut at the correct place - only both %s are removed as intended. ... But, I'm not sure if removing the awareness of UTF8 in the code is the right way.

The other way around would be the better way, i.e. replacing the calls with their *Chars() version instead. There are Find{First|Last}Chars() versions as well, so you can make all the calls Chars aware.

comment:11 in reply to:  10 Changed 8 years ago by taos

Replying to mmlr:

The other way around would be the better way, i.e. replacing the calls with their *Chars() version instead. There are Find{First|Last}Chars() versions as well, so you can make all the calls Chars aware.

Really? That's what I tried first, but I couldn't compile the changed code because of a "no matching function for call to..." error. I'll try again - might have been a typo on my side.

comment:12 in reply to:  10 ; Changed 8 years ago by axeld

Replying to mmlr:

The other way around would be the better way, i.e. replacing the calls with their *Chars() version instead. There are Find{First|Last}Chars() versions as well, so you can make all the calls Chars aware.

In this particular case, it doesn't really matter, as the '%' character is no UTF-8 character, and that's the only thing which matters for this code. So I think either solution would be fine. Patches welcome :-)

comment:13 in reply to:  12 ; Changed 8 years ago by bonefish

Replying to axeld:

In this particular case, it doesn't really matter, as the '%' character is no UTF-8 character, and that's the only thing which matters for this code. So I think either solution would be fine.

Since the *Char* variants are slower, I'd even say the non-UTF-8 aware solution is preferrable.

Patches welcome :-)

Ideally one also fixing the off-by-one error in the last CopyCharsInto() call.

comment:14 in reply to:  13 ; Changed 8 years ago by taos

Replying to bonefish:

Since the *Char* variants are slower, I'd even say the non-UTF-8 aware solution is preferrable.

Patches welcome :-)

Added two patches: First one (AboutSystem.cpp.patch) doesn't change the spelling of "licence" - can be used with the catkeys from HTA in alpha 3 branch. Second patch (AboutSystem.cpp.consistent_spelling_of_license.patch) changes the spelling to the US variant "license" but needs new catkeys.

Ideally one also fixing the off-by-one error in the last CopyCharsInto() call.

Not really sure???

comment:15 Changed 8 years ago by taos

Has a Patch: set

comment:16 in reply to:  14 ; Changed 8 years ago by bonefish

Replying to taos:

Ideally one also fixing the off-by-one error in the last CopyCharsInto() call.

Not really sure???

It starts copying at index licencePart2 + 1, but copies haikuLicence.Length() - licencePart2 bytes, which is one too many. It doesn't really matter, since the method clamps the length anyway, but still no reason not to pass the correct value.

Changed 8 years ago by taos

Attachment: AboutSystem.cpp.patch added

Patch to fix display of MIT license link (compatible with current catkeys in alpha3-rc).

Changed 8 years ago by taos

Patch to fix display of MIT license link and introduces consistent spelling of license (needs new catkeys).

comment:17 in reply to:  16 Changed 8 years ago by taos

Replying to bonefish:

It starts copying at index licencePart2 + 1, but copies haikuLicence.Length() - licencePart2 bytes, which is one too many.

Thanks, I changed the patch files accordingly.

Changed 8 years ago by taos

Updated patch to fix display of MIT license link and to introduce consistent spelling of "license".

Changed 8 years ago by taos

Updated patch to fix display of MIT license link and to introduce consistent spelling of "license".

comment:18 Changed 8 years ago by humdinger

Resolution: fixed
Status: newclosed

Applied with hrev42514. Closing as fixed. We'll see if it finally nailed the bugger when the translated catkeys arrive. Thanks a lot.

Note: See TracTickets for help on using tickets.