Opened 3 weeks ago

Last modified 2 weeks ago

#19216 new task

Change userguide language code for Chinese

Reported by: humdinger
Owned by: waddlesplash
Priority: normal
Milestone: Unscheduled
Component: Website/Userguide Translator
Version: R1/beta5
Keywords:
Cc:
Blocked By:
Blocking:
Platform: All

Description

As there's currently some activity for Chinese translations, I wonder if we can change the language code for Chinese.

Right now we use "zh_CN" for Chinese, but this has some problems:

  • this is a country code, not a language code
  • currently our "zh_CN" user guide is in "Chinese simplified". There is interest in translating to "Chinese traditional"

Often "Chinese traditional" seems to use "zh_TW", but as we should move away from country codes, this wouldn't be a preferred solution.

Pootle already uses "zh_HANS" and "zh_HANT". Can the userguide (and Welcome package) do likewise?

Change History (7)

comment:1 by humdinger, 3 weeks ago

I've just had a look at this stackoverflow page. It appears that it's not just a matter of different characters (zh_HANS vs. zh_HANT), but that translations differ between regions. For example, there could be zh_HANS_HK and zh_HANT_HK for the Chinese variant that's spoken in Hong Kong...

Maybe we need some more expert input from our Chinese users in the forum.

comment:2 by pulkomandy, 2 weeks ago

There isn't really a problem with country codes here.

For example, this is what we do for Portuguese as well: there is pt_PT (for Portugal) and pt_BR (for Brazil). These are two languages named "Portuguese", but they have diverged quite far from each other.

The situation is similar for Chinese, but there is an additional complication due to political history. In the case of Portuguese, it's easy to say that Portugal Portuguese is the "original" one and the Brazilian variant derived from it (it's not entirely true, but it's acceptable). So, pt_PT is in our case simply "pt".

But for Chinese, this is not so simple, because Taiwan says they are the continuation of the original China. This results in various strange things, for example in the Olympics they somehow ended up participating as "Chinese Taipei".

In the case of language codes, this led to the decision to use "traditional Chinese" and "simplified Chinese" (zh_Hant and zh_Hans) as language codes that do not mention any country name. Note that this is purely a language code, and, like any other language code, it can additionally take a country suffix. This can be useful if a country happens to use both variants of the language. Another example of this would be Serbia, which uses both Latin and Cyrillic alphabets for the same language. We currently have those in Pootle as sr and sr_Latn. It seems this could be the case in Hong Kong for Chinese, but maybe they can just use the generic language code without country in our case, because the other settings (date format, currency unit, etc.) are configured separately.
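To make the structure of these codes concrete, here is a minimal Python sketch that splits a code into language, script and country parts. The function name parse_locale is hypothetical, and this is not how Pootle, ICU or the userguide translator actually parse codes. It only assumes the common convention that a four-letter part is a script and a two-letter (or three-digit) part is a country/region.

```python
import re


def parse_locale(code):
    """Split a locale code into (language, script, region).

    Hypothetical helper for illustration only. Accepts "_" or "-" as
    separator. Parts that are missing are returned as None.
    """
    parts = re.split(r"[_-]", code)
    language = parts[0].lower()
    script = None
    region = None
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            # Four letters: a script such as Hans, Hant, Latn.
            script = part.title()
        elif (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit()):
            # Two letters or three digits: a country/region such as HK or BR.
            region = part.upper()
    return language, script, region


print(parse_locale("zh_HANS"))     # language and script, no country
print(parse_locale("zh_Hant_HK"))  # language, script and country
print(parse_locale("pt_BR"))       # language and country, no script
```

With a split like this, "zh_Hans" and "zh_Hans_HK" share the same language and script and differ only in the optional country suffix, while "pt_BR" carries a country but no script.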

In the end, it is as usual with localization: trying to fit complicated historical, political, cultural things into a computer isn't easy.

In any case, for example in https://www.localeplanet.com/icu/iso639.html I don't see a possibility for zh-CN. You can do just zh, or zh_Hans/zh_Hant, and only the latter two optionally get a country suffix. I think the generic "zh" may exist only for spoken things, where the writing system doesn't matter, but for text, you have to specify which writing system to use. It seems some people decided to "imply" the writing system based on the country, but as we have seen, this wouldn't work for Serbian and other languages where two writing systems can co-exist in the same country. So, we'd better do the right thing :)
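If the userguide were to move from the country-based codes to the script-based ones, the renaming could be sketched roughly like this. The table and the function name normalize_code are assumptions made up for illustration. They cover only the two codes mentioned in this ticket and do not describe any existing code in the userguide translator.

```python
# Hypothetical mapping from the country-based codes discussed in this
# ticket to the script-based codes already used in Pootle.
LEGACY_TO_SCRIPT = {
    "zh_cn": "zh_Hans",  # currently holds the simplified Chinese user guide
    "zh_tw": "zh_Hant",  # often used for traditional Chinese
}


def normalize_code(code):
    """Return the script-based code for a known legacy code.

    Codes that are not in the table are returned unchanged.
    """
    key = code.replace("-", "_").lower()
    return LEGACY_TO_SCRIPT.get(key, code)


for legacy in ("zh_CN", "zh-TW", "pt_BR", "zh_Hans"):
    print(legacy, "->", normalize_code(legacy))
```

The table makes the implied writing system explicit in one place. A language like Serbian, where one country uses two scripts, could not be handled by such a country-to-script table at all, which is the problem described above.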

comment:3 by nephele, 2 weeks ago

Often "Chinese traditional" seems to use "zh_TW", but as we should move away from country codes, this wouldn't be a preferred solution.

I don't get this. Why?

And honestly, I think the whole distinction is a bit ridiculous. Of course we *can* support several variants of Chinese if we want to. Heck, we even support variants of some languages that are spoken way less than any variant of Chinese.

Anyhow, the distinction for simple vs traditional characters is not something you can answer in isolation because this is a non-phonetic alphabet. Just like you can't group European languages based on "what version of Latin do you use to write?" We don't write in Latin but a derived alphabet.

Anyway I would do it like this: Have a variant for Chinese mainland and Chinese Taiwan (like the country codes Linux uses) and leave the question of which alphabet to use to the translators. If they wish to do the work to provide this in both, then we should let them provide it in both too. But you can't map these variants to one area or another because there are more differences on top of that (even for English UK vs. English US, for example).

comment:4 by pulkomandy, 2 weeks ago

Anyway I would do it like this

I don't think we need opinions here. We should follow the practices of either ICU, or what other systems do, and not invent our own thing. Especially if we do so without input from the people who actually use the Chinese language. This isn't really for us to decide.

I have provided context on what we did previously. If people from China and Taiwan say it is not the best solution, we should listen to them.

comment:5 by nephele, 2 weeks ago

Using it like we did before is already consistent with Linux. So what would you change then? I don't see what the initial problem of this ticket is.

If some translator for any of these variants would like a different additional translation we can do that, but other than that I don't see what you want done instead.

comment:6 by humdinger, 2 weeks ago

FWIW, the forum thread didn't help me that much so far... :)

I don't see what the initial problem of this ticket is.

I just noticed that we use zh-HANS and zh-HANT at Pootle for the interface translation and zh-CN for the userguide. I was wondering why both translations don't use the same codes and, if they should, which one would be the correct one.

in reply to:  6 comment:7 by MichaelPeppers, 2 weeks ago

Replying to humdinger:

FWIW, the forum thread didn't help me that much so far... :)

Then please speak up about it in the forum and tell us what isn't clear about the stuff said there. Chinese users as well as people like me who know or studied somehow related writing systems might be so used to dealing with the characters we might be glossing over stuff that's very obvious to us.

(The forum input, especially the one from Chinese users, has provided most of the info you would ever need about it. If it's hard to read/understand, keep in mind that Chinese and English are worlds apart, so it's as hard for them to make their point as it is for English users to *read* what their point's about.)

Replying to nephele:

Using it like we did before is already consistent with Linux.

I'm sorry, but the Chinese users in the forums, the ones who actually use those locales, clearly disagree, otherwise they wouldn't be saying Nano is not being localized. (They're talking about locale while you guys are probably focusing more on the userguide side of things, but still)

Replying to nephele:

Anyhow, the distinction for simple vs traditional characters is not something you can answer in isolation because this is a non-phonetic alphabet. Just like you can't group European languages based on "what version of Latin do you use to write?" We don't write in Latin but a derived alphabet.

Misleading statement; in practice it's quite the opposite. Comparisons with alphabets do not apply: these writing systems are not alphabets and are much more consistent with each other than the various derivatives of the Latin alphabet are when it comes to what means what. Most words that are actually different between the two are new or loaned words from like the 50's to now, which could *still* be understood, albeit with some trouble, by the other side, as the characters get interpreted by the reader. These different words are few and far between as well. No language in Europe uses the Latin alphabet directly, but virtually all Chinese people use either simplified or traditional; there are no other standards.

(Because making/enforcing more versions of characters for variants that only diverge when spoken would be a massive waste of time, money and all in all a literal nightmare. Just think about codifying your own standard for about 200k+ characters, and to me that's a low estimate.)

I'll give you a more concrete example: imagine we're all cavemen of a specific tribe and one of us makes a wall graffiti about people chucking spears at a pack of deer. When you observe the graffiti it doesn't matter if you say "deer-hunting" in a different way than me, we both get that it's about "deer-hunting", because we all draw these things in a similar style. Then a caveman from another tribe comes in, and in their tribe they're used to drawing deer, people or spears in a slightly different way than us. That caveman might struggle to understand the picture at first, but eventually they'd get it. Again, it doesn't really matter how they say the word; what matters is whether they recognize the symbols and their meaning or not.

Last edited 2 weeks ago by MichaelPeppers