Opened 14 years ago
Closed 12 years ago
#7423 closed bug (fixed)
UTF-8 multibyte characters not handled by UTF8Char::IsSpace()
Reported by: | scgtrp | Owned by: | siarzhuk |
---|---|---|---|
Priority: | normal | Milestone: | R1 |
Component: | Applications/Terminal | Version: | R1/Development |
Keywords: | gsoc2011 | Cc: | siarzhuk |
Blocked By: | Blocking: | ||
Platform: | All |
Description
Terminal's UTF8Char::IsSpace() does not properly handle multibyte characters. It was suggested on IRC that I open a ticket here and attach my patch to it.
Attachments (3)
Change History (16)
by , 14 years ago
Attachment: | terminal-utf8-spaces.patch added |
---|
comment:1 by , 14 years ago
patch: | 0 → 1 |
---|
comment:2 by , 14 years ago
Keywords: | gsoc2011 added |
---|
comment:3 by , 14 years ago
by , 14 years ago
Attachment: | terminal-utf8-spaces-v2.patch added |
---|
Same patch as before, fixed formatting.
comment:4 by , 14 years ago
What's your source for the ranges you used?
I looked at FreeBSD And here are the space class elements: 0x0085 0x00a0 0x1680 0x2000-0x200b 0x2028 0x2029 0x202f 0x205f 0x3000
comment:5 by , 14 years ago
I can't actually remember where I got that list - I want to say Wikipedia but I can't find the page I'm thinking of. Yours seems to be more complete. Will update the patch now.
comment:6 by , 14 years ago
Besides that you screwed with the coding style (removing and adding blank lines where you shouldn't have), there's a public BUnicodeChar class in the locale kit that already provides character classification and that should be used here.
follow-up: 8 comment:7 by , 14 years ago
bonefish: The style guidelines say to "Put exactly two blank lines between functions"; I was advised on IRC to adjust other style problems in files I'm working on. Should I not have done that?
I was unaware of BUnicodeChar's existence - was it not yet implemented when this was written or something?
comment:8 by , 14 years ago
Replying to scgtrp:
bonefish: The style guidelines say to "Put exactly two blank lines between functions"; I was advised on IRC to adjust other style problems in files I'm working on. Should I not have done that?
The coding guidelines don't explicitly cover methods implemented inline in class definitions. In those cases only one blank line should be used and there's also no line break before the method name.
I was unaware of BUnicodeChar's existence - was it not yet implemented when this was written or something?
Indeed, the code predates the locale kit.
comment:9 by , 13 years ago
patch: | 1 → 0 |
---|
comment:10 by , 13 years ago
I'm removing the flag as the patch shouldn't be applied in this state. As per Ingo's comment the BUnicodeChar should be used for the classification instead.
comment:11 by , 13 years ago
Cc: | added |
---|
comment:12 by , 12 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:13 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Fixed in hrev45379. Please check. Thank you for the pointing out the problem and possible solution.
Hi scgtrp, nice patch :) I'm not going to comment on the unicode portions of it (the majority of the patch, I know), because I'm not knowledgeable in that area. Disregarding the functional aspects, this patch looks good, it follows the style guide fairly well, although I did notice style violations on lines 83 and 94 of UTF8Char.h; the '}' and the "else {"/"else if {" should be on the same line.