Opened 8 years ago

Closed 6 years ago

#7423 closed bug (fixed)

UTF-8 multibyte characters not handled by UTF8Char::IsSpace()

Reported by: scgtrp Owned by: siarzhuk
Priority: normal Milestone: R1
Component: Applications/Terminal Version: R1/Development
Keywords: gsoc2011 Cc: siarzhuk
Blocked By: Blocking:
Has a Patch: no Platform: All

Description

Terminal's UTF8Char::IsSpace() does not properly handle multibyte characters. It was suggested on IRC that I open a ticket here and attach my patch to it.

Attachments (3)

terminal-utf8-spaces.patch (2.4 KB) - added by scgtrp 8 years ago.
terminal-utf8-spaces-v2.patch (2.4 KB) - added by scgtrp 8 years ago.
Same patch as before, fixed formatting.
terminal-utf8-spaces-v3.patch (3.3 KB) - added by scgtrp 8 years ago.
Now with even more spaces!

Download all attachments as: .zip

Change History (16)

Changed 8 years ago by scgtrp

Attachment: terminal-utf8-spaces.patch added

comment:1 Changed 8 years ago by scgtrp

Has a Patch: set

comment:2 Changed 8 years ago by scgtrp

Keywords: gsoc2011 added

comment:3 Changed 8 years ago by yourpalal

Hi scgtrp, nice patch :) I'm not going to comment on the unicode portions of it (the majority of the patch, I know), because I'm not knowledgeable in that area. Disregarding the functional aspects, this patch looks good, it follows the style guide fairly well, although I did notice style violations on lines 83 and 94 of UTF8Char.h; the '}' and the "else {"/"else if {" should be on the same line.

Changed 8 years ago by scgtrp

Same patch as before, fixed formatting.

comment:4 Changed 8 years ago by korli

What's your source for the ranges you used?

I looked at FreeBSD And here are the space class elements: 0x0085 0x00a0 0x1680 0x2000-0x200b 0x2028 0x2029 0x202f 0x205f 0x3000

comment:5 Changed 8 years ago by scgtrp

I can't actually remember where I got that list - I want to say Wikipedia but I can't find the page I'm thinking of. Yours seems to be more complete. Will update the patch now.

Changed 8 years ago by scgtrp

Now with even more spaces!

comment:6 Changed 8 years ago by bonefish

Besides that you screwed with the coding style (removing and adding blank lines where you shouldn't have), there's a public BUnicodeChar class in the locale kit that already provides character classification and that should be used here.

comment:7 Changed 8 years ago by scgtrp

bonefish: The style guidelines say to "Put exactly two blank lines between functions"; I was advised on IRC to adjust other style problems in files I'm working on. Should I not have done that?

I was unaware of BUnicodeChar's existence - was it not yet implemented when this was written or something?

comment:8 in reply to:  7 Changed 8 years ago by bonefish

Replying to scgtrp:

bonefish: The style guidelines say to "Put exactly two blank lines between functions"; I was advised on IRC to adjust other style problems in files I'm working on. Should I not have done that?

The coding guidelines don't explicitly cover methods implemented inline in class definitions. In those cases only one blank line should be used and there's also no line break before the method name.

I was unaware of BUnicodeChar's existence - was it not yet implemented when this was written or something?

Indeed, the code predates the locale kit.

comment:9 Changed 8 years ago by mmlr

Has a Patch: unset

comment:10 Changed 8 years ago by mmlr

I'm removing the flag as the patch shouldn't be applied in this state. As per Ingo's comment the BUnicodeChar should be used for the classification instead.

comment:11 Changed 8 years ago by siarzhuk

Cc: siarzhuk added

comment:12 Changed 6 years ago by siarzhuk

Owner: changed from jackburton to siarzhuk
Status: newassigned

comment:13 Changed 6 years ago by siarzhuk

Resolution: fixed
Status: assignedclosed

Fixed in hrev45379. Please check. Thank you for the pointing out the problem and possible solution.

Note: See TracTickets for help on using tickets.