Opened 9 years ago

Closed 8 years ago

#6276 closed bug (fixed)

Console backspace doesn't properly handle non-ascii unicode

Reported by: Adek336
Owned by: zooey
Priority: normal
Milestone: R1
Component: System/POSIX
Version: R1/Development
Keywords: terminal bash readline
Cc: siarzhuk, henrimoi@…
Blocked By:
Blocking: #5775, #6094, #6836, #7244
Has a Patch: no
Platform: All

Description

A testcase

mkdir xyz
cd xyz
# fine
cd ..
cd Ź<backspace>xyz
# sh: cd: yz: No such file or directory

The example character 'Ź' does not have to be typed on the keyboard; pasting it triggers the same behaviour.
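One plausible reading of the test case is that the line editor removes a single byte per backspace, while 'Ź' (U+0179) occupies two bytes in UTF-8 (0xC5 0xB9). The following is only a sketch of the difference between the two strategies, not the code bash or readline actually uses; the function names `utf8_backspace` and `bytewise_backspace` are made up for illustration, and the buffer is assumed to hold valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the new length of buf after removing its last UTF-8 encoded
 * character. Continuation bytes have the bit pattern 10xxxxxx. */
size_t utf8_backspace(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);

    /* Step back over any trailing continuation bytes ... */
    while (len > 0 && (buf[len - 1] & 0xC0) == 0x80)
        len--;

    /* ... then drop the lead byte (or the single ASCII byte). */
    if (len > 0)
        len--;

    return len;
}

/* For contrast: what an editor does if it assumes one byte per character. */
size_t bytewise_backspace(size_t len)
{
    return len > 0 ? len - 1 : 0;
}
```

For the five-byte buffer `"cd \xC5\xB9"`, `utf8_backspace` yields length 3 (just `"cd "`), whereas `bytewise_backspace` yields 4 and leaves the stray lead byte 0xC5 in the line.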

hrev37286, gcc4+2 hybrid, VBox 3.2

Change History (17)

comment:1 by HaikuBot, 9 years ago

Yes, I see the same thing. When I forget to switch the keymap from Russian, type something in Terminal, and then press backspace, I get a result like the one Adek336 describes.

comment:2 by bonefish, 9 years ago

This is a duplicate of #5775, but it has a better summary and description. So closing that one.

comment:3 by bonefish, 9 years ago

Blocking: 5775 added

(In #5775) Closing as duplicate of #6276, which has the better description and summary.

comment:4 by siarzhuk, 9 years ago

Cc: siarzhuk added

comment:5 by heto, 9 years ago

Cc: henrimoi@… added

If you try to set LC_CTYPE or any of the other locale variables, bash will warn that setlocale fails. This is probably because Haiku doesn’t ship with any glibc locale files, so applications won’t even know what character encoding is associated with a locale. On Linux (Debian squeeze), these files seem to be installed under /usr/share/i18n.

This also means that any application that relies on C/POSIX locale support for finding out the character encoding will fail, including applications such as svn. And because the locale files are not installed, setting LC_CTYPE will not mean anything.

in reply to:  5 comment:6 by anevilyak, 9 years ago

Replying to heto:

If you try to set LC_CTYPE or any of the other locale variables, bash will warn that setlocale fails.

Actually, it's because Haiku doesn't currently implement any of the POSIX locale stuff properly. There is a work-in-progress branch to rectify this, but until it's complete, none of the LC_* stuff will work correctly, regardless of the presence of those files.
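The failure heto describes can be probed from C with the standard `setlocale` call, which returns NULL when the requested locale cannot be honoured. This is a minimal sketch; the helper name `ctype_locale_available` is hypothetical, and note that a successful call changes the process-wide LC_CTYPE setting as a side effect.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Returns 1 if the C library accepts `name` for LC_CTYPE, 0 otherwise.
 * Passing "" asks for the locale named by the environment (LC_ALL,
 * LC_CTYPE, LANG). On success the process locale is changed. */
int ctype_locale_available(const char *name)
{
    assert(name != NULL);
    return setlocale(LC_CTYPE, name) != NULL;
}
```

Calling `ctype_locale_available("")` after exporting LC_CTYPE in the shell shows whether the C library honours the value; the "C" locale is always available.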

comment:7 by bonefish, 9 years ago

Blocking: 6094 added

comment:8 by bonefish, 9 years ago

Blocking: 6836 added

(In #6836) Same bug as #6276. Some readline or bash issue I'd say.

comment:9 by bonefish, 9 years ago

Component: - General → Applications/Command Line Tools
Version: R1/alpha2 → R1/Development

comment:10 by bonefish, 9 years ago

Owner: changed from nobody to bonefish
Status: new → in-progress

After a first glance it looks very much like a bash/readline problem. Looking a bit closer...

comment:11 by bonefish, 9 years ago

Component: Applications/Command Line Tools → System/POSIX
Owner: changed from bonefish to zooey
Status: in-progress → assigned

Passing on to Oliver. This is a POSIX locale related issue. Haiku's <stdlib.h> defines MB_CUR_MAX to 1. It is also noteworthy that <limits.h> doesn't define MB_LEN_MAX, so in gcc's fixed header it is defined to 1 as well.
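To illustrate why the value of MB_CUR_MAX matters: a program can use it to decide whether the current locale has multibyte characters at all, and can use the standard `mblen` function to walk a string character by character. The sketch below assumes nothing about how bash or readline are actually structured; `locale_is_multibyte` and `count_chars` are hypothetical helper names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* With MB_CUR_MAX == 1, every byte counts as one character. */
int locale_is_multibyte(void)
{
    return MB_CUR_MAX > 1;
}

/* Counts the characters in s as the C library sees them under the
 * current locale. Returns (size_t)-1 on an invalid sequence. */
size_t count_chars(const char *s)
{
    assert(s != NULL);

    size_t remaining = strlen(s);
    size_t count = 0;

    (void)mblen(NULL, 0); /* reset any shift state */

    while (remaining > 0) {
        int k = mblen(s, remaining);
        if (k <= 0)
            return (size_t)-1;
        s += k;
        remaining -= (size_t)k;
        count++;
    }
    return count;
}
```

Under a working UTF-8 locale, `count_chars` would be expected to report one character for 'Ź'. With MB_CUR_MAX fixed at 1, the library has no way to report the two bytes as a single character.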

comment:12 by bonefish, 9 years ago

Keywords: terminal bash readline added

comment:13 by bonefish, 9 years ago

Blocking: 7244 added

(In #7244) Duplicate of #6276.

comment:14 by pulkomandy, 8 years ago

Is setting both MB_LEN_MAX and MB_CUR_MAX to 6 a correct fix (that seems to be the value needed for UTF-8)? Or is there something more involved?

MB_LEN_MAX must be a constant, so we should use 6 for it. MB_CUR_MAX may vary with the current locale, but I'm not sure that's of any use.

in reply to:  14 comment:15 by bonefish, 8 years ago

Replying to pulkomandy:

Is setting both MB_LEN_MAX and MB_CUR_MAX to 6 a correct fix (that seems to be the value needed for UTF-8)? Or is there something more involved?

It is probably more involved. Possibly also including changes to the compiler.

BTW, 4 bytes of UTF-8 suffice to represent any Unicode code point. glibc seems to define MB_LEN_MAX to 16 (sixteen).
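The 4-byte bound follows from the UTF-8 encoding ranges, with Unicode ending at U+10FFFF. A small sketch of the length calculation (the function names are made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Returns the number of bytes UTF-8 needs for code point cp, or 0 if cp
 * is not a valid Unicode scalar value. */
int utf8_encoded_length(uint32_t cp)
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;               /* UTF-16 surrogates are not encodable */
    if (cp <= 0x7F)
        return 1;
    if (cp <= 0x7FF)
        return 2;
    if (cp <= 0xFFFF)
        return 3;
    if (cp <= 0x10FFFF)
        return 4;
    return 0;                   /* beyond the Unicode range */
}

/* Sanity check using the character from the ticket: 'Ź' is U+0179. */
void utf8_length_self_check(void)
{
    assert(utf8_encoded_length(0x0179) == 2);
    assert(utf8_encoded_length(0x10FFFF) == 4);
}
```

The value 6 comes from the original UTF-8 design, which allowed sequences of up to six bytes for code points up to 0x7FFFFFFF; the current definition stops at four.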

comment:16 by zooey, 8 years ago

With hrev43310, backspace handling has improved, but entering multibyte characters manually still causes problems (for instance, the character shows up only after pressing space). Pasting multibyte characters and deleting them via backspace works OK, though.

I'll investigate why editing doesn't work...

comment:17 by zooey, 8 years ago

Resolution: fixed
Status: assigned → closed

Fixed in hrev45307.
