Opened 14 years ago

Closed 12 years ago

#6227 closed bug (fixed)

Terminal TermParse::EscParse() 2 issues.

Reported by: h.z Owned by: siarzhuk
Priority: normal Milestone: R1
Component: Applications/Terminal Version: R1/alpha2
Keywords: Cc: siarzhuk
Blocked By: Blocking:
Platform: All

Description

1.Insert UTF-8 3 bytes char with width = 1 cause wrong buffer attributes cause BView::DrawString abnormally.

2.The convert_to_utf8() always return with zero dstLen, then the following line of InsertChar() with do nothing.

The changes on TermParse::EscParse() need more testing.

Attachments (1)

terminal-6227.patch (1.9 KB ) - added by h.z 14 years ago.
fix for these 2 issues, tested under UTF-8/GBK conversions

Download all attachments as: .zip

Change History (14)

comment:1 by stippi, 14 years ago

Have you been meaning to attach a patch? The wording of your last sentence suggests so, but I am not sure. :-)

comment:2 by h.z, 14 years ago

I notice new class methods are used now, so I think need some time to do some debugging to see what happens after these changes.

One quick fix for issue 1 just like this, working good at UTF-8 conversion but its not a good way, I think. The problems should come from TermParse::EscParse().

---BasicTerminalBuffer.cpp--- void BasicTerminalBuffer::InsertChar(UTF8Char c, uint32 width, uint32 attributes) { debug_printf("BasicTerminalBuffer::InsertChar('%.*s' (%d), %#lx)\n", (int)c.ByteCount(), c.bytes, c.bytes[0], attributes);

if ((int32)width == FULL_WIDTH)

attributes |= A_WIDTH;

+ if (c.IsFullWidth()) +{ + attributes |= A_WIDTH; + width *= 2; +}

by h.z, 14 years ago

Attachment: terminal-6227.patch added

fix for these 2 issues, tested under UTF-8/GBK conversions

comment:3 by h.z, 14 years ago

patch: 01

comment:5 by h.z, 14 years ago

issue 1: utf-8 fullwidth with width=1 fix: add if IsFullWidth then width=2.

issue 2: convert_to_utf8 always with dstLen=0 fix: add dstLen=4 and.

comment:6 by jackburton, 14 years ago

First of all: sorry to reply this late, and thanks for working on this! I tried to apply the patch but doesn't applies cleanly. I'm applying it manually. Only, I can't understand some of the comparison in TermParse.cpp:

if ((c > 0xA1 && c < 0xA9)

(c>0xB0 && c< 0xF7)
(c>0x81 && c< 0xA0)
(c>0xAA && c< 0xFE)
(c>0xA8 && c< 0xA9))

Besides that the comparisons would be more readable if they were done in order (first 0x81, then 0xA1, etc), the last case (c>0xA8 && c< 0xA9) seems to be covered already by the first check (c > 0xA1 && c < 0xA9), and there are other checks which overlap (c>0xB0 && c< 0xF7) vs (c>0xAA && c< 0xFE) so I'm not sure if I understand this. Then there is some coding style violations (parenthesis go to the end of the line, missing spaces between "c" and ">", etc.)

in reply to:  6 comment:7 by bonefish, 14 years ago

Replying to jackburton:

the last case (c>0xA8 && c< 0xA9) seems to be covered already by the first check (c > 0xA1 && c < 0xA9)

In fact, since c is an integer, the last check is always false.

comment:8 by mmlr, 13 years ago

patch: 10

comment:9 by mmlr, 13 years ago

The patch doesn't apply cleanly and more feedback is needed. Can you please update the patch to a current revision and fix the mentioned coding style problems and make it.

Version 0, edited 13 years ago by mmlr (next)

comment:10 by siarzhuk, 13 years ago

Cc: siarzhuk added

comment:11 by siarzhuk, 13 years ago

First issue (invalid destLen was sent to convert_to_utf8) fixed in hrev43252.

Some comments about the second one:

1) UTF8.h says that:

41 B_GBK_CONVERSION, // Chinese GB18030

But GB 18030 is an extension of GBK. They are different encodings! Are there anybody flyent in iconv's architecture to check which encoding is used behind the scenes of our convert_xx_utf8?

2) Those ranges in the patch looks like come from the following info (root/src/libs/iconv/gb18030.h):

* GB18030, as specified in the GB18030 standard, is an extension of GBK.
* In what follows, page numbers refer to the GB18030 standard (second
* printing).
* It consists of the following parts:
*
* One-byte range:
*   ASCII        p. 2         0x{00..7F}
*
* Two-byte range:
*   GBK part 1   p. 10..12    0x{A1..A9}{A1..FE}
*   GBK part 2   p. 13..36    0x{B0..F7}{A1..FE}
*   GBK part 3   p. 37..52    0x{81..A0}{40..7E,80..FE}
*   GBK part 4   p. 53..81    0x{AA..FE}{40..7E,80..A0}
*   GBK part 5   p. 82        0x{A8..A9}{40..7E,80..A0}
*   UDA part 1   p. 83..84    0x{AA..AF}{A1..FE}              U+E000..U+E233
*   UDA part 2   p. 85..87    0x{F8..FE}{A1..FE}              U+E234..U+E4C5
*   UDA part 3   p. 88..90    0x{A1..A7}{40..7E,80..A0}       U+E4C6..U+E765

So things are looking more complicate that a simple fallback to CASE_PRINT_GR anyway. ;-)

comment:12 by siarzhuk, 12 years ago

Owner: changed from jackburton to siarzhuk
Status: newassigned

comment:13 by siarzhuk, 12 years ago

JFYI: East Asian Full-Width characters support fixed in hrev45462. Issue with GBK encoding should be investigated additionally.

comment:14 by siarzhuk, 12 years ago

Resolution: fixed
Status: assignedclosed

Should be finally resolved in hrev45533.

Note: See TracTickets for help on using tickets.