Ticket #976 (new bug)

Opened 23 months ago

Last modified 21 months ago

convert_from_utf8() broken

Reported by: axeld Owned by: axeld
Priority: normal Milestone: R1
Component: Kits/libtextencoding.so Version: R1 development
Cc: Blocked By:
Platform: All Blocking:

Description

convert_from_utf8() uses iconv() in the background. However, it opens/closes the iconv context with every invocation, which is just wrong.

It becomes obvious with the UTF-16 encoding, where a marker is added at the beginning of the file, or, in our case, for every call to convert_from_utf8().

Change History

  Changed 23 months ago by wkornewald

  • component changed from - General to Kits/libtextencoding.so

So, you complain about components, but then don't even use them? :)

  Changed 23 months ago by axeld

Well, guess what, it didn't exist when I wrote that bug report :-) And I needed to search it quite a while until I really knew that, I must have forgotten updating the bug afterwards ;-)

follow-up: ↓ 4   Changed 21 months ago by jackburton

Would it work if we used a static variable (TLS ?) to keep the initialized state of iconv ? Thus we'd just call iconv_open() the first time convert_encoding() is called...

in reply to: ↑ 3   Changed 21 months ago by jackburton

Replying to jackburton:

Would it work if we used a static variable (TLS ?) to keep the initialized state of iconv ? Thus we'd just call iconv_open() the first time convert_encoding() is called...

Hmmm problem would be, thought, that we wouldn't know when to call iconv_close(). Ok, let go what I just said.

  Changed 21 months ago by korli

An API change seems to be unavoidable. convert_encoding_open() convert_encoding_close() More or less the way iconv work :/

  Changed 21 months ago by axeld

We can't change the API in such a way if we want to stay binary compatible, though.

I think we could work out something with TLS and the "state" parameter, even though it wouldn't be that clean: we could only lazily close the iconv context in case a) a new translation is started (with "state" = 0), or if b) the thread exits.

Creating a new iconv context for every call to convert_{from|to}_utf8() is also very slow.

  Changed 21 months ago by korli

Sounds good. But IMO if it is too hacky for us, we could (should?) add some functions or a class which clearly open and close the conversion stream.

  Changed 21 months ago by mmu_man

Hmm and what about apps that do more than 1 convertion at the same time in the same thread, with different state vars ? like, it's converting to UTF-8, and from it to something else... interleaving calls. I know it's lame but possible. What about a (thread-safe = locked) list (hash table ?) of state pointers ? Each time a new pointer is seen we allocate an iconv context. We wouldn't really know when to close the context though, but it would be more correct.

Note: See TracTickets for help on using tickets.