Opened 18 years ago
Last modified 10 years ago
#976 assigned bug
convert_from_utf8() broken
Reported by: | axeld | Owned by: | pulkomandy |
---|---|---|---|
Priority: | normal | Milestone: | Unscheduled |
Component: | Kits/libtextencoding.so | Version: | R1/Development |
Keywords: | Cc: | ||
Blocked By: | Blocking: | ||
Platform: | All |
Description (last modified by )
convert_from_utf8() uses iconv() in the background. However, it opens/closes the iconv context with every invocation, which is just wrong.
It becomes obvious with the UTF-16 encoding, where a marker is added at the beginning of the file, or, in our case, for every call to convert_from_utf8().
It is not possible to open/close the context properly without changing the API. Since this conflicts with the goal of Haiku R1, this bug is delayed to post-R1 schedule. Likely, a new API will need to be designed as part of the locale kit. It may be done before R2, and coexist with the legacy one.
Change History (15)
comment:1 by , 18 years ago
Component: | - General → Kits/libtextencoding.so |
---|
comment:2 by , 18 years ago
Well, guess what, it didn't exist when I wrote that bug report :-) And I needed to search it quite a while until I really knew that, I must have forgotten updating the bug afterwards ;-)
follow-up: 4 comment:3 by , 18 years ago
Would it work if we used a static variable (TLS ?) to keep the initialized state of iconv ? Thus we'd just call iconv_open() the first time convert_encoding() is called...
comment:4 by , 18 years ago
Replying to jackburton:
Would it work if we used a static variable (TLS ?) to keep the initialized state of iconv ? Thus we'd just call iconv_open() the first time convert_encoding() is called...
Hmmm problem would be, thought, that we wouldn't know when to call iconv_close(). Ok, let go what I just said.
comment:5 by , 18 years ago
An API change seems to be unavoidable. convert_encoding_open() convert_encoding_close() More or less the way iconv work :/
comment:6 by , 18 years ago
We can't change the API in such a way if we want to stay binary compatible, though.
I think we could work out something with TLS and the "state" parameter, even though it wouldn't be that clean: we could only lazily close the iconv context in case a) a new translation is started (with "state" = 0), or if b) the thread exits.
Creating a new iconv context for every call to convert_{from|to}_utf8() is also very slow.
comment:7 by , 18 years ago
Sounds good. But IMO if it is too hacky for us, we could (should?) add some functions or a class which clearly open and close the conversion stream.
comment:8 by , 18 years ago
Hmm and what about apps that do more than 1 convertion at the same time in the same thread, with different state vars ? like, it's converting to UTF-8, and from it to something else... interleaving calls. I know it's lame but possible. What about a (thread-safe = locked) list (hash table ?) of state pointers ? Each time a new pointer is seen we allocate an iconv context. We wouldn't really know when to close the context though, but it would be more correct.
comment:9 by , 15 years ago
Owner: | changed from | to
---|---|
Version: | R1/pre-alpha1 → R1/Development |
comment:10 by , 15 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:11 by , 14 years ago
Description: | modified (diff) |
---|---|
Milestone: | R1 → Unscheduled |
comment:12 by , 14 years ago
Indeed it's still a problem it seems... We can't put a pointer to a conv_struct unless we allow leaking them.
One option is to keep opening/closing the context on each call, but copy back the istate/ostate member of the struct conv_struct that is returned by iconv_open(), depending on if encoding to or from a multibyte format. It seems to be the only member that actually maintains state, the rest being constant for each invocation and only depending on the to and from encoding setting. Unless istate and ostate are used at the same time... but probably not with those two calls.
comment:13 by , 14 years ago
It seems since we convert to and from UTF-8 and iconv internally uses UCS-4 we will probably be using both istate and ostate in this usage pattern though :-(
comment:15 by , 10 years ago
ICU has the same "problem" as iconv: it needs explicit opening and closing of the context. As mentioned in the previous comments, stateless conversion is not possible, and a new API will be needed as the current one doesn't allow deleting the state when done. The use of an int32 as a state cookie is also not helpful, as we can't store a pointer to the state there, at least on 64-bit platforms.
So, we should introduce a separate API to handle this in a more correct way.
So, you complain about components, but then don't even use them? :)