#12515 closed bug (fixed)
can't build Haiku within haiku.
Reported by: | khallebal | Owned by: | nobody |
---|---|---|---|
Priority: | normal | Milestone: | Unscheduled |
Component: | - General | Version: | R1/Development |
Keywords: | Cc: | ||
Blocked By: | Blocking: | #13103 | |
Platform: | x86 |
Description
Hi I tried to build Haiku x86 within Haiku x86 4 or 5 times this last month and i kept stumbling against the following same error msg.
gawk: src/apps/devices/pci-header.awk:169: fatal error: internal error
Attachments (2)
Change History (32)
comment:1 by , 9 years ago
by , 9 years ago
Attachment: | pci-header.awk-workaround.patch added |
---|
Workaround for an error in awk-4.1.3
comment:2 by , 9 years ago
patch: | 0 → 1 |
---|
by , 9 years ago
Attachment: | usb_devlist2h.awk-workaround.patch added |
---|
After applying this second workaround the system compiles correctly.
comment:3 by , 9 years ago
For the first patch -- won't the string be printed 3 times if all 3 cases are true? Shouldn't it be "else if" instead?
comment:5 by , 9 years ago
This script generate the same file on Linux witout this patch and with this patch on Haiku.
comment:6 by , 9 years ago
This solution is optimal. I am working on patch for awk. I found that the problem is in function nextc (next character).
comment:7 by , 9 years ago
The nextc function looks like good, unfortunately second call to mbrlen on sequence '&&' or 'II' generates internal error. This error exist only in AWK, standalone program with this code works fine.
comment:8 by , 9 years ago
I changed the function mbrlen to strnlen and AWK works fine, but this isn't good option.
comment:10 by , 9 years ago
Our mbrlen is here: http://cgit.haiku-os.org/haiku/tree/src/system/libroot/posix/wchar/mbrlen.c
We had similar issues with grep and coreutils in the past, IIRC the problem was that they tried to override part of the widechar function, but we did not have all appropriate weak symbols, so they ended up mixing their and our implementation.
comment:11 by , 9 years ago
I found satisfaing solution. Just before each call mbrlen we reset the converter field in mbstatus record. This is very interesting because value in this field is constant lifelong program.
tmp_state.converter = NULL; mbrlen(lexptr, idx+1, &tmp_state);
comment:12 by , 9 years ago
According to POSIX specs for mbrlen (http://pubs.opengroup.org/onlinepubs/9699919799/functions/mbrlen.html):
If ps is a null pointer, the mbrlen() function shall use its own internal mbstate_t object, which is initialized at program start-up to the initial conversion state.
Our implementation seems to do this properly (the state is declared static, so it will be cleared to 0 at program start, then each call to mbrlen reuses the state from previous calls).
It's strange that this can fail on plain ASCII text, however. In that case there are no multi-byte characters so the state shouldn't be desynchronized.
comment:13 by , 9 years ago
Yes, this is very strange. Status buffer in AWK also is static, but is copied into local buffer before call to mbrlen, into local variable tmp_status, and restored after that.
comment:14 by , 9 years ago
And another question, why this problem exist for double && or \|\| sequence?
comment:15 by , 9 years ago
How do they copy the status buffer? Our implementation (http://cgit.haiku-os.org/haiku/tree/headers/posix/wchar.h) of mbstate_t holds a pointer, and I don't know if it is safe to reuse the pointer from different states.
comment:16 by , 9 years ago
AWK code sequence is as follows:
static mbstate_t cur_mbstate; ... static int nextc(...) { ... mbstate_t tmp_mbstate; ... for (idx = 0, ....) { ... tmp_state = cur_mbstate; ... mbrlen(lexptr, idx + 1, &tmp_state); ... } cur_mbstate = tmp_state; }
This is weird for me.
follow-up: 20 comment:18 by , 8 years ago
I suddenly notice that you mentionned it specifically happen with \ and &. I think it could be trying to handle é and \123 style escape sequences?
comment:19 by , 8 years ago
Blocking: | 13103 added |
---|
(In #13103) Indeed, sorry I missed that one. Odd that it's compiler-specific though.
comment:20 by , 8 years ago
Replying to pulkomandy:
I suddenly notice that you mentionned it specifically happen with \ and &. I think it could be trying to handle é and \123 style escape sequences?
This is not an elegant solution. It requires the substitution in awk scripts.
follow-up: 22 comment:21 by , 8 years ago
Anyone tried to build an image with a gcc5h system? i'm askinng this because i suspect that the haiku build system makes use of the the gcc2 version of awk when building, that's why it doesn't fail with a gcc2h system.
comment:22 by , 8 years ago
Replying to khallebal:
Anyone tried to build an image with a gcc5h system? i'm askinng this because i suspect that the haiku build system makes use of the the gcc2 version of awk when building, that's why it doesn't fail with a gcc2h system.
No, there is no gcc2 package of awk on a gcc5h system. This problem occurs on any version of gawk built with gcc5, including x86-64, FWIW.
comment:23 by , 8 years ago
Using the testcase of gawk '{ 0 || 1}'
, I've narrowed it down to https://github.com/haiku/haiku/blob/master/src/system/libroot/add-ons/icu/ICUCtypeData.cpp#L555
This is invoked by mbrlen, called at: gawk-4.1.4/awkgram.y:2864. Debugger doesn't seem to show global variables, which is a bit annoying, so can't really see the whole state, such as the value of lexptr =/
comment:24 by , 8 years ago
Well, that would be a problem: as you can see in comment:16, gawk is copying mbstate_t with operator=. Clearly the code here does not handle that (the copied state can still point to the freed unicode converter, for example).
So, either some spec (POSIX or the like) says that mbstate_t should be trivially copiable, and we should fix or implementation, or, it is gawk fault for doing not allowed things.
comment:25 by , 8 years ago
FWIW https://www.gnu.org/software/libc/manual/html_node/Converting-a-Character.html also shows a use of a memcpy() with mbstate_t:
if (len < MB_CUR_LEN) { mbstate_t temp_state; memcpy (&temp_state, &state, sizeof (state)); if (wcrtomb (NULL, *ws, &temp_state) > len) { /* We cannot guarantee that the next character fits into the buffer, so return an error. */ errno = E2BIG; return NULL; } }
comment:26 by , 8 years ago
Either way, the commit in 079ab7f0b101bc237633490cb8964f7991bedf6a should still be valid, as it was clearly trying to close an invalid handle.
Is this enough to consider closed now? Perhaps a bug needs to be opened for gawk to fix their incorrect handling of copying the mbstate_t?
comment:27 by , 8 years ago
Yes, I think that part we should ask gawk about. Maybe they can show us a proof that we need to support it, otherwise, they have to fix their code (or try to fix the POSIX spec to force their expected behavior :D)
comment:28 by , 8 years ago
Hmm, a quick test with doing a trivial copy of a struct, even with -Wall -Werror, gcc doesn't complain about copying a struct using the format that gawk is using, and has the expected output of the copy being identical.
#include <stdio.h> #include <string.h> typedef struct s { int a; int b; int c; int d; int e; int f; } s; int main(void) { s current; current.a = 42; current.b = 55; current.c = 7; current.d = -5; current.e = 63; current.f = -402; s tmp = current; printf("%d %d %d %d %d %d\n", tmp.a, tmp.b, tmp.c, tmp.d, tmp.e, tmp.f); printf("equal = %s\n", memcmp(¤t, &tmp, sizeof(s)) == 0 ? "yes" : "no"); return 0; }
d$ ./test 42 55 7 -5 63 -402 equal = yes
comment:29 by , 8 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:30 by , 8 years ago
Yes, the pattern that would fails is more complex. Copying the struct itself is fine. The problem is if your struct is such made:
struct s2 { int i; }; struct s { struct s2* realContext; };
Now let's imaging a function that does some processing:
void doStuff(struct s* context) { context->realContext->i++; }
If you try to save the context:
struct s context; struct s tmp = context; // backup copy, so we can restore if things go wrong doStuff(context); // Something went wrong! Restore the context! context = tmp;
Here, the value of "i" will not be restored.
Confirmed. Building gcc2h on gcc2h works fine.