#16898 closed bug (fixed)
WebKit rebased consistently crashes on Amazon.ca [WASM memory allocation error]
Reported by: | vidrep | Owned by: | pulkomandy |
---|---|---|---|
Priority: | normal | Milestone: | Unscheduled |
Component: | Kits/Web Kit | Version: | R1/beta2 |
Keywords: | Cc: | ||
Blocked By: | Blocking: | #16957, #17219, #17260 | |
Platform: | x86-64 |
Description
hrev55026 x86_64
HaikuWebKit 1.7.0
WebKit 611.1.8
WebPositive consistently crashes after logging into my account and accessing the order history
Debugger report attached
Attachments (7)
Change History (39)
by , 4 years ago
Attachment: | WebPositive-1222-debug-04-04-2021-20-10-30.report added |
---|
comment:1 by , 3 years ago
Component: | Applications/WebPositive → Kits/Web Kit |
---|
comment:2 by , 3 years ago
Blocking: | 16957 added |
---|
comment:3 by , 3 years ago
Platform: | All → x86-64 |
---|
comment:4 by , 3 years ago
by , 3 years ago
Attachment: | WebPositive-2306-debug-10-07-2021-00-08-02.report added |
---|
comment:5 by , 3 years ago
I cannot reproduce the issue and I think it affects only the 64bit version. By looking at the stacktrace and the related code, I think there is an issue with mprotect failing and WebKit stops with a release assert in this case.
It would help to build WebKit with some extra logging:
- Edit Source/WTF/wtf/Datalog.cpp
- Change DATA_LOG_TO_FILE define to 1 instead of 0
- Compile WebKit
- Reproduce the crash with this version
It should create a file in /tmp/WTFLog* with hopefully a bit more information about what's happening.
comment:6 by , 3 years ago
Summary: | WebKit rebased consistently crashes on Amazon.ca → WebKit rebased consistently crashes on Amazon.ca [WASM memory allocation error] |
---|
by , 3 years ago
Attachment: | WebPositive-986-debug-16-07-2021-16-14-36.report added |
---|
by , 3 years ago
Attachment: | WTFLog.986.txt added |
---|
comment:8 by , 3 years ago
So, here is the involved mprotect calls:
- https://github.com/haiku/haikuwebkit/blob/haiku/Source/JavaScriptCore/wasm/WasmMemory.cpp#L408
- https://github.com/haiku/haikuwebkit/blob/haiku/Source/JavaScriptCore/wasm/WasmMemory.cpp#L444
(not sure if the first one is enabled in our case)
It seems mprotect is not happy with the passed parameters. There are several cases where it can return a "no memory" error. Probably WebKit is trying to use it in a way that we don't support yet.
I think the next step is to reproduce the crash while running with strace:
strace WebPositive > trace.txt
This will show all system calls, including the calls to mprotect. Maybe we can learn something more from that trace.
by , 3 years ago
Attachment: | terminal.txt added |
---|
by , 3 years ago
Attachment: | WTFLog.2411.txt added |
---|
by , 3 years ago
Attachment: | trace.txt.zip added |
---|
comment:12 by , 3 years ago
Analysis of the trace reveals that WebKit is calling mprotect on a 4GB segment of memory, and this fails. I am not sure how to investigate further as this happens in the WASM code, which is disabled in the 32bit version, so I can't reproduce the issue on my current setup.
comment:13 by , 3 years ago
I guess it wants a full 4GB virtual address space for the WASM runtime. Maybe it is somehow trying to reserve the memory and there are not 4GB available, instead of requesting overcommit?
comment:14 by , 3 years ago
This seems unlikely to me, as I have gotten that crash severall times on my home computer with 32GB of physical memory excluding swap, but with not nearly as much used. I can do quantitive tests if needed though.
comment:15 by , 3 years ago
The mprotect call is there to mark the memory as unreserved. Just before it there is a call to mmap to allocate the address range, as far as I can see, that one succeeds.
comment:16 by , 3 years ago
I don't know what I'm talking about, but given that https://review.haiku-os.org/c/haiku/+/4212 corrects a check that would return ENOMEM, could that be the problem?
comment:17 by , 3 years ago
I made this change after looking at this problem and looking at the mprotect implementation. I found one problem while looking at the code and fixed it, but I have no idea if it is related or not. Test it and let us know, I guess?
comment:18 by , 3 years ago
I think this might be a problem of haiku that it refuses to allocate bigger slices. with 31ish GiB of memory free:
~ dd if=/dev/zero of=file count=1 bs=1063307069 1+0 records in 1+0 records out 1063307069 bytes (1,1 GB, 1014 MiB) copied, 0,535022 s, 2,0 GB/s ~ dd if=/dev/zero of=file count=1 bs=1063307070 dd: memory exhausted by input buffer of size 1063307070 bytes (1014 MiB)
comment:19 by , 3 years ago
Perhaps this is related? https://git.haiku-os.org/haiku/tree/src/system/libroot/posix/malloc_hoard2/threadheap.cpp#n50
comment:20 by , 3 years ago
In src/system/libroot/posix/malloc_hoard2/config.h There is a #define MAX_INTERNAL_FRAGMENTATION 2, setting this to 6 does fix the dd issue... but not the Wasm allocation one. So seems unrelated after all. :/
(And i have no idea what that constant does... so setting this up may or may not be a good idea :D)
comment:21 by , 3 years ago
The crash here is not in malloc code. It's in webkit directly calling mmap to manage its own memory. So, no, this isn't related.
There is a known problem with the hoard2 allocator that it is based on a single memory area that we currently size at about 2GB. Whenever an application tries to malloc more than that, it doesn't work. We need to replace the memory allocator, but so far we have not found one that handles our use case well, especially on 32bit systems. Modern allocators tend to require a lot of address space there, and they also create a lot of smaller areas, when we would prefer a single larger one. But WebKit isn't subject to this, at least in this case.
If you want to investigate this, you rather need to look at the mprotect implementation and see why it is failing.
Here is the failing syscall:
[ 2411] set_memory_protection(0xc60265e000, 0xff800000, 0x0) = 0x80000000 Out of memory (2 us)
I think the area this is operating on is this one, created just before:
[ 2411] map_file("libWebKitLegacy.so.1 mmap area", 0x7fb77fd2fe48, 0x6, 0x100800000, 0x3, 0x1, true, 0x0, 0xffffffff) = 0x486e (6 us)
It's a bit hard to be sure, because map_file returns an area_id, but doesn't say the address it mapped the area at. The set_memory_protection has a smaller size than the area, that corresponds to the "initialBytes" offset in the mprotect calls that I linked in an earlier comment.
The size of the memory part it's trying to unmap is quite large (about 4GB) but on a 64bit system I think this should not be a problem? The initial size of the area is also more than 4GB.
So, it should not be too hard to write a similar test case now that we have the sequence of operations:
1) call mmap to create a large area (size: 0x100800000, that's 4GiB + 8MiB) 2) call mprotect to unmap most of that area, keeping 16MiB allocated at the start
Confirm with strace that you get the same parameters as in the trace we have here (addresses and area IDs will be different, but try to get the flags as identical as possible).
Then we will have something that's a lot easier to investigate on. Hopefully it reproduces the problem.
comment:22 by , 3 years ago
I have tried to write a smaller testcase, this is how far I've gotten with it.
#include <syscalls.h> #include <stdbool.h> int main() { void* address = NULL; _kern_map_file("libWebKitLegacy.so.1 mmap area", &address, 0x6, 0x100800000, 0x3, 0x1, true, 0x0, 0xffffffff); _kern_set_memory_protection(&address, 0xff800000, 0x0); }
Compiled with
gcc -g -I /system/develop/headers/private/system/ -I /system/develop/headers/private/system/arch/x86_64/ test.c
This is the strace output I get from the executable:
[ 17747] image_relocated(0x34c31) (42 us) [ 17747] set_area_protection(0x13bd6a, 0x5) = 0x0 No error (2 us) [ 17747] set_area_protection(0x13bd6c, 0x5) = 0x0 No error (4 us) [ 17747] set_area_protection(0x13bd6f, 0x5) = 0x0 No error (1 us) [ 17747] get_system_info(0x7f25abcb59e0) = 0x0 No error (2 us) [ 17747] get_system_info(0x7f25abcb57f0) = 0x0 No error (0 us) [ 17747] reserve_address_range([0x10ced611f000], 0x7, 0x1000000000) = 0x0 No error (1 us) [ 17747] create_area("heap", 0x18305e83ca0, 0x1, 0x40000, 0x0, 0x103) = 0x13bd72 (4 us) [ 17747] resize_area(0x13bd72, 0x50000) = 0x0 No error (1 us) [ 17747] resize_area(0x13bd72, 0x70000) = 0x0 No error (1 us) [ 17747] open(0xffffffff, "/dev/random", 0x0, 0x0) = 0x7 (8 us) [ 17747] read(0x7, 0xffffffffffffffff, 0x18305e5f808, 0x8) = 0x8 (1 us) [ 17747] close(0x7) = 0x0 No error (1 us) [ 17747] resize_area(0x13bd72, 0x90000) = 0x0 No error (1 us) [ 17747] map_file("libWebKitLegacy.so.1 mmap area", 0x7f25abcb5cd8, 0x6, 0x100800000, 0x3, 0x1, true, 0x0, 0x0) = 0x80000005 (1 us) [ 17747] set_memory_protection(0x7f25abcb5cd8, 0xff800000, 0x0) = 0x80000005 Invalid Argument (0 us) [ 17747] exit_team(0x0) (1 us)
A wierd part in the output above is definetely that the last arg to _kern_map_file is reported as 0x0, while the code sais 0xffffffff
comment:23 by , 3 years ago
What's also weird is that the initial trace says 0xffffffff, but that isn't possible from the code in mmap, which checks that the offset parameter is aligned to the page size. And for this usage of mmap (not mapping a file, but creating an "anonymous mapping"), the offset should be 0 otherwise it won't work.
However, the fd argument should be -1, and it shows as 0 in the trace. Are they swapped?
comment:24 by , 3 years ago
This is a now working test case, hooray!
#include <syscalls.h> #include <stdbool.h> int main() { void* address = NULL; _kern_map_file("libWebKitLegacy.so.1 mmap area", &address, 0x6, 0x100800000, 0x3, 0x1, true, 0xffffffff, 0x0); _kern_set_memory_protection(address, 0xff800000, 0x0); }
Which results in:
~ strace ./a.out [ 18378] image_relocated(0x35296) (53 us) [ 18378] set_area_protection(0x13d8d4, 0x5) = 0x0 No error (2 us) [ 18378] set_area_protection(0x13d8d6, 0x5) = 0x0 No error (3 us) [ 18378] set_area_protection(0x13d8d9, 0x5) = 0x0 No error (3 us) [ 18378] get_system_info(0x7fd5ce695710) = 0x0 No error (3 us) [ 18378] get_system_info(0x7fd5ce695520) = 0x0 No error (0 us) [ 18378] reserve_address_range([0x116e4751b000], 0x7, 0x1000000000) = 0x0 No error (3 us) [ 18378] create_area("heap", 0x20295f2fca0, 0x1, 0x40000, 0x0, 0x103) = 0x13d8dc (8 us) [ 18378] resize_area(0x13d8dc, 0x50000) = 0x0 No error (3 us) [ 18378] resize_area(0x13d8dc, 0x70000) = 0x0 No error (3 us) [ 18378] open(0xffffffff, "/dev/random", 0x0, 0x0) = 0x3 (7 us) [ 18378] read(0x3, 0xffffffffffffffff, 0x20295f0b808, 0x8) = 0x8 (2 us) [ 18378] close(0x3) = 0x0 No error (4 us) [ 18378] resize_area(0x13d8dc, 0x90000) = 0x0 No error (4 us) [ 18378] map_file("libWebKitLegacy.so.1 mmap area", 0x7fd5ce695a08, 0x6, 0x100800000, 0x3, 0x1, true, 0x0, 0x0) = 0x13d8dd (7 us) [ 18378] set_memory_protection(0xf39400e000, 0xff800000, 0x0) = 0x80000000 Out of memory (2 us) [ 18378] exit_team(0x0) (3 us)
comment:25 by , 3 years ago
On my first attempt at a fix, I got a failure ... because I ran this in a VM with only 2GB of RAM. Probably WebKit should be using MAP_NORESERVE to get an overcommitting area?
After I rebooted with 6GB of RAM, both the test and WebKit worked. Proposed fix: https://review.haiku-os.org/c/haiku/+/4436
comment:26 by , 3 years ago
Indeed, WebKit is assuming it can allocate way more than the system actually has: https://github.com/haiku/haikuwebkit/blob/haiku/Source/JavaScriptCore/wasm/WasmMemory.cpp#L190
Now, where this is invoking mmap, I haven't yet traced. We have OS(HAIKU) in OSAllocatorPOSIX for reserveUncommitted that uses MAP_NORESERVE anyway.
comment:27 by , 3 years ago
Presumably this should use reserveUncommitted instead of what it presently does: https://github.com/WebKit/WebKit/blob/8ba0ead57f2a1238851507cd992ae8ac22ab894a/Source/WTF/wtf/Gigacage.cpp#L47
comment:28 by , 3 years ago
Blocking: | 17219 added |
---|
comment:29 by , 3 years ago
Proposed fix: https://review.haiku-os.org/c/haiku/+/4436
Confirmed fix for me, some sites that crashed before now freeze webkit, but this is a different issue, progress. :)
comment:30 by , 3 years ago
And in followup to make Webkit use less commited memory: https://git.gryphno.de/nephele/haikuwebkit/commit/8f64885c409dd4eba86819c72032abf868ee795b
comment:31 by , 3 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Fixed in hrev55410. I see PulkoMandy has merged the Gigacage change here: https://github.com/haiku/haikuwebkit/commit/44871cb0e724924eb5128ada6aa2c8dacd8b42b8
comment:32 by , 3 years ago
Blocking: | 17260 added |
---|
This is still a problem with HaikuWebKit 1.8.1 Debugger report attached