Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#16898 closed bug (fixed)

WebKit rebased consistently crashes on Amazon.ca [WASM memory allocation error]

Reported by: vidrep Owned by: pulkomandy
Priority: normal Milestone: Unscheduled
Component: Kits/Web Kit Version: R1/beta2
Keywords: Cc:
Blocked By: Blocking: #16957, #17219, #17260
Platform: x86-64

Description

hrev55026 x86_64

HaikuWebKit 1.7.0

WebKit 611.1.8

WebPositive consistently crashes after logging into my account and accessing the order history

Debugger report attached

Attachments (7)

WebPositive-1222-debug-04-04-2021-20-10-30.report (27.7 KB ) - added by vidrep 3 years ago.
WebPositive-2306-debug-10-07-2021-00-08-02.report (48.6 KB ) - added by vidrep 3 years ago.
WebPositive-986-debug-16-07-2021-16-14-36.report (30.4 KB ) - added by vidrep 3 years ago.
WTFLog.986.txt (31 bytes ) - added by vidrep 3 years ago.
terminal.txt (413 bytes ) - added by vidrep 3 years ago.
WTFLog.2411.txt (31 bytes ) - added by vidrep 3 years ago.
trace.txt.zip (368.1 KB ) - added by vidrep 3 years ago.

Download all attachments as: .zip

Change History (39)

comment:1 by pulkomandy, 3 years ago

Component: Applications/WebPositiveKits/Web Kit

comment:2 by pulkomandy, 3 years ago

Blocking: 16957 added

comment:3 by pulkomandy, 3 years ago

Platform: Allx86-64

comment:4 by vidrep, 3 years ago

This is still a problem with HaikuWebKit 1.8.1 Debugger report attached

comment:5 by pulkomandy, 3 years ago

I cannot reproduce the issue and I think it affects only the 64bit version. By looking at the stacktrace and the related code, I think there is an issue with mprotect failing and WebKit stops with a release assert in this case.

It would help to build WebKit with some extra logging:

  • Edit Source/WTF/wtf/Datalog.cpp
  • Change DATA_LOG_TO_FILE define to 1 instead of 0
  • Compile WebKit
  • Reproduce the crash with this version

It should create a file in /tmp/WTFLog* with hopefully a bit more information about what's happening.

comment:6 by pulkomandy, 3 years ago

Summary: WebKit rebased consistently crashes on Amazon.caWebKit rebased consistently crashes on Amazon.ca [WASM memory allocation error]

comment:7 by vidrep, 3 years ago

I have attached new debugger report and WTFlog as per your instructions.

by vidrep, 3 years ago

Attachment: WTFLog.986.txt added

comment:8 by pulkomandy, 3 years ago

So, here is the involved mprotect calls:

(not sure if the first one is enabled in our case)

It seems mprotect is not happy with the passed parameters. There are several cases where it can return a "no memory" error. Probably WebKit is trying to use it in a way that we don't support yet.

I think the next step is to reproduce the crash while running with strace:

    strace WebPositive > trace.txt

This will show all system calls, including the calls to mprotect. Maybe we can learn something more from that trace.

comment:9 by vidrep, 3 years ago

Attached are the results from the test per instructions in comment:8

by vidrep, 3 years ago

Attachment: terminal.txt added

by vidrep, 3 years ago

Attachment: WTFLog.2411.txt added

comment:10 by vidrep, 3 years ago

trace.txt exceeds 5MB size

by vidrep, 3 years ago

Attachment: trace.txt.zip added

comment:11 by vidrep, 3 years ago

I had to zip it in order to attach.

comment:12 by pulkomandy, 3 years ago

Analysis of the trace reveals that WebKit is calling mprotect on a 4GB segment of memory, and this fails. I am not sure how to investigate further as this happens in the WASM code, which is disabled in the 32bit version, so I can't reproduce the issue on my current setup.

comment:13 by waddlesplash, 3 years ago

I guess it wants a full 4GB virtual address space for the WASM runtime. Maybe it is somehow trying to reserve the memory and there are not 4GB available, instead of requesting overcommit?

comment:14 by nephele, 3 years ago

This seems unlikely to me, as I have gotten that crash severall times on my home computer with 32GB of physical memory excluding swap, but with not nearly as much used. I can do quantitive tests if needed though.

comment:15 by pulkomandy, 3 years ago

The mprotect call is there to mark the memory as unreserved. Just before it there is a call to mmap to allocate the address range, as far as I can see, that one succeeds.

comment:16 by madmax, 3 years ago

I don't know what I'm talking about, but given that https://review.haiku-os.org/c/haiku/+/4212 corrects a check that would return ENOMEM, could that be the problem?

comment:17 by pulkomandy, 3 years ago

I made this change after looking at this problem and looking at the mprotect implementation. I found one problem while looking at the code and fixed it, but I have no idea if it is related or not. Test it and let us know, I guess?

comment:18 by nephele, 3 years ago

I think this might be a problem of haiku that it refuses to allocate bigger slices. with 31ish GiB of memory free:

~ dd if=/dev/zero of=file count=1 bs=1063307069                                      
1+0 records in
1+0 records out
1063307069 bytes (1,1 GB, 1014 MiB) copied, 0,535022 s, 2,0 GB/s
~ dd if=/dev/zero of=file count=1 bs=1063307070                                      
dd: memory exhausted by input buffer of size 1063307070 bytes (1014 MiB)

comment:20 by nephele, 3 years ago

In src/system/libroot/posix/malloc_hoard2/config.h There is a #define MAX_INTERNAL_FRAGMENTATION 2, setting this to 6 does fix the dd issue... but not the Wasm allocation one. So seems unrelated after all. :/

(And i have no idea what that constant does... so setting this up may or may not be a good idea :D)

Last edited 3 years ago by nephele (previous) (diff)

comment:21 by pulkomandy, 3 years ago

The crash here is not in malloc code. It's in webkit directly calling mmap to manage its own memory. So, no, this isn't related.

There is a known problem with the hoard2 allocator that it is based on a single memory area that we currently size at about 2GB. Whenever an application tries to malloc more than that, it doesn't work. We need to replace the memory allocator, but so far we have not found one that handles our use case well, especially on 32bit systems. Modern allocators tend to require a lot of address space there, and they also create a lot of smaller areas, when we would prefer a single larger one. But WebKit isn't subject to this, at least in this case.

If you want to investigate this, you rather need to look at the mprotect implementation and see why it is failing.

Here is the failing syscall:

[  2411] set_memory_protection(0xc60265e000, 0xff800000, 0x0) = 0x80000000 Out of memory (2 us)

I think the area this is operating on is this one, created just before:

[  2411] map_file("libWebKitLegacy.so.1 mmap area", 0x7fb77fd2fe48, 0x6, 0x100800000, 0x3, 0x1, true, 0x0, 0xffffffff) = 0x486e (6 us)

It's a bit hard to be sure, because map_file returns an area_id, but doesn't say the address it mapped the area at. The set_memory_protection has a smaller size than the area, that corresponds to the "initialBytes" offset in the mprotect calls that I linked in an earlier comment.

The size of the memory part it's trying to unmap is quite large (about 4GB) but on a 64bit system I think this should not be a problem? The initial size of the area is also more than 4GB.

So, it should not be too hard to write a similar test case now that we have the sequence of operations:

1) call mmap to create a large area (size: 0x100800000, that's 4GiB + 8MiB) 2) call mprotect to unmap most of that area, keeping 16MiB allocated at the start

Confirm with strace that you get the same parameters as in the trace we have here (addresses and area IDs will be different, but try to get the flags as identical as possible).

Then we will have something that's a lot easier to investigate on. Hopefully it reproduces the problem.

comment:22 by nephele, 3 years ago

I have tried to write a smaller testcase, this is how far I've gotten with it.

#include <syscalls.h>
#include <stdbool.h>

int main() {
	void* address = NULL;
	_kern_map_file("libWebKitLegacy.so.1 mmap area", &address, 0x6, 0x100800000, 0x3, 0x1, true, 0x0, 0xffffffff);
	_kern_set_memory_protection(&address, 0xff800000, 0x0);
}

Compiled with

gcc -g -I /system/develop/headers/private/system/ -I /system/develop/headers/private/system/arch/x86_64/ test.c

This is the strace output I get from the executable:

[ 17747] image_relocated(0x34c31) (42 us)
[ 17747] set_area_protection(0x13bd6a, 0x5) = 0x0 No error (2 us)
[ 17747] set_area_protection(0x13bd6c, 0x5) = 0x0 No error (4 us)
[ 17747] set_area_protection(0x13bd6f, 0x5) = 0x0 No error (1 us)
[ 17747] get_system_info(0x7f25abcb59e0) = 0x0 No error (2 us)
[ 17747] get_system_info(0x7f25abcb57f0) = 0x0 No error (0 us)
[ 17747] reserve_address_range([0x10ced611f000], 0x7, 0x1000000000) = 0x0 No error (1 us)
[ 17747] create_area("heap", 0x18305e83ca0, 0x1, 0x40000, 0x0, 0x103) = 0x13bd72 (4 us)
[ 17747] resize_area(0x13bd72, 0x50000) = 0x0 No error (1 us)
[ 17747] resize_area(0x13bd72, 0x70000) = 0x0 No error (1 us)
[ 17747] open(0xffffffff, "/dev/random", 0x0, 0x0) = 0x7 (8 us)
[ 17747] read(0x7, 0xffffffffffffffff, 0x18305e5f808, 0x8) = 0x8 (1 us)
[ 17747] close(0x7) = 0x0 No error (1 us)
[ 17747] resize_area(0x13bd72, 0x90000) = 0x0 No error (1 us)
[ 17747] map_file("libWebKitLegacy.so.1 mmap area", 0x7f25abcb5cd8, 0x6, 0x100800000, 0x3, 0x1, true, 0x0, 0x0) = 0x80000005 (1 us)
[ 17747] set_memory_protection(0x7f25abcb5cd8, 0xff800000, 0x0) = 0x80000005 Invalid Argument (0 us)
[ 17747] exit_team(0x0) (1 us)

A wierd part in the output above is definetely that the last arg to _kern_map_file is reported as 0x0, while the code sais 0xffffffff

comment:23 by pulkomandy, 3 years ago

What's also weird is that the initial trace says 0xffffffff, but that isn't possible from the code in mmap, which checks that the offset parameter is aligned to the page size. And for this usage of mmap (not mapping a file, but creating an "anonymous mapping"), the offset should be 0 otherwise it won't work.

However, the fd argument should be -1, and it shows as 0 in the trace. Are they swapped?

comment:24 by nephele, 3 years ago

This is a now working test case, hooray!

#include <syscalls.h>
#include <stdbool.h>

int main() {
	void* address = NULL;
	_kern_map_file("libWebKitLegacy.so.1 mmap area", &address, 0x6, 0x100800000, 0x3, 0x1, true, 0xffffffff, 0x0);
	_kern_set_memory_protection(address, 0xff800000, 0x0);
}

Which results in:

~ strace ./a.out                                                                                                            
[ 18378] image_relocated(0x35296) (53 us)
[ 18378] set_area_protection(0x13d8d4, 0x5) = 0x0 No error (2 us)
[ 18378] set_area_protection(0x13d8d6, 0x5) = 0x0 No error (3 us)
[ 18378] set_area_protection(0x13d8d9, 0x5) = 0x0 No error (3 us)
[ 18378] get_system_info(0x7fd5ce695710) = 0x0 No error (3 us)
[ 18378] get_system_info(0x7fd5ce695520) = 0x0 No error (0 us)
[ 18378] reserve_address_range([0x116e4751b000], 0x7, 0x1000000000) = 0x0 No error (3 us)
[ 18378] create_area("heap", 0x20295f2fca0, 0x1, 0x40000, 0x0, 0x103) = 0x13d8dc (8 us)
[ 18378] resize_area(0x13d8dc, 0x50000) = 0x0 No error (3 us)
[ 18378] resize_area(0x13d8dc, 0x70000) = 0x0 No error (3 us)
[ 18378] open(0xffffffff, "/dev/random", 0x0, 0x0) = 0x3 (7 us)
[ 18378] read(0x3, 0xffffffffffffffff, 0x20295f0b808, 0x8) = 0x8 (2 us)
[ 18378] close(0x3) = 0x0 No error (4 us)
[ 18378] resize_area(0x13d8dc, 0x90000) = 0x0 No error (4 us)
[ 18378] map_file("libWebKitLegacy.so.1 mmap area", 0x7fd5ce695a08, 0x6, 0x100800000, 0x3, 0x1, true, 0x0, 0x0) = 0x13d8dd (7 us)
[ 18378] set_memory_protection(0xf39400e000, 0xff800000, 0x0) = 0x80000000 Out of memory (2 us)
[ 18378] exit_team(0x0) (3 us)

comment:25 by waddlesplash, 3 years ago

On my first attempt at a fix, I got a failure ... because I ran this in a VM with only 2GB of RAM. Probably WebKit should be using MAP_NORESERVE to get an overcommitting area?

After I rebooted with 6GB of RAM, both the test and WebKit worked. Proposed fix: https://review.haiku-os.org/c/haiku/+/4436

comment:26 by waddlesplash, 3 years ago

Indeed, WebKit is assuming it can allocate way more than the system actually has: https://github.com/haiku/haikuwebkit/blob/haiku/Source/JavaScriptCore/wasm/WasmMemory.cpp#L190

Now, where this is invoking mmap, I haven't yet traced. We have OS(HAIKU) in OSAllocatorPOSIX for reserveUncommitted that uses MAP_NORESERVE anyway.

comment:27 by waddlesplash, 3 years ago

Presumably this should use reserveUncommitted instead of what it presently does: https://github.com/WebKit/WebKit/blob/8ba0ead57f2a1238851507cd992ae8ac22ab894a/Source/WTF/wtf/Gigacage.cpp#L47

comment:28 by waddlesplash, 3 years ago

Blocking: 17219 added

comment:29 by nephele, 3 years ago

Proposed fix: ​https://review.haiku-os.org/c/haiku/+/4436

Confirmed fix for me, some sites that crashed before now freeze webkit, but this is a different issue, progress. :)

comment:31 by waddlesplash, 3 years ago

Resolution: fixed
Status: newclosed

Fixed in hrev55410. I see PulkoMandy has merged the Gigacage change here: https://github.com/haiku/haikuwebkit/commit/44871cb0e724924eb5128ada6aa2c8dacd8b42b8

comment:32 by waddlesplash, 3 years ago

Blocking: 17260 added
Note: See TracTickets for help on using tickets.