#16489 closed bug (fixed)
App_server crash when browsing URL with WebPositive
Reported by: | vidrep | Owned by: | PulkoMandy |
---|---|---|---|
Priority: | normal | Milestone: | R1/beta3 |
Component: | Servers/app_server | Version: | R1/beta2 |
Keywords: | | Cc: | ttcoder |
Blocked By: | | Blocking: | #16714 |
Platform: | All | | |
Description
hrev54507 x86_64 WebKit rebased HaikuWebKit 1.7.0 WebKit 610.1.26
Navigating on this "newly redesigned" website using WebKit rebased will crash app_server.
Attachments (4)
Change History (29)
by , 4 years ago
Attachment: | app_server-674-debug-17-08-2020-22-21-21.report added |
---|
by , 4 years ago
Attachment: | IMG_0283.JPG added |
---|
comment:1 by , 4 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:2 by , 4 years ago
comment:3 by , 4 years ago
It seems highly likely this is the same issue, yes; it appears to be due to heap corruption. Probably someone should run app_server or test_app_server under the guarded heap.
comment:4 by , 4 years ago
You can assign tickets to me, but I'm currently on vacation and don't have all my hardware set up to investigate things. So don't expect progress from me in that area. Also, I didn't write the Alpha Mask code, so I'm not even particularly well qualified to debug these problems.
comment:5 by , 4 years ago
@waddlesplash Is test_app_server runnable from Terminal, maybe even with Web+ as a client? If so, the user-space guarded heap sounds like an easier proposition than kernel-debugger tools indeed. Something like
LD_PRELOAD=libroot_debug.so MALLOC_DEBUG=ges50 test_app_server
might turn up the heap corruption or use-after-frees with less fuss ("work smarter, not harder" :-)
comment:6 by , 4 years ago
I had my desktop littered with debug reports generated by attempting to navigate that URL I posted. If anyone has a suggestion as to how I might get better data to debug the problem, let me know. PulkoMandy, I saw it was assigned to axeld, and since the trigger for the KDL was Web+, I assumed it might be in your purview. Enjoy your vacation.
comment:7 by , 4 years ago
waddlesplash was referring to the userland guarded heap, as the kernel one wouldn't be of any help. I tried that yesterday for a long time but was entirely unable to reproduce the issue on the mentioned site. I also cannot reproduce this on YouTube, but video playback is broken there for me, as it claims the browser doesn't support the video format.
Making the app_server run under the guarded heap is not too complicated btw. You can add the environment variables to the launch definition in /system/data/launch/system by adding a block like this to the app_server entry:
app_server... {
    env {
        LD_PRELOAD libroot_debug.so
        MALLOC_DEBUG grs25
    }
    ...
}
Building an updated image or updating the haiku.hpkg with that will make app_server (and input_server, as that is started directly by app_server) run under the guarded heap. Note that the r flag above also disables memory reuse for maximum effect, but will also burn through RAM quickly. So you may need to change this to just gs25 if the crash does not reproduce before memory runs out. Once memory is used up in either case, the app_server will likely just hang and a hard reboot will be needed. This has a chance of filesystem corruption, so make sure that nothing important is left without a backup.
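For readers unfamiliar with the idea, here is a minimal sketch of what a guarded allocation does, written against plain POSIX mmap/mprotect. It is not the libroot_debug implementation: `GuardedBlock`, `AllocateGuarded` and `FreeGuarded` are hypothetical names, and the sketch ignores alignment and only handles requests up to one page.

```cpp
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical bookkeeping for one guarded allocation.
struct GuardedBlock {
    void* mapping;            // start of the two-page mapping
    std::size_t mappingSize;  // total mapped bytes
    unsigned char* data;      // pointer handed to the caller
    unsigned char* guard;     // first byte of the inaccessible page
};

// Maps two pages, makes the second one inaccessible, and places the
// allocation so that its last byte sits directly before the guard page.
bool AllocateGuarded(std::size_t size, GuardedBlock& out)
{
    const std::size_t pageSize
        = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    if (size == 0 || size > pageSize)
        return false;

    void* mapping = mmap(nullptr, 2 * pageSize, PROT_READ | PROT_WRITE,
        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mapping == MAP_FAILED)
        return false;

    unsigned char* guard = static_cast<unsigned char*>(mapping) + pageSize;
    if (mprotect(guard, pageSize, PROT_NONE) != 0) {
        munmap(mapping, 2 * pageSize);
        return false;
    }

    out.mapping = mapping;
    out.mappingSize = 2 * pageSize;
    out.guard = guard;
    out.data = guard - size;
    return true;
}

// Releases the whole mapping, guard page included.
void FreeGuarded(GuardedBlock& block)
{
    munmap(block.mapping, block.mappingSize);
    block = GuardedBlock{};
}
```

With a block from `AllocateGuarded(54, block)`, writing `block.data[53]` is fine, while writing `block.data[54]` faults immediately instead of silently corrupting a neighbouring allocation. Spending whole pages per allocation is also why this approach burns through RAM quickly.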
follow-up: 10 comment:8 by , 4 years ago
Rather than re-package, one may also use the non-packaged hierarchy. I did this:
cd /system/non-packaged/data/
mkdir launch
cat > launch/system << EOF
service x-vnd.Haiku-app_server {
    launch /system/servers/app_server
    env {
        LD_PRELOAD libroot_debug.so
        MALLOC_DEBUG grs25
    }
}
EOF
This won't boot though. Even if I remove the "r" to allow memory re-use, Haiku remains stuck on the last ("rocket") icon. KDL can be invoked, and shows the four CPUs are running ide_thread, syslog_daemon (executing vm_something and pending_ici) etc. Invoking "teams" shows there are only 6 teams running, none of which is app_server. Any way to strip down "gs25" some more and still keep it useful ? I have 4 GB of RAM. (x64 of course).
comment:9 by , 4 years ago
Cc: | added |
---|
comment:10 by , 4 years ago
Replying to ttcoder:
Rather than re-package, one may also use the non-packaged hierarchy. I did this:
This only works if you remove the original app_server entry from the packaged launch/system file. Having app_server in both locations results in a cyclic dependency in launch_daemon for the init target, presumably the app_server depending on itself due to it being present twice. Whether or not this is a bug or intended behaviour I have not investigated.
Launching app_server with the guarded heap doesn't add much overhead initially, so it's definitely not a problem of running out of memory. After a while, the consumption will add up of course, especially with memory reuse disabled.
by , 4 years ago
Attachment: | _tts_appsrv_launchscript-20.3-1-x86_64.hpkg added |
---|
Run app_server with the guarded heap (with launch_daemon script tweak, no need to rebuild the system)
comment:11 by , 4 years ago
So I now override the data/launch/system file with an hpkg, that works.
With "r" to disable memory re-use, app_server quickly reached 1.97 GB memory usage and the machine locks up solid. Tried a second time, it locked up at almost the same ceiling (1.98 GB).
In both cases I had a hell of a time rebooting the machine (trying to invoke KDL, holding Ctrl-Alt-Del for several seconds, etc.).
Now I'm trying with memory re-use allowed (see attached hpkg) and I'm at a comfortable 220 MB memory usage. I'm going to try my luck with YouTube.
I find it interesting that Haiku applications would crash at the 2^31 bytes mark by the way. I thought such limits were gone when using the 64 bit variant of Haiku, and the OS could make use of the whole range of physical memory (4 GB, 8 GB, whatever), not just as a collective, but also each app individually.
comment:12 by , 4 years ago
The whole story is a bit more complex. But basically, the thing is, the limit is gone, but the memory allocator isn't aware of it yet. So, with malloc() you can only get about 2GB. But you can use mmap or create_area and then do your own things there, and for that, the limit is removed.
We have at some point moved to rpmalloc which removes that limitation, but it turns out, it needs quite a lot of memory space (being designed for 64bit systems where this isn't a problem) and on 32bit it would run out of memory even earlier. So we have reverted that for now. We will be testing other allocators at some point (possibly the musl one) for the default setting.
But for debugging, we use yet another (intentionally simpler) allocator. We can maybe tweak its initial size reservation on 64bit?
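To illustrate the "use mmap and do your own things there" route mentioned above, here is a minimal sketch using POSIX anonymous mappings. `MapAnonymous` and `UnmapAnonymous` are hypothetical helper names, and whether a mapping larger than 2 GB succeeds depends on the system and the memory available, which this sketch does not try to demonstrate.

```cpp
#include <cstddef>
#include <sys/mman.h>

// Asks the kernel directly for `size` bytes of zero-filled, read/write
// memory, bypassing malloc(). Returns nullptr on failure.
void* MapAnonymous(std::size_t size)
{
    void* address = mmap(nullptr, size, PROT_READ | PROT_WRITE,
        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return address == MAP_FAILED ? nullptr : address;
}

// Returns the region to the kernel; `size` must match the mapped size.
bool UnmapAnonymous(void* address, std::size_t size)
{
    return munmap(address, size) == 0;
}
```

The caller is responsible for carving the region up, which is the "do your own things" part: nothing here replaces a general-purpose allocator.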
by , 4 years ago
Attachment: | app_server-504-debug-08-09-2020-11-51-00.report added |
---|
Yay, still reproducible with the guarded heap, and the report gives a couple more hints than usual
comment:13 by , 4 years ago
Thanks for the explanation @pulkomandy, so it's a limitation of the "hoard" allocator. No big deal. Switching to "memory re-use" (to slow down the creep towards the 2 GB mark) was successful and I could reproduce the crash that way anyway.
The guarded heap gives some additional hints, maybe someone can get lucky with them:
thread 1220: w:936:offscreen
state: Call (thread 1220 tried accessing address 0x2f566000 which is a guard page (base: 0x2f565fc0, size: 54, alignment: 16, allocated by thread: 1220, freed by thread: -1))
...
0x7f23d58410f0 0x75ba8483ae void agg::render_scanlines<agg::rasterizer_scanline_aa_subpix<agg::rasterizer_sl_clip<agg::ras_conv_int> >, agg::scanline_p8_subpix, agg::renderer_scanline_subpix_solid<agg::renderer_region<PixelFormat> > >(agg::rasterizer_scanline_aa_subpix<agg::rasterizer_sl_clip<agg::ras_conv_int> >&, agg::scanline_p8_subpix&, agg::renderer_scanline_subpix_solid<agg::renderer_region<PixelFormat> >&) + 0x30e
The syslog does not have much more:
KERN: user access on kernel area 0x35c4 at 0x000000002f566000
KERN: vm_page_fault: vm_soft_fault returned error 'Permission denied' on fault at 0x2f566000, ip 0x75ba8483b2, write 1, user 1, thread 0x4c4
KERN: 1220: DEBUGGER: thread 1220 tried accessing address 0x2f566000 which is a guard page (base: 0x2f565fc0, size: 54, alignment: 16, allocated by thread: 1220, freed by thread: -1)
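The numbers in that message can be sanity-checked with a little arithmetic. The sketch below assumes the allocation is placed at the highest address that satisfies the alignment while still ending at or before the guard page. That assumption matches the reported values, but I have not checked it against the guarded heap source, and `GuardedBase` and `SlackBeforeGuard` are hypothetical names.

```cpp
#include <cstdint>

// Highest `alignment`-aligned address at which `size` bytes still end at
// or before `guardPage`. `alignment` must be a power of two.
constexpr std::uint64_t GuardedBase(std::uint64_t guardPage,
    std::uint64_t size, std::uint64_t alignment)
{
    return (guardPage - size) & ~(alignment - 1);
}

// Bytes between the end of the allocation and the guard page. An overrun
// that stays within this slack is not caught by the guard page.
constexpr std::uint64_t SlackBeforeGuard(std::uint64_t guardPage,
    std::uint64_t size, std::uint64_t alignment)
{
    return guardPage - (GuardedBase(guardPage, size, alignment) + size);
}

// Values taken from the debug report above.
static_assert(GuardedBase(0x2f566000, 54, 16) == 0x2f565fc0,
    "base matches the report");
static_assert(SlackBeforeGuard(0x2f566000, 54, 16) == 10,
    "ten bytes of padding before the guard page");
```

Under that assumption, the faulting address 0x2f566000 is 64 bytes past the base of a 54-byte allocation, so the write landed at least 10 bytes beyond the end of the buffer.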
comment:14 by , 4 years ago
Yes, so we can now see that the alpha mask code is apparently drawing outside its mask bitmap:
thread 1220: w:936:offscreen
state: Call (thread 1220 tried accessing address 0x2f566000 which is a guard page (base: 0x2f565fc0, size: 54, alignment: 16, allocated by thread: 1220, freed by thread: -1))

Frame IP Function Name
-----------------------------------------------
00000000 0x5afbf44fcf _kern_debugger + 0x7
Disassembly:
_kern_debugger:
0x0000005afbf44fc8: 48c7c0e1000000 mov $0xe1, %rax
0x0000005afbf44fcf: 0f05 syscall <--

0x7f23d5840c60 0x5afbfd176d panic(char const*, ...) + 0xad
0x7f23d5840cc0 0x5afbfd1de4 guarded_heap_segfault_handler(int, __siginfo_t*, void*) + 0x174
0x7f23d5840cc0 0x7fdfa9d0c23b commpage_signal_handler + 0x2b
0x7f23d58410f0 0x75ba8483ae void agg::render_scanlines<agg::rasterizer_scanline_aa_subpix<agg::rasterizer_sl_clip<agg::ras_conv_int> >, agg::scanline_p8_subpix, agg::renderer_scanline_subpix_solid<agg::renderer_region<PixelFormat> > >(agg::rasterizer_scanline_aa_subpix<agg::rasterizer_sl_clip<agg::ras_conv_int> >&, agg::scanline_p8_subpix&, agg::renderer_scanline_subpix_solid<agg::renderer_region<PixelFormat> >&) + 0x30e
0x7f23d5841170 0x75ba85384c BRect Painter::_FillPath<agg::conv_curve<agg::path_base<agg::vertex_block_storage<double, (unsigned int)8, (unsigned int)256> >, agg::curve3, agg::curve4> >(agg::conv_curve<agg::path_base<agg::vertex_block_storage<double, (unsigned int)8, (unsigned int)256> >, agg::curve3, agg::curve4>&) const + 0x32c
0x7f23d58411a0 0x75ba83d0a3 Painter::DrawShape(int const&, unsigned int const*, int const&, BPoint const*, bool, BPoint const&, float) const + 0x73
0x7f23d5841220 0x75ba829d64 DrawingEngine::DrawShape(BRect const&, int, unsigned int const*, int, BPoint const*, bool, BPoint const&, float) + 0x64
0x7f23d5841270 0x75ba828538 ShapeAlphaMask::DrawVectors(Canvas*) + 0x98
0x7f23d5841440 0x75ba8288f3 VectorAlphaMask<ShapeAlphaMask>::_RenderSource(IntRect const&) + 0x263
0x7f23d58414d0 0x75ba827818 AlphaMask::_Generate() + 0x48
0x7f23d5841550 0x75ba827cbf _ZN9AlphaMask17SetCanvasGeometryE8IntPoint7IntRect.localalias.45 + 0x10f
0x7f23d58415b0 0x75ba7eb342 ServerWindow::_UpdateDrawState(View*) + 0xc2
0x7f23d5841710 0x75ba7f50cf ServerWindow::_DispatchViewMessage(int, BPrivate::LinkReceiver&) + 0x272f
0x7f23d58417d0 0x75ba7f585f ServerWindow::_DispatchMessage(int, BPrivate::LinkReceiver&) + 0x34f
0x7f23d5841840 0x75ba7f061b ServerWindow::_MessageLooper() + 0x23b
0x7f23d5841850 0x75ba7d2507 MessageLooper::_message_thread(void*) + 0x7
0x7f23d5841870 0x5afbf43d77 thread_entry + 0x17
00000000 0x7fdfa9d0c260 commpage_thread_exit + 0
If I remember correctly, the alpha mask code first computes the bounds of actual touched pixels, and then allocates a bitmap just large enough for that. Maybe the computation is incorrect in some case?
With the normal allocator this would corrupt memory (and fail a bit later), but now it is detected sooner, which is a lot more useful.
comment:15 by , 4 years ago
It appears that the picture bounding box player, used by the VectorAlphaMask to create the bitmap, does not support draw_picture or set_clipping_rects: https://github.com/haiku/haiku/blob/master/src/servers/app/PictureBoundingBoxPlayer.cpp#L441
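To make the suspected failure mode concrete, here is a toy model, not app_server code: a bounds pass that skips an operation it does not understand yields a rectangle smaller than what the real drawing pass touches. All names (`Rect`, `Op`, `ComputeBounds`, `CountEscapingOps`) are hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Inclusive pixel rectangle.
struct Rect {
    int left, top, right, bottom;
};

enum class OpKind { FillRect, DrawPicture };

struct Op {
    OpKind kind;
    Rect area;  // pixels the operation touches when actually drawn
};

// Bounds pass that only understands FillRect. DrawPicture is skipped,
// imitating a bounding-box player with an unimplemented operation.
// Returns false if no supported operation contributed to the bounds.
bool ComputeBounds(const std::vector<Op>& ops, Rect& bounds)
{
    bool any = false;
    for (const Op& op : ops) {
        if (op.kind != OpKind::FillRect)
            continue;
        if (!any) {
            bounds = op.area;
            any = true;
        } else {
            bounds.left = std::min(bounds.left, op.area.left);
            bounds.top = std::min(bounds.top, op.area.top);
            bounds.right = std::max(bounds.right, op.area.right);
            bounds.bottom = std::max(bounds.bottom, op.area.bottom);
        }
    }
    return any;
}

// True if `inner` lies entirely within `outer`.
bool Contains(const Rect& outer, const Rect& inner)
{
    return inner.left >= outer.left && inner.top >= outer.top
        && inner.right <= outer.right && inner.bottom <= outer.bottom;
}

// Counts operations that would draw outside a bitmap sized to `bounds`.
int CountEscapingOps(const std::vector<Op>& ops, const Rect& bounds)
{
    int escaping = 0;
    for (const Op& op : ops) {
        if (!Contains(bounds, op.area))
            escaping++;
    }
    return escaping;
}
```

With a FillRect covering (0,0)-(9,9) and a DrawPicture covering (5,5)-(19,19), the computed bounds stay at (0,0)-(9,9) and one operation escapes them. If the real drawing path does not clip to the bitmap, that is an out-of-bounds write of the kind the guarded heap reported.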
comment:16 by , 4 years ago
I've been navigating this website for about 15 minutes and got no crash. Does this still happen for anyone?
comment:17 by , 4 years ago
I tried just now. First try froze my system. No mouse or keyboard. I had to do a hard reboot. Second try resulted in a Web+ crash. This was in the syslog:
KERN: 939: DEBUGGER: Could not create BWindow's receive port, used for interacting with the app_server!
KERN: _user_debugger(): Failed to install debugger. Message is: `Could not create BWindow's receive port, used for interacting with the app_server!'
KERN: thread_hit_debug_event(): Failed to create debug port: No more ports available
comment:18 by , 4 years ago
That's a different problem, creating too many offscreen bitmaps and running out of ports because each (view-accepting) bitmap needs a port. It's been there for a few years already. I hope the next WebKit release using the new app_server compositing code will improve the situation by reducing the number of temporary offscreens we need to create.
I was running in QEMU with a single CPU core and not that much memory (512 MB, then I increased it to 768 MB; in both cases it was eventually all used). I will try with more RAM to see if I can get it to run out of ports before it runs out of RAM...
comment:19 by , 4 years ago
FWIW, here (64bit, 16 GB RAM, Web+rebased-Dec-6-2020), app_server doesn't crash, but Web+ closes unceremoniously with this in the syslog:
KERN: 948: DEBUGGER: Could not create BWindow's receive port, used for interacting with the app_server!
KERN: _user_debugger(): Failed to install debugger. Message is: `Could not create BWindow's receive port, used for interacting with the app_server!'
KERN: thread_hit_debug_event(): Failed to create debug port: No more ports available
comment:20 by , 3 years ago
Now that youtube is working again in Web+ I can reproduce it by playing some video for a minute. This is with app_server running under libroot_debug.so (without MALLOC_DEBUG option tho):
state: Call (someone wrote beyond small allocation at 0x1aa8513d680; size: 104 bytes; allocated by 9827; value: 0x51a1b1c1c1d1e)

Frame IP Function Name
-----------------------------------------------
00000000 0x1e4acfd370f _kern_debugger + 0x7
Disassembly:
_kern_debugger:
0x000001e4acfd3708: 48c7c0e4000000 mov $0xe4, %rax
0x000001e4acfd370f: 0f05 syscall <--

0x7fa5b39f6cd0 0x1e4ad05da8d panic(char const*, ...) + 0xad
0x7fa5b39f6d30 0x1e4ad05f5e8 heap_free(heap_allocator_s*, void*) + 0x158
0x7fa5b39f6dc0 0x1e4ad05ff36 debug_heap_free(void*) + 0x26
0x7fa5b39f6de0 0x1f167bb6601 Painter::~Painter() + 0x101
0x7fa5b39f6e00 0x1f167bb668c Painter::~Painter() + 0xc
0x7fa5b39f6e20 0x1f167ba101d DrawingEngine::~DrawingEngine() + 0x2d
0x7fa5b39f6e40 0x1f167ba103c DrawingEngine::~DrawingEngine() + 0xc
0x7fa5b39f7010 0x1f167b9faf6 VectorAlphaMask<ShapeAlphaMask>::_RenderSource(IntRect const&) + 0x296
0x7fa5b39f70a0 0x1f167b9e9e8 AlphaMask::_Generate() + 0x48
0x7fa5b39f7120 0x1f167b9ee8f AlphaMask::SetCanvasGeometry(IntPoint, IntRect) [clone .localalias.50] + 0x10f
0x7fa5b39f7180 0x1f167b600f2 ServerWindow::_UpdateDrawState(View*) + 0xc2
0x7fa5b39f72e0 0x1f167b6a9ec ServerWindow::_DispatchViewMessage(int, BPrivate::LinkReceiver&) + 0x273c
0x7fa5b39f73a0 0x1f167b6b1af ServerWindow::_DispatchMessage(int, BPrivate::LinkReceiver&) + 0x34f
0x7fa5b39f7410 0x1f167b6541b ServerWindow::_MessageLooper() + 0x23b
0x7fa5b39f7420 0x1f167b43397 MessageLooper::_message_thread(void*) + 0x7
0x7fa5b39f7440 0x1e4acfd2487 thread_entry + 0x17
00000000 0x7f904dbcc260 commpage_thread_exit + 0
comment:22 by , 3 years ago
I've been testing this fix for the last 25 minutes and app_server doesn't crash! Many thanks! It would be a shame to release beta3 without this fix.
comment:24 by , 3 years ago
Milestone: | Unscheduled → R1/beta3 |
---|---|
Resolution: | → fixed |
Status: | assigned → closed |
comment:25 by , 2 years ago
Blocking: | 16714 added |
---|
Is that reproducible? If yes, that might be just what the doctor ordered in #15728 :-). The backtrace is slightly different, but still heavily AlphaMasks related. Though maybe it's better to ask pulkomandy's permission before assigning the ticket to him...