Opened 14 years ago
Closed 10 years ago
#6738 closed bug (fixed)
app_server hangs?
Reported by: | kirilla | Owned by: | stippi |
---|---|---|---|
Priority: | normal | Milestone: | R1 |
Component: | Servers/app_server | Version: | R1/Development |
Keywords: | Cc: | ||
Blocked By: | Blocking: | #6929, #10258 | |
Platform: | All |
Description
From what I can tell, app_server occasionally hangs in Haiku hrev37150.
Mouse clicks stop doing anything meaningful. I can't raise or lower windows, can't shift focus to some other window or view. I -can- move the mouse around, on-screen, but after a while the pointer hangs too.
Voluntarily entering KDL to have a look reveals that on my quad core, three idle threads are running, and there's a "w:offscreen something" thread running, which belongs to app_server. A backtrace hints at Painter, agg, and a chain of 10-20 calls to a recursive bezier function.
A locking issue in Painter, maybe? I think WebPositive is the app which triggers this. Will try to look out for a website that triggers it.
Sadly I have no means to provide serial output or snapshots right now.
Attachments (1)
Change History (17)
comment:1 by , 14 years ago
comment:2 by , 14 years ago
I can reliably reproduce this issue on real hardware and in QEMU by trying to view http://qt.gitorious.org/qt or any other Gitorious project site in Web+. Tested on the hrev39121 gcc4 anyboot nightly.
comment:3 by , 14 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:4 by , 14 years ago
This could perhaps shed some light on the matter: http://www.antigrain.com/research/adaptive_bezier/index.html
My stack traces show the eight arguments (of the four points) as all 0xff...ffe for maybe ten of the last runs of recursion.
comment:6 by , 14 years ago
Replying to kirilla:
Sadly I have no means to provide serial output or snapshots right now.
When you leave KDL the session is written to the syslog after a few seconds (at least if the kernel is still working and the syslog daemon is still running). So the info should be available after reboot (or even in the same session via ssh, if that is still working).
No clue what double value that hex value represents (is it not shown?). I suppose a dump of the curve_div4 object would help, too, if anyone wants to try and understand what is happening exactly.
by , 14 years ago
comment:7 by , 14 years ago
Scratch that. As can be seen in the syslog excerpt, the hex value is 0xffffffe000000000. These are coordinates, in double format.
The webpage http://babbage.cs.qc.edu/IEEE-754/64bit.html suggests that this hexadecimal representation of a floating-point number is -NaN, not a number.
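The NaN reading can be checked locally. Below is a minimal sketch, assuming 64-bit IEEE-754 doubles; the helper names (`DoubleFromBits`, `IsNaNPattern`, `CheckTicketValue`) are made up for illustration and are not part of app_server or AGG. A bit pattern is a NaN when the 11 exponent bits are all ones and the 52 mantissa bits are not all zero.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a 64-bit pattern as a double (memcpy avoids aliasing problems).
double DoubleFromBits(uint64_t bits)
{
	static_assert(sizeof(double) == sizeof(uint64_t), "expects 64-bit doubles");
	double value;
	std::memcpy(&value, &bits, sizeof(value));
	return value;
}

// True if the pattern encodes a NaN: exponent all ones, mantissa non-zero.
constexpr bool IsNaNPattern(uint64_t bits)
{
	return ((bits >> 52) & 0x7ff) == 0x7ff
		&& (bits & 0x000fffffffffffffULL) != 0;
}

// Usage: the value from the syslog excerpt.
void CheckTicketValue()
{
	const uint64_t bits = 0xffffffe000000000ULL;
	assert(IsNaNPattern(bits));
	assert(std::isnan(DoubleFromBits(bits)));
	assert(std::signbit(DoubleFromBits(bits)));	// sign bit set, hence "-NaN"
}
```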
comment:8 by , 14 years ago
What apparently happens:
- The client sends a drawing command (drawing a shape) with invalid parameters (e.g. NaN coordinates).
- The app server doesn't check for invalid values (or misses this case) and calls curve4_div() with invalid parameters. curve4_div::recursive_bezier() always recurses to the last level, causing 2^34 - 1 calls, which should keep the CPU quite busy. I haven't checked, but possibly it also tries to add a few billion points to the object's point array, which would cause serious memory issues. But even if it doesn't, the CPU hogging alone (probably while holding some lock) could already make the app server appear to hang.
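Why NaN coordinates lead to the worst case can be shown with a small stand-in. This is a sketch, not the AGG code: `CountSubdivisionCalls`, its one-dimensional flatness measure, and `kTolerance` are assumptions made for illustration. The point is only that every comparison involving NaN is false, so an early exit of the form "deviation <= tolerance" never fires and the recursion stops solely at the level check.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical flatness tolerance for this sketch.
const double kTolerance = 0.25;

// Counts the calls made by a simplified 1-D cubic subdivision,
// including the initial call.
uint64_t CountSubdivisionCalls(double x1, double x2, double x3, double x4,
	int level, int limit)
{
	if (level > limit)
		return 1;

	// Stand-in flatness measure: sum of the absolute second differences.
	const double deviation = std::fabs(x1 - 2 * x2 + x3)
		+ std::fabs(x2 - 2 * x3 + x4);
	if (deviation <= kTolerance)	// always false when deviation is NaN
		return 1;

	// De Casteljau split at the midpoint.
	const double x12 = (x1 + x2) / 2;
	const double x23 = (x2 + x3) / 2;
	const double x34 = (x3 + x4) / 2;
	const double x123 = (x12 + x23) / 2;
	const double x234 = (x23 + x34) / 2;
	const double x1234 = (x123 + x234) / 2;

	return 1 + CountSubdivisionCalls(x1, x12, x123, x1234, level + 1, limit)
		+ CountSubdivisionCalls(x1234, x234, x34, x4, level + 1, limit);
}

// Usage: all-NaN input with a small limit, to keep the demo cheap.
uint64_t CallsForNaNInput(int limit)
{
	assert(limit >= 0 && limit <= 20);
	const double nan = std::numeric_limits<double>::quiet_NaN();
	return CountSubdivisionCalls(nan, nan, nan, nan, 0, limit);
}
```

With a limit of 4 the NaN case makes 2^6 - 1 = 63 calls; by the same pattern a limit of 32 gives 2^34 - 1, while a straight line such as (0, 10, 20, 30) returns after a single call.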
So the measures to be taken are:
- Fix parameter checking in the app server.
- Possibly add sanity limits to curve4_div::recursive_bezier().
- Fix the client side (assuming that it is indeed the source of the bad values).
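The first measure could look roughly like the following. This is a sketch under assumptions, not the actual app_server change: `Point` is a stand-in for whatever point type the drawing code uses, and `IsValidCurve` is a hypothetical helper. It rejects a curve if any coordinate is NaN or infinite, before the points are handed to the subdivision code.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Stand-in point type for this sketch.
struct Point {
	double x;
	double y;
};

// True only if all four control points have finite coordinates.
bool IsValidCurve(const Point (&points)[4])
{
	for (const Point& point : points) {
		if (!std::isfinite(point.x) || !std::isfinite(point.y))
			return false;
	}
	return true;
}

// Usage: a caller would skip (or report) the command when this fails.
void ExampleCheck()
{
	const Point good[4] = {{0, 0}, {10, 20}, {30, 20}, {40, 0}};
	Point bad[4] = {{0, 0}, {10, 20}, {30, 20}, {40, 0}};
	bad[2].y = std::numeric_limits<double>::quiet_NaN();

	assert(IsValidCurve(good));
	assert(!IsValidCurve(bad));
}
```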
comment:9 by , 14 years ago
What if it's a math issue?
I had a quick look at our math_test, but couldn't get it to build.
Running http://www.netlib.org/fp/ucbtest.tgz (port attempt: http://www.kirilla.com/tmp/ucb_haiku.diff) seems to suggest there are issues with Haiku (or my hardware?).
Maybe this could be of interest: http://netbsd-soc.sourceforge.net/projects/mathlib/
comment:12 by , 14 years ago
I have tried to reproduce the issue with some debugging added to the BShape iteration function in Painter, but I can reproduce the problem in QEMU neither with the Firefox add-on page from #6929 nor with the Gitorious page for Qt. I will check it out on real hardware next. The revision I am running is hrev40374, built as a GCC4-based hybrid.
comment:15 by , 10 years ago
Blocking: | 10258 added |
---|
It happened again in hrev39121. WebPositive triggered it.
KDL showed a thread running somewhere in this code: http://haiku.it.su.se:8180/source/xref/src/libs/agg/src/agg_curves.cpp#388
recursing, with the level argument increasing in steps of 1, from 0 to 32.
There's a curve_recursion_limit = 32.
Perhaps the
if (level > curve_recursion_limit)
was meant to be
if (level >= curve_recursion_limit)
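For what it's worth, a rough count of what the two comparisons would mean in the worst case. This is a sketch, assuming a binary recursion that never exits early; `WorstCaseCalls` is a made-up helper, not AGG code. With `>` the levels 0 to limit + 1 are entered, with `>=` only the levels 0 to limit.

```cpp
#include <cstdint>

// Worst-case number of calls for a binary recursion that stops only at the
// level check. inclusive == false models "level > limit",
// inclusive == true models "level >= limit".
constexpr uint64_t WorstCaseCalls(int limit, bool inclusive)
{
	// Levels 0..lastLevel are entered, giving 2^(lastLevel + 1) - 1 calls.
	return (uint64_t(1) << ((inclusive ? limit : limit + 1) + 1)) - 1;
}

static_assert(WorstCaseCalls(32, false) == 17179869183ULL, "2^34 - 1");
static_assert(WorstCaseCalls(32, true) == 8589934591ULL, "2^33 - 1");
```

So under these assumptions the `>=` variant would halve the work, but still leave billions of calls for input that never passes the flatness test.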