Opened 12 years ago
Closed 10 years ago
#9528 closed bug (fixed)
stop_watching() KDLs on page fault, monitoring a CDDA
Reported by: | ttcoder | Owned by: | bga |
---|---|---|---|
Priority: | normal | Milestone: | R1 |
Component: | File Systems/cdda | Version: | R1/Development |
Keywords: | Cc: | ||
Blocked By: | Blocking: | ||
Platform: | All |
Description
This is hrev45325
Occured twice in a row; first when closing a Tracker window showing the audio CD's contents while TP was running, then after rebooting, occured with TP alone, no Tracker interaction (stack trace looked similar).
Searched tickets for "page fault stop_watching" but came up empty, does not seem to be a duplicate.
Attachments (4)
Change History (18)
by , 12 years ago
Attachment: | page_panic_with_CDDA.jpeg added |
---|
comment:1 by , 12 years ago
comment:2 by , 12 years ago
Summary: | stop_watching() KDLs on page panic, watching a CDDA → stop_watching() KDLs on page panic, monitoring a CDDA |
---|
comment:3 by , 12 years ago
Summary: | stop_watching() KDLs on page panic, monitoring a CDDA → stop_watching() KDLs on page fault, monitoring a CDDA |
---|
comment:4 by , 12 years ago
Component: | File Systems/BFS → System/Kernel |
---|
by , 11 years ago
Attachment: | on_opening_with_tracker-2014-04-15 15.58.57.jpg added |
---|
hrev47102, start_watching(), corrupt edx value
by , 11 years ago
Attachment: | on_clicking_start-2014-04-15 16.30.39.jpg added |
---|
hrev47102, read_cd(), corrupt cd_driver_info* pointer
comment:5 by , 11 years ago
The above comes from R. who has a knack for triggering this KDL with just a snap of fingers :-)
Gotta crosspost this to #10259 since I've realized the following:
- the four-bytes faulty addresses in the above KDLs correspond to ascii strings
lect
andric
, respectively. Could be legit on its face... Except that..: - reading the track list of the concerned CD that R. shared with me, there's a match in both cases for track number 15, whose title is
Are Friends Electric
I'm going to (try to make time to) scan the cdda source code with that in mind, and make custom builds of it with lots of features disabled (especially features related to track listing and titles), until we find the one memory-corrupting feature. I'm a out of my pool depth but could get lucky..
comment:6 by , 11 years ago
I had a look at cdda.cpp, I think the function to_utf8() has a loop problem possibly unrelated to the failure encountered by the OP:
We check that "out == sizeof(buffer) - 1" is false, but later for "c" between 0x80 and 0x800, we increment "out" twice, which might miss the check on the next iteration.
So line 60 should be:
if (out >= sizeof(buffer) - 2)
or we could add a second check line 66 leading to a loop break.
comment:7 by , 11 years ago
It seems I've hit a lucky strike on the first attempt: I've made a first build with disabled read_cdtext()
and DetermineName()
(which in turn means to_utf8
is not used either) and it didn't crash once yet. Fingers crossed.
Next I'll narrow down the problem and update the ticket. That will probably take a few weeks due to a heavy workload of late.
comment:8 by , 10 years ago
Turns out it's not that easy to find the root cause...
If I hardcode the file descriptor attrFD to -1 (so as to disable attribute reading) in Volume::DetermineName()
here...
http://cgit.haiku-os.org/haiku/tree/src/add-ons/kernel/file_systems/cdda/kernel_interface.cpp#n821
I have a rock-solid cdda driver (to note, the other change is that I've commented out all the cd-text stuff too; but contrarily to initial impressions I think the cd-text code is not activated at all, apparently it only serves for rare occurences of audio CDs that have textual strings embedded inside the audio, something which I have never seen and do not have in any of my audio CDs here I don't think).
If however I do not comment out the attrFD and open_attributes()
stuff, then Haiku crashes easily... somewhere in code unrelated to the cdda driver (see grab). So my working theories are:
- there is some kind of corruption created somewhere in the (big) remainder of the code that I didn't not comment out, e.g. class Attribute, class Volume ..etc.
- there is some kind of corruption from the cd-text code (again, that's unlikely by virtue of the crash occuring again now DetermineName() is back and cd-text is STILL commented out, and additionally I would say it's even impossible impossible by virtue of the cd-text code NOT being exercised even if I do not comment it out anyway).
- there is some kind of interaction/bug with something else than the cdda driver, be it the VM subsystem or whatnot (an unlikely theory it seems).
Neither of which leads to a speedy resolution.. Anyway I guess I should be happy since I can at least use the driver somewhat reliably now, if I comment out the DetermineName() code :-]
by , 10 years ago
Attachment: | DSCF6766.JPG added |
---|
VMCache::InsertPage() fault (cdda config: WithOut cdtext, With open_attributes())
comment:9 by , 10 years ago
Significant milestone reached -- I now know for sure who the "ascii" buffer belongs to, it is actually the Volume::fName
const char*.
Details:
I've modified the Volume
dtor to fill out its fName buffer before free
ing it, here's the diff:
Volume::~Volume() { ... + memset(fName, 'z', strlen(fName)); free(fName); }
(the letter z is 0x7a in hex).
And sure enough, the KDL page fault I get from now on is on indirection 0x7a7a7a7a
!
I've tried to invoke syslog
in KDL, see if something comes up, since my modified driver has some tracing like dprintf("%p\n", fName) ..etc in places but nothing was immediately apparent.
I was hoping to see a clear "use after free" symptom, like somebody calling the deleted Volume, making it write to its (long-since free()d) fName buffer, for example... I have a feeling it's something like that because whenever I mount an audio CD it gets a sequential number, like Audio CD1, Audio CD2 ..etc (or Jogeir Liljedahl1, Jogeir Liljedahl2 ..etc), as if somebody is still keeping a handle on the mounted CD even after I eject it and cdda_unmount() is called, which in turn calls 'delete volume'.. but didn't find anything.
I'm now going to comment out the free() (i.e. leak the memory), see if it "fixes" the driver. Failing that I'll try to outright comment out the 'delete volume', though that would represent a bigger memory leak..
EDIT: what's weird is that I've also tried adding the following:
free(fName); +fName = (char*)0xdead3003; }
yet that changes absolutely nothing, still get a KDL page fault on 0x7a7a7a7a .. I suppose it means either
- the fName pointer is copied.. But I see that mistake nowhere in the cdda code (and I didn't find evidence either of class Volume being copied via its bitwise copy constructor, which is also a common coding mistake)..
- cdda is actually an innocent bystander, and somebody else is messing with its legitimately owned fName memory buffer?
- something else?
comment:10 by , 10 years ago
comment:11 by , 10 years ago
Good catch Jerome, one version of _RestoreAttributes() invokes close() whereas the other.. does not. Yet the calling code in Mount() did not compensate for that change. I guess it probably cannot result in a KDL page fault though.. But I'll include the missing close(fd)
call in whatever patch I submit later (if I'm lucky to get to that stage :-) as leaking file descriptors is probably serious (dunno if there's resource tracking going on when the cdda addon is unloaded)..
EDIT: oh, maybe that file descriptor leak explains why an unmounted audio CD is still referred to after being unmounted ? Not sure how that works.
EDIT2: a late edit, for the record.. The fact that there is still an /AudioCD mountpoint in vfs even after ejecting a CD (with the physical button) is due to this: https://dev.haiku-os.org/ticket/3026#comment:1 which seems pretty harmless.
comment:12 by , 10 years ago
Applied a fix for the descriptor leak in hrev47477 (along with fixes from Augustin).
comment:13 by , 10 years ago
Component: | System/Kernel → File Systems/cdda |
---|---|
Owner: | changed from | to
comment:14 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
This one fits perfectly with hrev48946 so I'm going to close this as well. Please reopen if you can still reproduce.
TP stands for TunePrepper BTW.
The sequence of events counts: if I mount the CDDA first, and then run TP (which does a start/stop_watching() like Tracker) second, the KDL does not occur.
I suppose this might be related to this syslog traffic that occurs with the audio CD loaded: