Opened 7 years ago

Closed 5 years ago

#9528 closed bug (fixed)

stop_watching() KDLs on page fault, monitoring a CDDA

Reported by: ttcoder Owned by: bga
Priority: normal Milestone: R1
Component: File Systems/cdda Version: R1/Development
Keywords: Cc:
Blocked By: Blocking:
Has a Patch: no Platform: All

Description

This is hrev45325

Occured twice in a row; first when closing a Tracker window showing the audio CD's contents while TP was running, then after rebooting, occured with TP alone, no Tracker interaction (stack trace looked similar).

Searched tickets for "page fault stop_watching" but came up empty, does not seem to be a duplicate.

Attachments (4)

page_panic_with_CDDA.jpeg (1.4 MB ) - added by ttcoder 7 years ago.
on_opening_with_tracker-2014-04-15 15.58.57.jpg (617.2 KB ) - added by ttcoder 6 years ago.
hrev47102, start_watching(), corrupt edx value
on_clicking_start-2014-04-15 16.30.39.jpg (592.3 KB ) - added by ttcoder 6 years ago.
hrev47102, read_cd(), corrupt cd_driver_info* pointer
DSCF6766.JPG (453.1 KB ) - added by ttcoder 5 years ago.
VMCache::InsertPage() fault (cdda config: WithOut cdtext, With open_attributes())

Change History (18)

by ttcoder, 7 years ago

Attachment: page_panic_with_CDDA.jpeg added

comment:1 by ttcoder, 7 years ago

TP stands for TunePrepper BTW.

The sequence of events counts: if I mount the CDDA first, and then run TP (which does a start/stop_watching() like Tracker) second, the KDL does not occur.

I suppose this might be related to this syslog traffic that occurs with the audio CD loaded:

KERN: DMAResource@0x8280a600: low/high 0/100000000, max segment count 512, align 2, boundary 65536, max transfer 524288, max segment size 134217728
KERN: slab memory manager: created area 0xdf801000 (6518)
KERN: slab memory manager: created area 0xf0001000 (6519)
KERN: ata 1 error: command failed, error bit is set. status 0x51, error 0x54
KERN: atapi 1-0 error: device indicates transfer error after dma
KERN: check_sense: Illegal request
KERN: ata 1 error: command failed, error bit is set. status 0x51, error 0x54
KERN: atapi 1-0 error: device indicates transfer error after dma
KERN: check_sense: Illegal request

comment:2 by ttcoder, 7 years ago

Summary: stop_watching() KDLs on page panic, watching a CDDAstop_watching() KDLs on page panic, monitoring a CDDA

comment:3 by ttcoder, 7 years ago

Summary: stop_watching() KDLs on page panic, monitoring a CDDAstop_watching() KDLs on page fault, monitoring a CDDA

comment:4 by anevilyak, 7 years ago

Component: File Systems/BFSSystem/Kernel

by ttcoder, 6 years ago

hrev47102, start_watching(), corrupt edx value

by ttcoder, 6 years ago

hrev47102, read_cd(), corrupt cd_driver_info* pointer

comment:5 by ttcoder, 6 years ago

The above comes from R. who has a knack for triggering this KDL with just a snap of fingers :-)

Gotta crosspost this to #10259 since I've realized the following:

  • the four-bytes faulty addresses in the above KDLs correspond to ascii strings lect and ric , respectively. Could be legit on its face... Except that..:
  • reading the track list of the concerned CD that R. shared with me, there's a match in both cases for track number 15, whose title is Are Friends Electric

I'm going to (try to make time to) scan the cdda source code with that in mind, and make custom builds of it with lots of features disabled (especially features related to track listing and titles), until we find the one memory-corrupting feature. I'm a out of my pool depth but could get lucky..

comment:6 by korli, 6 years ago

I had a look at cdda.cpp, I think the function to_utf8() has a loop problem possibly unrelated to the failure encountered by the OP:

We check that "out == sizeof(buffer) - 1" is false, but later for "c" between 0x80 and 0x800, we increment "out" twice, which might miss the check on the next iteration.

So line 60 should be:

    if (out >= sizeof(buffer) - 2)

or we could add a second check line 66 leading to a loop break.

comment:7 by ttcoder, 5 years ago

It seems I've hit a lucky strike on the first attempt: I've made a first build with disabled read_cdtext() and DetermineName() (which in turn means to_utf8 is not used either) and it didn't crash once yet. Fingers crossed.

Next I'll narrow down the problem and update the ticket. That will probably take a few weeks due to a heavy workload of late.

comment:8 by ttcoder, 5 years ago

Turns out it's not that easy to find the root cause...

If I hardcode the file descriptor attrFD to -1 (so as to disable attribute reading) in Volume::DetermineName() here... http://cgit.haiku-os.org/haiku/tree/src/add-ons/kernel/file_systems/cdda/kernel_interface.cpp#n821

I have a rock-solid cdda driver (to note, the other change is that I've commented out all the cd-text stuff too; but contrarily to initial impressions I think the cd-text code is not activated at all, apparently it only serves for rare occurences of audio CDs that have textual strings embedded inside the audio, something which I have never seen and do not have in any of my audio CDs here I don't think).

If however I do not comment out the attrFD and open_attributes() stuff, then Haiku crashes easily... somewhere in code unrelated to the cdda driver (see grab). So my working theories are:

  1. there is some kind of corruption created somewhere in the (big) remainder of the code that I didn't not comment out, e.g. class Attribute, class Volume ..etc.
  2. there is some kind of corruption from the cd-text code (again, that's unlikely by virtue of the crash occuring again now DetermineName() is back and cd-text is STILL commented out, and additionally I would say it's even impossible impossible by virtue of the cd-text code NOT being exercised even if I do not comment it out anyway).
  3. there is some kind of interaction/bug with something else than the cdda driver, be it the VM subsystem or whatnot (an unlikely theory it seems).

Neither of which leads to a speedy resolution.. Anyway I guess I should be happy since I can at least use the driver somewhat reliably now, if I comment out the DetermineName() code :-]

Last edited 5 years ago by ttcoder (previous) (diff)

by ttcoder, 5 years ago

Attachment: DSCF6766.JPG added

VMCache::InsertPage() fault (cdda config: WithOut cdtext, With open_attributes())

comment:9 by ttcoder, 5 years ago

Significant milestone reached -- I now know for sure who the "ascii" buffer belongs to, it is actually the Volume::fName const char*.

Details:

I've modified the Volume dtor to fill out its fName buffer before freeing it, here's the diff:

Volume::~Volume()
{
  ...

+ memset(fName, 'z', strlen(fName));
  free(fName);
}

(the letter z is 0x7a in hex).

And sure enough, the KDL page fault I get from now on is on indirection 0x7a7a7a7a !

I've tried to invoke syslog in KDL, see if something comes up, since my modified driver has some tracing like dprintf("%p\n", fName) ..etc in places but nothing was immediately apparent.

I was hoping to see a clear "use after free" symptom, like somebody calling the deleted Volume, making it write to its (long-since free()d) fName buffer, for example... I have a feeling it's something like that because whenever I mount an audio CD it gets a sequential number, like Audio CD1, Audio CD2 ..etc (or Jogeir Liljedahl1, Jogeir Liljedahl2 ..etc), as if somebody is still keeping a handle on the mounted CD even after I eject it and cdda_unmount() is called, which in turn calls 'delete volume'.. but didn't find anything.

I'm now going to comment out the free() (i.e. leak the memory), see if it "fixes" the driver. Failing that I'll try to outright comment out the 'delete volume', though that would represent a bigger memory leak..

EDIT: what's weird is that I've also tried adding the following:

    free(fName);
+fName = (char*)0xdead3003;
}

yet that changes absolutely nothing, still get a KDL page fault on 0x7a7a7a7a .. I suppose it means either

  1. the fName pointer is copied.. But I see that mistake nowhere in the cdda code (and I didn't find evidence either of class Volume being copied via its bitwise copy constructor, which is also a common coding mistake)..
  2. cdda is actually an innocent bystander, and somebody else is messing with its legitimately owned fName memory buffer?
  3. something else?
Last edited 5 years ago by ttcoder (previous) (diff)

comment:10 by korli, 5 years ago

It seems to me that the file descriptor opened in Volume::Mount() here isn't later closed, thus leaked. It should happen there. It was introduced in hrev27109.

comment:11 by ttcoder, 5 years ago

Good catch Jerome, one version of _RestoreAttributes() invokes close() whereas the other.. does not. Yet the calling code in Mount() did not compensate for that change. I guess it probably cannot result in a KDL page fault though.. But I'll include the missing close(fd) call in whatever patch I submit later (if I'm lucky to get to that stage :-) as leaking file descriptors is probably serious (dunno if there's resource tracking going on when the cdda addon is unloaded)..

EDIT: oh, maybe that file descriptor leak explains why an unmounted audio CD is still referred to after being unmounted ? Not sure how that works.

EDIT2: a late edit, for the record.. The fact that there is still an /AudioCD mountpoint in vfs even after ejecting a CD (with the physical button) is due to this: https://dev.haiku-os.org/ticket/3026#comment:1 which seems pretty harmless.

Last edited 5 years ago by ttcoder (previous) (diff)

comment:12 by korli, 5 years ago

Applied a fix for the descriptor leak in hrev47477 (along with fixes from Augustin).

comment:13 by pulkomandy, 5 years ago

Component: System/KernelFile Systems/cdda
Owner: changed from axeld to bga

comment:14 by mmlr, 5 years ago

Resolution: fixed
Status: newclosed

This one fits perfectly with hrev48946 so I'm going to close this as well. Please reopen if you can still reproduce.

Note: See TracTickets for help on using tickets.