Opened 8 months ago

Last modified 7 weeks ago

#18885 new bug

panic: bounce buffer already in use!

Reported by: davidkaroly Owned by: nobody
Priority: normal Milestone: Unscheduled
Component: Drivers/Network Version: R1/Development
Keywords: Cc:
Blocked By: Blocking: #19130
Platform: All

Description (last modified by davidkaroly)

Got this panic while building llvm18. It seems to happen after heavy usage of the file system, with lots of small files.

Steps to reproduce look something like this:

  1. build llvm18 with haikuporter
  2. delete the work directory: rm -rf work-18.1.2
  3. start the build again.
  4. we get a panic

Screenshot: TBD; unfortunately I didn't capture one. It will take some time to trigger this again.

I saw this on recent hrevs e.g. hrev57667 in Hyper-V. Seems to happen both on 32-bit and 64-bit.

Attachments (2)

panic.png (126.1 KB ) - added by davidkaroly 7 months ago.
screenshot
screenshot2.png (92.0 KB ) - added by davidkaroly 7 months ago.


Change History (14)

comment:1 by waddlesplash, 8 months ago

Component: System/Kernel → Drivers/Network
Priority: high → normal

Seems odd this would happen on heavy filesystem usage, because the message comes from the DMA code in the FreeBSD network compatibility layer.

comment:2 by davidkaroly, 8 months ago

Description: modified (diff)

Ehh, sorry, I stand corrected: it happened in Hyper-V, not VMware.

Anyway, probably I also deleted the download folder so the panic could have happened during re-downloading the source tarball.

idk, does this make more sense this way? E.g. the file system uses up some kind of buffers, and therefore at a later point the network stack runs out of resources?


comment:3 by waddlesplash, 8 months ago

The panic occurs when a driver tries to load a network buffer into a bounce buffer that already has a different network buffer in it. Maybe something about memory pressure could bring that on, but if it does, there's still a bug elsewhere in that drivers shouldn't ever try to do that.

comment:4 by davidkaroly, 7 months ago

I re-tested on hrev57708. The issue cannot be reproduced on x86_64.

I was able to reproduce it on x86 though. After deleting the old working dir for llvm build (lots of small files!), I get the panic when trying to download the new tarball.

See attached screenshot.

by davidkaroly, 7 months ago

Attachment: panic.png added

screenshot

comment:5 by davidkaroly, 7 months ago

is it possible that the filesystem takes up the buffers and doesn't release them so then in the next step the network driver runs out of buffers? (i'm just guessing, really not familiar with that part of the kernel)

comment:6 by waddlesplash, 7 months ago

No, that's not what the message means. In the FreeBSD bus-dma APIs, bounce buffers have other buffers "loaded into" them; then you send the bounce buffer to the hardware, and when the IO is done you "unload" the buffer you loaded in (and then you can reuse the bounce buffer.) This panic triggers when you try to load some buffer into a bounce buffer that is currently in use and hasn't yet had its buffer "unloaded".

I'm not sure FreeBSD has any equivalent sanity check here, so I think it's possible this is a driver bug that gets caught on Haiku but is silently missed on FreeBSD.

Looking at the logic in tulip_txput, though, I'm not sure how this happens. It first checks if there are any free descriptors, and if there aren't, it calls tulip_tx_intr, which in turn calls tulip_dequeue_mbuf (sometimes indirectly through other functions), which in turn calls bus_dmamap_unload (which is what "unloads" network buffers from bounce buffers.) If we don't wind up with any free descriptors, though, txput just bails without invoking load_mbuf_sg.

There isn't any way for bus_dmamap_unload to fail, so that can't be the problem here. Maybe somewhere in this convoluted logic there's a way for txput to return > 1 without actually having freed the buffer, but if this happens under high-load/high-memory-usage conditions, I don't know what that would be.

I do notice there are some "ifdef i386" in the driver which wouldn't be the case on x86_64. You might try disabling some of those and seeing if that changes the behavior on x86.

comment:7 by davidkaroly, 7 months ago

The panic happened again, this time on x86_64, hrev57728 - so at least we know it's not specific to the 32-bit build. This is still on Hyper-V.

This time the sequence was the following:

  • clean up haikuports build folder
  • fetch master from upstream

This is basically the same pattern as before: heavy filesystem access with lots of small files, followed by network access.

by davidkaroly, 7 months ago

Attachment: screenshot2.png added

comment:8 by waddlesplash, 7 months ago

Is there any chance this is memory corruption, somehow?

comment:9 by pulkomandy, 7 months ago

I had a somewhat similar crash yesterday while compiling webkit (so, similar pattern, lots of disk/cpu use and not much network activity). In my case it's the ipro1000 driver.

comment:10 by waddlesplash, 5 months ago

Adjusted the panic message in hrev57791; please retest. The value should only ever be 2 or 3; if it's anything else, that will indicate memory corruption. (I also updated the FreeBSD drivers, so it's possible something different will happen anyway.)

comment:11 by waddlesplash, 7 weeks ago

Blocking: 19130 added

comment:12 by waddlesplash, 7 weeks ago

The new panic message gives "type 3" it seems, which is TYPE_MBUF. So it's really an mbuf loaded into the bounce buffer, and not bogus data or something else. This seems like some bug in the driver then.
