Opened 6 months ago

Last modified 8 weeks ago

#14964 new bug

kdl: last transaction still open

Reported by: ttcoder
Owned by: axeld
Priority: normal
Milestone: Unscheduled
Component: File Systems/BFS
Version: R1/Development
Keywords:
Cc: dsuden
Blocked By:
Blocking:
Has a Patch: no
Platform: All

Description

Occurs for dsuden; see the attachment below for the backtrace. Not sure whether to file this under "BFS" or something else.

Couldn't find a way to reproduce this on any of my computers here.

Attachments (1)

kdl_last_trans_still_open.jpg (153.3 KB) - added by ttcoder 6 months ago.


Change History (11)

comment:1 by waddlesplash, 6 months ago

Please get the following:

  1. A syslog from this machine just after boot.
  2. The output of the listdev and listimage commands (once whatever operation usually triggers this has been started).
  3. A picture of the output of syslog | tail 15 run at the KDL prompt.
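
A minimal sketch of how the first two items might be gathered from a Terminal, assuming the syslog lives at /var/log/syslog and using a hypothetical output directory name (ticket-diagnostics); only listdev and listimage come from the request above:

```shell
#!/bin/sh
# Hypothetical helper: collect the requested diagnostics into one directory.
# The directory name is an assumption made for this example.
out="${1:-./ticket-diagnostics}"
mkdir -p "$out"

# Copy the syslog if it is readable. The path is an assumption; adjust it
# if the log is stored elsewhere on the machine.
if [ -r /var/log/syslog ]; then
    cp /var/log/syslog "$out/syslog.txt"
fi

# Save the output of the two commands named in the request.
# If a command is missing, record that fact instead of failing.
for cmd in listdev listimage; do
    if command -v "$cmd" >/dev/null 2>&1; then
        "$cmd" > "$out/$cmd.txt" 2>&1
    else
        echo "$cmd not found on this system" > "$out/$cmd.txt"
    fi
done

echo "Diagnostics written to $out"
```

The third item cannot be scripted this way, since syslog | tail 15 has to be typed at the KDL prompt and photographed.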

comment:2 by ttcoder, 6 months ago

Cc: dsuden added

Forwarding/Cc'ing this to Dane, thanks

EDIT: forgot to mention -- got word a few days ago that waddlesplash and Dane are working on this off-ticket, trying to find a way to diagnose/fix this

Last edited 6 months ago by ttcoder

comment:3 by waddlesplash, 2 months ago

Ping -- any update on getting those 3 things?

comment:4 by ttcoder, 2 months ago

@dane I remember being Cc'ed on your discussion about sending a reproducible case by snail mail. Is your offer to send a USB stick with CC6 to waddlesplash still valid? It might be a way to spare you the chore of gathering the nitty-gritty under-the-hood stuff :-)

Last edited 2 months ago by ttcoder

comment:5 by waddlesplash, 2 months ago

Except, ttcoder, you were unable to reproduce this, yes? IIRC quite literally nobody but Dane has seen this one. So I don't know why sending me a drive would help here...

comment:6 by ttcoder, 2 months ago

My entire "test lab" here consists of ten-year-old computers.

All our problems (this KDL and others) started with hardware from recent years. Not sure what Dane is (or was) using most recently for assembling station rigs, possibly still Asus/AMD combos as before.

comment:7 by waddlesplash, 2 months ago

So, probonopd ran into this booting off USB (the second person ever to do so?) and ran the syslog+tail command and discovered that there were a ton of failing writes on the part of usb_disk. So I highly suspect the same is true here; that BFS is failing to write back the journal and this is its way of notifying you :)

Once again, the syslog information will confirm this. But if it's the case it will be much easier to find a solution. So this is not a problem in our block cache after all.

comment:8 by ttcoder, 2 months ago

Well, it would be ironic if this ticket were fixed in a roundabout way, thanks to data provided by another user! (Just to be clear and make sure I understand: we are talking about a bug that is not specific to USB? Dane's KDLs occur with AHCI, maybe set to IDE legacy mode, with no USB mass storage involved, I think.) I'm trying to get ahold of Dane to test some other stuff, and will discuss this with him too once in touch.

comment:9 by waddlesplash, 2 months ago

Presumably it would occur whenever the journal hits an error writing back to disk.

Uh ... why are you using "IDE legacy mode"? That is not that reliable in BIOSes these days, and Haiku's AHCI support is more than stable.

comment:10 by ttcoder, 2 months ago

I've just reverted my own test 'puter to AHCI. It uses 52000 from last August, so the old boot problems should be gone; I'm seeing no regression so far. (Ten years ago we had problems where Haiku would get "stuck" at boot in 10 or 20% of boot-ups, and that problem went away in IDE emulation mode.)

In touch with Dane, will tell him to do the same -- I can see how this could make a big difference, if it ever turns out that our BFS corruption problems are even remotely related to buggy IDE emulation. That, and get a cheap PS/2 keyboard on eBay to enter the tail/syslog command in KDL.

-EDIT-: done, will keep the ticket posted if I hear back from Dane.

-EDIT2-: waddlesplash and Dane did some tweaking of the AMD setup, of IDE/AHCI, of the Haiku hrev, and of HDA, and CC6 has been running for 3 days without a KDL, the first time it has ever lasted more than 24 hours! Knocking on wood..

Last edited 8 weeks ago by ttcoder