Ticket #3264 (closed bug: fixed)

Opened 15 months ago

Last modified 2 weeks ago

KDL when checkfs hits 31800 nodes processed

Reported by: mmadia Owned by: axeld
Priority: normal Milestone: R1
Component: File Systems/BFS Version: R1/Development
Keywords: Cc:
Blocked By: Platform: All
Blocking:

Description

Haiku will reliably KDL while running checkfs -c on a partition with many files (haiku svn, mozilla cvs, and numerous library sources). Checkfs reaches 31800 nodes process before the system crashes.

Tested on two fairly recent revisions, running natively. System has 2gb ram and process controller reports very little memory usage.

This might be similar or a dupe of ticket #2718

Might also be related to changeset 23318

Attachments

DSCN0042.jpg Download (88.6 KB) - added by mmadia 15 months ago.
camera shot of kdl, larger version available
checkfs_syslog.txt Download (12.4 KB) - added by aldeck 4 weeks ago.
end of boot + checkfs

Change History

Changed 15 months ago by mmadia

camera shot of kdl, larger version available

Changed 15 months ago by axeld

I have a hundred thousand files more on my disk, but I haven't seen this one yet (and those seem to easily fit into the cache, seeing how fast that goes).

Is this reproducible right after start?

Changed 15 months ago by mmadia

--Is this reproducible right after start?--

yup. At any point that I want this KDL to occur, I can make it happen.

Just now, I tried ls -alR /volume_name | wc -l to get a rough estimate of the total number of files on the volume. After running for a bit, a similar KDL occured. The main PANIC was the same, however bt reported slightly different output.

So I broke it down into seperate commands for each top-level directory:


~> ls -alR /stuff-a/torrentz/ | wc -l
23
~> ls -alR /stuff-a/new-stuff/ | wc -l
1968
~> ls -alR /stuff-a/home | wc -l
16
~> ls -alR /stuff-a/fromStorage/ | wc -l
72072
~> ls -alR /stuff-a/binaries/ | wc -l
10
~> ls -alR /stuff-a/backup/ | wc -l
27479
~> ls -alR /stuff-a/_code/ | wc -l
420311
~> ls -al /stuff-a/
total 24


Here's some more information about that partition: The partition was initialized in Haiku with 2k sectors ~> df /stuff-a/

Device No.: 4 Mounted at: /stuff-a

Volume Name: "stuff-a" File System: bfs

Device: /dev/disk/ata/0/master/2

Flags: QAM-P-W

I/O Size: 64.0K (65536 byte)

Block Size: 2.0K (2048 byte)

Total Blocks: 32.8G (17196709 blocks)

Free Blocks: 21.8G (11413292 blocks) Total Nodes: 0

Free Nodes: 0 Root Inode: 524288

Changed 14 months ago by mmadia

I've re-created this error.

For a while I had concerns that these errors were due to physical defects in my drives. Running SeaTools v1.09, reports no error in either 'short' or 'long' tests. So for now, I'm convinced the error is software based.

Using a freshly initialized ~430gb bfs partition as the destination. Using numerous 100~160gb drives, each with 3~4 bfs partitions from my R5 system as the source.

Prior to copyattr -d -r -v /source/* dest/, I ran checkfs on each source numerous times, manually deleting files that were reported, then re-running checkfs to ensure the a clean FS.

During the transfer of the 3rd volume, copyattr KDL'd After rebooting, I removed the entire directory that was being copied at the time of the crash. I then ran checkfs /dest and at which point this tickets KDL said hello.

What information can I gather about this partition for you?

Changed 4 weeks ago by mmadia

  • status changed from new to closed
  • resolution set to fixed

Since then I've re-initialized the partition.

Changed 4 weeks ago by aldeck

  • status changed from closed to reopened
  • version changed from R1/pre-alpha1 to R1/Development
  • resolution fixed deleted

Already happened to me reliably in the past, and i just hit it again today running checkfs after seeing some suspicious output on syslog right after booting. See following syslog which also includes the bfs errors of the end of boot. Crashed at 166850 nodes, partition initialized a week ago, r35558.

Changed 4 weeks ago by aldeck

end of boot + checkfs

Changed 2 weeks ago by axeld

  • status changed from reopened to closed
  • resolution set to fixed
  • component changed from System/Kernel to File Systems/BFS

r35743 should fix the KDL. I have no idea about why this happens, though; even if the memory has been paged out, it should have been resolvable. And checkfs doesn't seem to have crashed either.

Note: See TracTickets for help on using tickets.