Opened 11 years ago

Closed 10 years ago

#3264 closed bug (fixed)

KDL when checkfs hits 31800 nodes processed

Reported by: mmadia Owned by: axeld
Priority: normal Milestone: R1
Component: File Systems/BFS Version: R1/Development
Keywords: Cc:
Blocked By: Blocking:
Has a Patch: no Platform: All

Description

Haiku will reliably KDL while running checkfs -c on a partition with many files (haiku svn, mozilla cvs, and numerous library sources). Checkfs reaches 31800 nodes processed before the system crashes.

Tested on two fairly recent revisions, running natively. The system has 2 GB of RAM, and ProcessController reports very little memory usage.

This might be similar to, or a dupe of, ticket #2718

Might also be related to changeset 23318

Attachments (2)

DSCN0042.jpg (88.6 KB ) - added by mmadia 11 years ago.
camera shot of kdl, larger version available
checkfs_syslog.txt (12.4 KB ) - added by aldeck 10 years ago.
end of boot + checkfs


Change History (8)

by mmadia, 11 years ago

Attachment: DSCN0042.jpg added

camera shot of kdl, larger version available

comment:1 by axeld, 11 years ago

I have a hundred thousand files more on my disk, but I haven't seen this one yet (and those seem to easily fit into the cache, seeing how fast that goes).

Is this reproducible right after start?

comment:2 by mmadia, 11 years ago

--Is this reproducible right after start?--

yup. At any point that I want this KDL to occur, I can make it happen.

Just now, I tried ls -alR /volume_name | wc -l to get a rough estimate of the total number of files on the volume. After running for a bit, a similar KDL occurred. The main PANIC was the same; however, bt reported slightly different output.

So I broke it down into separate commands for each top-level directory:


~> ls -alR /stuff-a/torrentz/ | wc -l
23
~> ls -alR /stuff-a/new-stuff/ | wc -l
1968
~> ls -alR /stuff-a/home | wc -l
16
~> ls -alR /stuff-a/fromStorage/ | wc -l
72072
~> ls -alR /stuff-a/binaries/ | wc -l
10
~> ls -alR /stuff-a/backup/ | wc -l
27479
~> ls -alR /stuff-a/_code/ | wc -l
420311
~> ls -al /stuff-a/
total 24


Here's some more information about that partition. The partition was initialized in Haiku with 2k sectors.

~> df /stuff-a/
Device No.: 4
Mounted at: /stuff-a
Volume Name: "stuff-a"
File System: bfs
Device: /dev/disk/ata/0/master/2
Flags: QAM-P-W
I/O Size: 64.0K (65536 byte)
Block Size: 2.0K (2048 byte)
Total Blocks: 32.8G (17196709 blocks)
Free Blocks: 21.8G (11413292 blocks)
Total Nodes: 0
Free Nodes: 0
Root Inode: 524288

comment:3 by mmadia, 11 years ago

I've re-created this error.

For a while I had concerns that these errors were due to physical defects in my drives. SeaTools v1.09 reports no errors in either the 'short' or 'long' tests, so for now I'm convinced the error is software based.

I used a freshly initialized ~430gb bfs partition as the destination, and numerous 100~160gb drives, each with 3~4 bfs partitions from my R5 system, as the source.

Prior to running copyattr -d -r -v /source/* dest/, I ran checkfs on each source numerous times, manually deleting files that were reported, then re-running checkfs to ensure a clean FS.

During the transfer of the 3rd volume, copyattr KDL'd. After rebooting, I removed the entire directory that was being copied at the time of the crash. I then ran checkfs /dest, at which point this ticket's KDL said hello.

What information can I gather about this partition for you?

comment:4 by mmadia, 10 years ago

Resolution: fixed
Status: new → closed

Since then I've re-initialized the partition.

comment:5 by aldeck, 10 years ago

Resolution: fixed
Status: closed → reopened
Version: R1/pre-alpha1 → R1/Development

This has already happened to me reliably in the past, and I just hit it again today when running checkfs after seeing some suspicious output in the syslog right after booting. See the following syslog, which also includes the bfs errors from the end of boot. It crashed at 166850 nodes; the partition was initialized a week ago, hrev35558.

by aldeck, 10 years ago

Attachment: checkfs_syslog.txt added

end of boot + checkfs

comment:6 by axeld, 10 years ago

Component: System/Kernel → File Systems/BFS
Resolution: fixed
Status: reopened → closed

hrev35743 should fix the KDL. I have no idea why this happens, though; even if the memory has been paged out, it should have been resolvable. And checkfs doesn't seem to have crashed either.
