Opened 9 years ago
Last modified 2 weeks ago
#12698 new bug
Strange checkfs behavior
Reported by: | X512 | Owned by: | axeld
---|---|---|---
Priority: | normal | Milestone: | R1
Component: | File Systems/BFS | Version: | R1/Development
Keywords: | | Cc: |
Blocked By: | | Blocking: |
Platform: | All | |
Description
This is hrev50152.
When running checkfs -c I sometimes get inconsistent results, such as a different count of checked nodes, of blocks that could be freed, and so on. checkfs also sometimes reports that some random files "could not be opened"; the list of such files is different after each call of checkfs -c and is sometimes empty. In theory checkfs -c should not change anything, yet the results of consecutive checkfs -c calls differ. When checkfs returns inconsistent results, consecutive calls are not as fast as usual (normally a second checkfs run finishes in about 3 seconds while the first took about 10 minutes; the node count is about 300000). Running checkfs from a boot USB immediately after boot does not cause this issue.
This is probably caused by an error in disk block caching for reading.
Sample checkfs result:
> checkfs -c /boot
> haiku_unsupported-nightly-hrev50163-x86_hybrid-raw (inode = 6815673), could not be opened
> 259418 nodes checked,
> 0 blocks not allocated, 0 blocks already set, 307218 blocks could be freed
> files 235130
> directories 23178
> attributes 704
> attr. dirs 366
> indices 40
> direct block runs 255611 (40.90 GiB)
> indirect block runs 809 (in 44 array blocks, 15.78 GiB)
> double indirect block runs 0 (in 0 array blocks, 0 bytes)
checkfs result from live USB:
> checkfs -c /Haiku1
> 259416 nodes checked,
> 0 blocks not allocated, 0 blocks already set, 7 blocks could be freed
> files 235128
> directories 23178
> attributes 704
> attr. dirs 366
> indices 40
> direct block runs 255600 (41.38 GiB)
> indirect block runs 798 (in 41 array blocks, 15.30 GiB)
> double indirect block runs 0 (in 0 array blocks, 0 bytes)
Note "blocks could be freed" value.
Attachments (2)
Change History (21)
comment:1 by , 9 years ago
comment:2 by , 9 years ago
Is this on a partition mounted read-only? Since checkfs works without unmounting the partition, it could simply be regular filesystem activity going on while it runs.
Can you reproduce this on a read-only volume as well?
The second and following calls being faster can be explained by everything (or almost everything) being cached in RAM by the first run.
comment:3 by , 9 years ago
The partition was not mounted read-only because it was the boot partition when the issue happened. But I didn't write anything between checks. When I checked the same partition from a live USB there were no errors except 7 blocks that could be freed, and the result was consistent.
The issue seems to be reproducible fairly often.
comment:5 by , 8 years ago
The disk cache cannot be disabled. By "from live USB" do you mean that you've booted Haiku from USB? Without further info, I would assume that the actual cause is memory, or rather the lack of it, i.e. I'd assume that this is a duplicate of #10312.
Can you still reproduce it? And if you can, can you please provide a syslog?
comment:6 by , 5 years ago
Still present in hrev53917.
When I run checkfs -c, a lot of "could not be opened" errors were written and then a KDL appeared. Then I ran checkfs from a different installation on a USB3 disk, and checkfs -c completed without bad files:
checkfs -c /Haiku
897997 nodes checked,
0 blocks not allocated, 0 blocks already set, 130 blocks could be freed
files 820100
directories 73303
attributes 2809
attr. dirs 1745
indices 40
direct block runs 900818 (45.75 GiB)
indirect block runs 3651 (in 224 array blocks, 7.97 GiB)
double indirect block runs 0 (in 0 array blocks, 0 bytes)
comment:8 by , 5 years ago
Component: | System/Kernel → File Systems/BFS |
comment:9 by , 5 years ago
I don't think that it is a BFS driver problem; it looks like a problem with the disk cache, which is implemented in the kernel itself. Reading from the cache sometimes returns garbage. The crash in bplustree_node::CheckIntegrity is a separate problem similar to #15607: it does not handle invalid data correctly.
comment:10 by , 5 years ago
Component: | File Systems/BFS → System/Kernel |
Owner: | changed from | to
Moving component back.
comment:11 by , 5 years ago
I had a case of FS corruption while testing nvme_disk where, at first, reading from a file worked fine; but after doing a partial checkfs, interrupting it after ~20k or so nodes, and then checksumming the first file again, the checksum was different. More than that, it was different *every successive time* I read from the file.
I rebooted and was able to reproduce this multiple times, but after deleting the files, redownloading them, and running a full checkfs, it did not occur again.
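As a rough sketch of the repeated-read check described above (the file path is hypothetical; it assumes sha256sum from coreutils is available): with the file untouched, every iteration should print the same digest, so differing digests point to unstable reads rather than plain on-disk corruption.

```sh
# Read the same, unmodified file several times and compare its checksum.
for i in 1 2 3 4 5; do
    sha256sum /path/to/suspect-file
done
```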
comment:12 by , 5 years ago
In my case this bug has a higher probability of triggering in the 32-bit version of Haiku than in the 64-bit version.
comment:14 by , 5 years ago
comment:16 by , 3 years ago
~/Desktop> checkfs -c /Haiku
1483898 nodes checked,
0 blocks not allocated, 0 blocks already set, 486479 blocks could be freed
files 1370075
directories 107883
attributes 3620
attr. dirs 2293
indices 27
direct block runs 1433742 (57.18 GiB)
indirect block runs 5594 (in 197 array blocks, 11.24 GiB)
double indirect block runs 0 (in 0 array blocks, 0 bytes)

~/Desktop> checkfs -c /Haiku
pack (inode = 25831660), could not be opened
desktop-applets.html (inode = 9264613), could not be opened
1483602 nodes checked,
0 blocks not allocated, 0 blocks already set, 487349 blocks could be freed
files 1369781
directories 107881
attributes 3620
attr. dirs 2293
indices 27
direct block runs 1433440 (57.18 GiB)
indirect block runs 5594 (in 197 array blocks, 11.24 GiB)
double indirect block runs 0 (in 0 array blocks, 0 bytes)
Nothing was changed between checkfs runs. Some kernel data structures were corrupted after the checkfs run, so some random nodes "could not be opened".
comment:17 by , 3 years ago
3rd run: OK
~/Desktop> checkfs -c /Haiku
1483898 nodes checked,
0 blocks not allocated, 0 blocks already set, 486479 blocks could be freed
files 1370075
directories 107883
attributes 3620
attr. dirs 2293
indices 27
direct block runs 1433742 (57.18 GiB)
indirect block runs 5594 (in 197 array blocks, 11.24 GiB)
double indirect block runs 0 (in 0 array blocks, 0 bytes)
comment:18 by , 3 years ago
Some suspicious output in syslog:
KERN: low resource memory: note -> normal
KERN: low resource memory: normal -> note
KERN: slab memory manager: created area 0xffffff816d001000 (14997)
KERN: slab memory manager: created area 0xffffff816d801000 (14998)
KERN: slab memory manager: created area 0xffffff816e001000 (14999)
KERN: slab memory manager: created area 0xffffff816e801000 (15000)
KERN: slab memory manager: created area 0xffffff816f001000 (15001)
KERN: vnode 6:25831949 is not becoming unbusy!
KERN: bfs:
KERN: Could not open inode at 25831949: Device/File/Resource busy
KERN: slab memory manager: created area 0xffffff816f801000 (15002)
KERN: slab memory manager: created area 0xffffff8170001000 (15003)
KERN: slab memory manager: created area 0xffffff8170801000 (15004)
KERN: slab memory manager: created area 0xffffff8171001000 (15005)
KERN: slab memory manager: created area 0xffffff8171801000 (15006)
KERN: slab memory manager: created area 0xffffff8172001000 (15007)
KERN: slab memory manager: created area 0xffffff8172801000 (15008)
KERN: slab memory manager: created area 0xffffff8173001000 (15009)
KERN: slab memory manager: created area 0xffffff8173801000 (15010)
KERN: slab memory manager: created area 0xffffff8174001000 (15011)
KERN: slab memory manager: created area 0xffffff8174801000 (15012)
KERN: slab memory manager: created area 0xffffff8175001000 (15013)
KERN: vnode 6:9792735 is not becoming unbusy!
KERN: bfs: Could not open inode at 9792735: Device/File/Resource busy
KERN: low resource memory: note -> normal
KERN: acquire_advisory_lock(vnode = 0xffffff808d820c00, flock = 0xffffffff99967e90, wait = no)
KERN: Last message repeated 21 times.
KERN: Last message repeated 14 times.
comment:19 by , 2 weeks ago
Component: | System/Kernel → File Systems/BFS |
Keywords: | checkfs cache removed |
Owner: | changed from | to
Running checkfs without "-c" can be dangerous because it can mark used blocks as free and allocate data in already-used blocks, causing data corruption.