Opened 8 years ago

Last modified 3 years ago

#12698 new bug

Strange checkfs behavior

Reported by: X512
Owned by: nobody
Priority: normal
Milestone: R1
Component: System/Kernel
Version: R1/Development
Keywords: checkfs cache
Cc:
Blocked By:
Blocking:
Platform: All

Description

This is hrev50152.

When running checkfs -c I sometimes get inconsistent results, such as a different count of checked nodes, blocks that could be freed, and other values. checkfs also sometimes reports that some random files "could not be opened". The list of files that could not be opened is different after each checkfs -c call and is sometimes empty. In theory checkfs -c should not change anything, but the results of consecutive checkfs -c calls differ. When checkfs returns inconsistent results, consecutive calls are not as fast as usual (normally a second checkfs run completes in about 3 seconds while the first took about 10 minutes; the node count is about 300000). Running checkfs from a boot USB immediately after booting does not cause this issue.

This is probably caused by an error in disk block caching for reads.

Sample checkfs result:

> checkfs -c /boot
haiku_unsupported-nightly-hrev50163-x86_hybrid-raw (inode = 6815673), could not be opened
        259418 nodes checked,
        0 blocks not allocated,
        0 blocks already set,
        307218 blocks could be freed

        files           235130
        directories     23178
        attributes      704
        attr. dirs      366
        indices         40

        direct block runs               255611 (40.90 GiB)
        indirect block runs             809 (in 44 array blocks, 15.78 GiB)
        double indirect block runs      0 (in 0 array blocks, 0 bytes)

checkfs result from live USB:

> checkfs -c /Haiku1
        259416 nodes checked,
        0 blocks not allocated,
        0 blocks already set,
        7 blocks could be freed

        files           235128
        directories     23178
        attributes      704
        attr. dirs      366
        indices         40

        direct block runs               255600 (41.38 GiB)
        indirect block runs             798 (in 41 array blocks, 15.30 GiB)
        double indirect block runs      0 (in 0 array blocks, 0 bytes)

Note "blocks could be freed" value.

Attachments (2)

CIMG4615_2.JPG (1.7 MB) - added by X512 4 years ago.
KDL when running checkfs -c
syslog (465.2 KB) - added by X512 4 years ago.
syslog

Change History (20)

comment:1 by X512, 8 years ago

Running checkfs without "-c" can be dangerous, because it can mark used blocks as free and allocate data in already-used blocks, causing data corruption.

comment:2 by pulkomandy, 8 years ago

Is this on a partition mounted read-only? Since checkfs works without unmounting the partition, it could simply be regular filesystem activity going on while it runs.

Can you reproduce this on a read-only volume as well?

The second and following calls being faster can be explained by everything (or almost everything) being cached in RAM by the first run.

comment:3 by X512, 8 years ago

The partition was not mounted read-only because it was the boot partition when the issue happened. But I didn't write anything between checks. When I checked the same partition from a live USB there were no errors except 7 blocks that could be freed, and the result was consistent.

The issue seems to be reproducible fairly often.

comment:4 by X512, 8 years ago

Is it possible to completely disable the disk cache?

comment:5 by axeld, 7 years ago

The disk cache cannot be disabled. By "from live USB" do you mean that you booted Haiku from USB? Without further info, I would assume that the actual cause is memory, or rather the lack of it, i.e. I'd assume that this is a duplicate of #10312.

Can you still reproduce it? And if you can, can you please provide a syslog?

comment:6 by X512, 4 years ago

Still present in hrev53917.

When I ran checkfs -c, a lot of "could not be opened" errors were written and then a KDL appeared. Then I ran checkfs from a different installation on a USB 3 disk, and checkfs -c completed without bad files:

checkfs -c /Haiku
        897997 nodes checked,
        0 blocks not allocated,
        0 blocks already set,
        130 blocks could be freed

        files           820100
        directories     73303
        attributes      2809
        attr. dirs      1745
        indices         40

        direct block runs               900818 (45.75 GiB)
        indirect block runs             3651 (in 224 array blocks, 7.97 GiB)
        double indirect block runs      0 (in 0 array blocks, 0 bytes)

by X512, 4 years ago

Attachment: CIMG4615_2.JPG added

KDL when running checkfs -c

comment:7 by X512, 4 years ago

The disk where checkfs is called is a SATA HDD, if it matters.

by X512, 4 years ago

Attachment: syslog added

syslog

comment:8 by diver, 4 years ago

Component: System/Kernel → File Systems/BFS

comment:9 by X512, 4 years ago

I don't think that it is a BFS driver problem; it looks like a disk cache problem, and the cache is implemented in the kernel itself. Reading from the cache sometimes returns garbage. The crash in bplustree_node::CheckIntegrity is a separate problem similar to #15607: it does not handle invalid data correctly.

comment:10 by diver, 4 years ago

Component: File Systems/BFS → System/Kernel
Owner: changed from axeld to nobody

Moving component back.

comment:11 by waddlesplash, 4 years ago

I had a case of FS corruption while testing nvme_disk where, at first, reading from a file worked fine; but after running a partial checkfs, interrupting it after ~20k or so nodes, and then checksumming the first file again, the checksum was different. More than that, it was different *every successive time* I read from the file.

I rebooted and was able to reproduce this multiple times, but after deleting the files, redownloading them, and running a full checkfs, it did not occur again.
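The check described above can be sketched as a small shell loop: hash the same file several times in a row. Every read should return identical bytes, so the digest must never change between runs; with the suspected cache bug it did. This is only an illustrative probe, not part of the ticket; FILE, the sample size, and the loop count are all placeholder choices.

```shell
# Read-stability probe: repeated reads of an unchanged file must
# always produce the same digest. FILE is a placeholder path; point
# it at any large file on the affected volume. If no file is given,
# generate 1 MiB of sample data so the sketch is self-contained.
FILE=${FILE:-/tmp/cache-probe.bin}
[ -e "$FILE" ] || head -c 1048576 /dev/urandom > "$FILE"

first=$(sha256sum "$FILE" | cut -d' ' -f1)
for i in 1 2 3 4 5; do
    cur=$(sha256sum "$FILE" | cut -d' ' -f1)
    # Any mismatch means a read returned different bytes than before.
    [ "$cur" = "$first" ] || echo "digest changed on read $i!"
done
echo "final digest: $first"
```

On a healthy system the loop prints nothing but the final digest; interleaving it with an interrupted `checkfs -c` run would reproduce the scenario waddlesplash describes.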

comment:12 by X512, 4 years ago

In my case this bug has a higher probability of triggering on the 32-bit than on the 64-bit version of Haiku.

comment:13 by waddlesplash, 4 years ago

Potentially related: #15912.

comment:14 by waddlesplash, 4 years ago

#15912 was fixed in hrev54101, which indeed was some pretty bad brokenness in do_iterative_fd_io. See if this problem still occurs after that.

comment:15 by X512, 3 years ago

Still sometimes happens on hrev54942 x86_64.

comment:16 by X512, 3 years ago

~/Desktop> checkfs -c /Haiku
        1483898 nodes checked,
        0 blocks not allocated,
        0 blocks already set,
        486479 blocks could be freed

        files           1370075
        directories     107883
        attributes      3620
        attr. dirs      2293
        indices         27

        direct block runs               1433742 (57.18 GiB)
        indirect block runs             5594 (in 197 array blocks, 11.24 GiB)
        double indirect block runs      0 (in 0 array blocks, 0 bytes)
~/Desktop> checkfs -c /Haiku
pack (inode = 25831660), could not be opened
desktop-applets.html (inode = 9264613), could not be opened
        1483602 nodes checked,
        0 blocks not allocated,
        0 blocks already set,
        487349 blocks could be freed

        files           1369781
        directories     107881
        attributes      3620
        attr. dirs      2293
        indices         27

        direct block runs               1433440 (57.18 GiB)
        indirect block runs             5594 (in 197 array blocks, 11.24 GiB)
        double indirect block runs      0 (in 0 array blocks, 0 bytes)

Nothing was changed between the checkfs runs. Some kernel data structures were corrupted after the checkfs run, so some random nodes "could not be opened".

comment:17 by X512, 3 years ago

3rd run: OK

~/Desktop> checkfs -c /Haiku
        1483898 nodes checked,
        0 blocks not allocated,
        0 blocks already set,
        486479 blocks could be freed

        files           1370075
        directories     107883
        attributes      3620
        attr. dirs      2293
        indices         27

        direct block runs               1433742 (57.18 GiB)
        indirect block runs             5594 (in 197 array blocks, 11.24 GiB)
        double indirect block runs      0 (in 0 array blocks, 0 bytes)

comment:18 by X512, 3 years ago

Some suspicious output in syslog:

KERN: low resource memory: note -> normal
KERN: low resource memory: normal -> note
KERN: slab memory manager: created area 0xffffff816d001000 (14997)
KERN: slab memory manager: created area 0xffffff816d801000 (14998)
KERN: slab memory manager: created area 0xffffff816e001000 (14999)
KERN: slab memory manager: created area 0xffffff816e801000 (15000)
KERN: slab memory manager: created area 0xffffff816f001000 (15001)
KERN: vnode 6:25831949 is not becoming unbusy!
KERN: bfs: KERN: Could not open inode at 25831949: Device/File/Resource busy
KERN: slab memory manager: created area 0xffffff816f801000 (15002)
KERN: slab memory manager: created area 0xffffff8170001000 (15003)
KERN: slab memory manager: created area 0xffffff8170801000 (15004)
KERN: slab memory manager: created area 0xffffff8171001000 (15005)
KERN: slab memory manager: created area 0xffffff8171801000 (15006)
KERN: slab memory manager: created area 0xffffff8172001000 (15007)
KERN: slab memory manager: created area 0xffffff8172801000 (15008)
KERN: slab memory manager: created area 0xffffff8173001000 (15009)
KERN: slab memory manager: created area 0xffffff8173801000 (15010)
KERN: slab memory manager: created area 0xffffff8174001000 (15011)
KERN: slab memory manager: created area 0xffffff8174801000 (15012)
KERN: slab memory manager: created area 0xffffff8175001000 (15013)
KERN: vnode 6:9792735 is not becoming unbusy!
KERN: bfs: Could not open inode at 9792735: Device/File/Resource busy
KERN: low resource memory: note -> normal
KERN: acquire_advisory_lock(vnode = 0xffffff808d820c00, flock = 0xffffffff99967e90, wait = no)
KERN: Last message repeated 21 times.
KERN: Last message repeated 14 times.