Opened 3 months ago

Last modified 3 months ago

#15355 new bug

checkfs freezes app_server

Reported by: pulkomandy
Owned by: nobody
Priority: normal
Milestone: Unscheduled
Component: - General
Version: R1/Development
Keywords:
Cc:
Blocked By:
Blocking:
Has a Patch: no
Platform: All

Description

When running checkfs on my work BFS volume (with lots of source code files), it prints a few errors and then freezes the display: activity_monitor stops refreshing and the mouse doesn't move. I tried to poke at things in KDL, but everything seems to be normal there; checkfs was calling dprintf when I interrupted it, but that continued as normal. Maybe it's a bug in Terminal or the tty layer when there are lots of things to print. I will try redirecting the output to a file in case it makes a difference.

Change History (14)

comment:1 by pulkomandy, 3 months ago

So, running it with the output redirected to a text file led to some strange things:

  • I had another terminal tab with a "watch ls -lh checkfs_output.txt" running. When checkfs terminated, that tab suddenly disappeared.
  • In the tab where checkfs was running, I never got my bash prompt back, even though checkfs was not running anymore (not in ps, not in ProcessController or Slayer).

It also wiped about half the files off my disk. But I heard BFS is safe now and there's no need to mount disks read-only...

comment:2 by waddlesplash, 3 months ago

Checkfs has always frozen the whole system? It holds some critical locks inside BFS I believe and so basically nothing happens while it's running. I've never heard of BFS deleting files that already exist; but I've also never tried to redirect its output to a file...

comment:3 by X512, 3 months ago

Was checkfs applied to the /boot volume or some other volume? If it was applied to the /boot volume, app_server can freeze while performing a read request (including from the swap file). Did the system unfreeze when checkfs finished?

What was written by checkfs?

I experienced BFS problems some time ago, but not with current versions. "checkfs -c /boot" reported errors that changed with each run. I think this can be caused by incorrect disk cache operation.

comment:4 by pulkomandy, 3 months ago

It was another volume (checkfs on /boot works fine and does not freeze anything).

In the log file I get a lot of "cannot be opened" errors on several directories, and now these are empty on the disk.

Freezing everything is not the expected behavior, and not something I remember happening before. Even without access to that disk, I should still be able to run the system normally. It does not hold any application, it does not hold my swapfile. So there's really no reason to block anything that doesn't affect that disk (such as, I don't know, moving the mouse?)

comment:5 by X512, 3 months ago

It's better to run "checkfs -c" first. checkfs is a potentially dangerous operation. If I understand correctly, it traverses the node tree and removes invalid nodes. So if a folder is considered invalid, it and its contents are deleted (https://xref.plausible.coop/source/xref/haiku/src/add-ons/kernel/file_systems/bfs/CheckVisitor.cpp#351) and its child nodes are not traversed. checkfs doesn't attempt to recover unreferenced nodes or nodes that "cannot be opened".

"cannot be opened" errors can be caused by problems in the disk cache. A reboot can fix that.

BFS is a journaling file system and does not require repair after an unexpected shutdown (though I have heard that freeing blocks is sometimes required). File system errors are caused by bugs in the kernel or file system drivers and/or by disk hardware failures.
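The behavior described in this comment can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actual BFS code: `Node`, `check_tree` and the `repair` flag are hypothetical names, `repair=False` stands in for a report-only pass (which this comment suggests is what "checkfs -c" does), and `valid=False` stands in for a node that "cannot be opened".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for an inode; not a BFS data structure."""
    name: str
    valid: bool = True  # False models a node that "cannot be opened"
    children: List["Node"] = field(default_factory=list)


def check_tree(directory: Node, repair: bool, path: str = "") -> List[str]:
    """Walk the tree under `directory` and return the error report.

    With repair=False the tree is left unchanged. With repair=True, invalid
    entries are dropped from their parent, taking their contents with them.
    """
    report: List[str] = []
    kept: List[Node] = []
    for child in directory.children:
        child_path = f"{path}/{child.name}"
        if not child.valid:
            report.append(f"{child_path} cannot be opened")
            if not repair:
                kept.append(child)  # report-only pass keeps the entry
            continue  # in both modes, the invalid node's children are never visited
        kept.append(child)
        report.extend(check_tree(child, repair, child_path))
    directory.children = kept
    return report


# Example tree: "src" is unreadable, "docs" is fine.
root = Node("", children=[
    Node("src", valid=False, children=[Node("main.cpp")]),
    Node("docs", children=[Node("readme.txt")]),
])
print(check_tree(root, repair=False))    # report only
print(check_tree(root, repair=True))     # report and remove
print([c.name for c in root.children])   # what is left at the top level
```

In this model, both passes report the same single error for "/src", but only the repairing pass empties the directory: "main.cpp" disappears without ever being examined. That matches the symptom reported above, where directories listed as "cannot be opened" ended up empty.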

comment:6 by pulkomandy, 3 months ago

Thanks, I know how this works. I knew I could do it on this partition, where everything is versioned in Git. I'm annoyed that waddlesplash recently removed the warning and recommendation to mount disks read-only whenever possible (which I usually do on my other partitions to avoid corruption).

As long as we have "problems in the disk cache" or whatever causes this, we should keep this warning. That's all.

comment:7 by waddlesplash, 3 months ago

I removed the warning because nobody had reported any disk corruption that was not explainable by KDLs in quite a long time. So, how many KDLs during disk operations have you hit?

comment:8 by pulkomandy, 3 months ago

Not many recently. I'm mostly getting app_server crashes at the moment. But whether or not it's caused by KDLs, I'd really like to have the option to mount disks read-only back, until we are sure there is no data corruption (KDL or not).

comment:9 by axeld, 3 months ago

I think two things could be responsible for this:

  • The change towards CheckVisitor is relatively young, and may cause new issues.
  • If you couldn't open inodes, they might be removed from the B+tree if you chose to enable that functionality. Now, if low memory was the reason for not being able to open the nodes, then this sounds like a great opportunity to shred your file system.

However, I don't see the connection between this error, and the missing r/o mount option in the UI. You can still mount r/o via the terminal if you prefer to do that, anyway.

comment:10 by pulkomandy, 3 months ago

Yes, the r/o mount option is not related to the error. I just like the possibility to mount things r/o easily, given that I don't consider checkfs and BFS to be completely safe yet. And the option was recently removed.

comment:11 by humdinger, 3 months ago

> I just like the possibility to mount things r/o easily,

+1 for bringing it back.
At least as long as we're beta, though personally, I like to have that control when mounting anyway. E.g. if I mount my 'production' partition from another test partition just to copy over some files, I always mount read-only, just to be sure I cannot accidentally mess something up (like confusing source and destination because the file structure is identical...).

comment:12 by diver, 3 months ago

I had a situation earlier this year when checkfs froze app_server. I left it like that for several minutes and eventually it came back to life.

comment:13 by axeld, 3 months ago

On the (off-topic for this ticket) discussion about the r/o mount option: I'm fine with having that message removed. How about adding a new option to always ask if a volume should be mounted read-only? It could follow the same logic (i.e. one option for BFS, and one for other file systems).

comment:14 by humdinger, 3 months ago

+1

In the Tracker preferences, above the "Eject when unmounting" checkbox: "Ask to mount read-only".
