Opened 4 years ago
Closed 3 months ago
#16159 closed bug (no change required)
PANIC: Invalid concurrent access to page... (start), ... accessed by: -134217729 [bt: file_cache]
Reported by: | ttcoder | Owned by: | nobody
---|---|---|---
Priority: | normal | Milestone: | Unscheduled
Component: | System/Kernel | Version: | R1/Development
Keywords: | | Cc: |
Blocked By: | | Blocking: |
Platform: | All | |
Description
One of our beta-testing stations has been getting a string of KDLs when running a backup script (based on /bin/cp, copying mp3s from one BFS partition to another); here's the latest KDL.
Contrary to most other tickets like this one, the backtrace mentions file_cache_read.
Backtrace, context, and a list of similar tickets coming up.
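For reference, here's a minimal sketch of the kind of copy step the script performs; the actual script isn't attached to this ticket, and the mount points below are made-up placeholders:

```sh
#!/bin/sh
# Hypothetical reconstruction of the backup step (the real script is
# not attached here): mirror mp3s from one BFS volume to another
# using plain /bin/cp.
SRC="/Music"          # source BFS partition (placeholder mount point)
DST="/Backup/Music"   # destination BFS partition (placeholder)

mkdir -p "$DST"
cp -a "$SRC"/*.mp3 "$DST"/
```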
Attachments (2)
Change History (15)
comment:2 by , 4 years ago
Doing ticket archeology, I see that #5228, #5242, #5544, and #5919 have all been closed after VM fixes; similar message, but a different backtrace (vm instead of file_cache). Stuff also happens when Tuneprepper fork()s: #12518, #12408 (but we're no longer selling that app, so those could be closed as invalid).
Recent active tickets are #13274 and especially #13205 (that latter one is in file_cache_write(), so closest to this ticket?).
Notes to self:
- this occurs with hrev53894. Upgrade this station (at least) to beta2 when it's out? I see there have been some fixes to the file cache recently. And refrain from doing backups in the meantime; not much longer to wait anyway.
- if this still occurs with beta2, ping this ticket in a few weeks to get it looked into?
- that station also has BFS errors reported in syslog, but this seems unrelated to the KDL, since the other KDL tickets listed above don't seem to have BFS errors. Anyway, here they are:
bfs: InitCheck:325: Bad data
bfs: inode at 2099031 is already deleted!
bfs: GetNextMatching:615: Bad data
bfs: could not get inode 2099031 in index "size"!
bfs: InitCheck:325: Bad data
bfs: inode at 2099032 is already deleted!
bfs: GetNextMatching:615: Bad data
bfs: could not get inode 2099032 in index "size"!
This one is hidden among unrelated logs; I spotted it by chance:
2020-05-28 09:43:44 KERN: bfs: volume reports 49876469 used blocks, correct is 49509747
(typical Haiku syslogging: important information written in all-lowercase, hidden in an ocean of noisy informational reporting *g*)
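Note to self, a quick way to fish these bfs lines out of the noise next time (standard Haiku log paths; previous_syslog holds the log from before the last reboot):

```sh
# Pull only the bfs-related lines out of the system logs. These are
# the stock Haiku locations for the current and pre-reboot syslogs.
grep 'bfs:' /var/log/syslog /var/log/previous_syslog
```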
comment:3 by , 4 years ago
Yes, #15912 and its fix may be related here, if the file cache was indeed overwriting pages (or disk structures). It seems unlikely ... but worth a try.
Those messages look like disk corruption. Running "checkfs" should resolve them (but beware: this may unexpectedly delete a lot of files on 32-bit, and more rarely on 64-bit, so take backups first!).
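Something like the following, assuming the affected volume is mounted at /MyVolume (a placeholder name); verify the check-only flag against checkfs --help on your hrev:

```sh
# Dry run first: report inconsistencies without modifying the disk.
checkfs -c /MyVolume

# Real pass (this can modify or delete on-disk structures -- take
# backups first, as noted above).
checkfs /MyVolume
```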
comment:4 by , 4 years ago
#16175 may also be related here, as it was a random kernel memory corruption that was also just fixed.
comment:5 by , 4 years ago
Touché. I mentioned that to Dane earlier this morning and am awaiting an answer. The station in question does indeed happen to be running an hrev from late February (the bug was introduced in early February). We'll try to revert the station to a late-January-era hrev (we can't upgrade him to the beta2 candidate, since that suffers from a media memory leak regression, as discussed in the other ticket).
comment:6 by , 4 years ago
If you are referring to #16031, did you confirm this occurs outside of MediaPlayer?
comment:7 by , 4 years ago
Reverting that station to hrev53855 to dodge the BQuery bullet, since that one is reportedly part of their stability problems. We'll see if this closes this ticket or at least helps with it. However, it would be very nice to update to beta2, so when time permits I'll try to gather more data and run leak_analyser.sh on the now-traditional ten-liner BMediaFile test.
comment:8 by , 4 years ago
They got a second KDL this morning; turns out the hrev "downgrade" did not work. From what I gather, the wrong hpkgs were written to the wrong location, hmmm. We're going to look for an easier way to get them out of this pinch, most likely updating them to beta2 or to the latest nightly. We'll take our chances with the media leak; it can't be as bad as a KDL :-)
(For the curious: attached the KDL above; this one is in mutex_lock().)
comment:9 by , 4 years ago
It appears you have a "random" driver lying around inside /boot/home. Why and how is that? Did you put the .hpkgs there? We changed the mutex ABI recently, so that would explain the KDL. Easy fix: just remove the system packages from ~/config/packages.
When downgrading, you can just run "pkgman install haiku...hpkg" and it will prompt you to downgrade (and of course you can include other packages on that command line).
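Roughly like this, assuming a stray nightly system package ended up in the per-user directory (the filenames below are placeholders):

```sh
# System packages do not belong in the per-user package directory;
# list it to spot any stray haiku*.hpkg files.
ls ~/config/packages

# Remove the stray system package (placeholder filename).
rm ~/config/packages/haiku-hrevXXXXX-1-x86_64.hpkg

# Downgrade by handing pkgman the wanted hpkg directly; it prompts
# before downgrading, and other packages can be added to the same
# command line.
pkgman install haiku-hrev53855-1-x86_64.hpkg
```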
comment:10 by , 4 years ago
Good catch. Indeed, inspecting the machine remotely, I see they've somehow installed a recent nightly "haiku....hpkg" file into ~/config/packages (!), along with a dozen others. Transmitting the above info to help them straighten things out... My preference would be to format/initialize anyway, in order to also get rid of the BFS corruption in one fell swoop. Hopefully this saga will reach an end soon...
by , 4 years ago
Attachment: panic-freeing-unknown-block.jpeg added
"panic freeing unknown block.. from area.." (backtrace mentions _kern_read_port(), block_free()..)
comment:11 by , 4 years ago
That station got this today. The system seems to be clean now, both the syslog contents and the hierarchy of the home folder, and the new backtrace seems unrelated to the one that started this ticket, so I'm not sure what to make of it... Maybe we should try again to revert him to an older hrev.
comment:13 by , 3 months ago
Resolution: | → no change required
---|---
Status: | new → closed
We haven't tried to resume sales or testing on the Be/Hai side of things in years, so no. We might take a look at R1/beta5 when it's out. Closing.
The KDL got saved in "previous_syslog"; first time I've ever seen that happen. The bt mentions file_cache_read and free_cached_page. The negative thread number (aka 0xf7ffffff in hex?) looks incorrect, like in similar tickets of this type.
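For the record, the two's-complement reading checks out; a quick shell sanity check (any shell with 64-bit arithmetic will do):

```sh
# -134217729 reinterpreted as an unsigned 32-bit value:
printf '0x%x\n' $(( (1 << 32) - 134217729 ))   # prints 0xf7ffffff
```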