Opened 11 years ago

Closed 11 years ago

#1956 closed bug (fixed)

Garbage In Files

Reported by: bonefish
Owned by: axeld
Priority: critical
Milestone: R1/alpha1
Component: System/Kernel
Version: R1/pre-alpha1
Keywords:
Cc:
Blocked By:
Blocking:
Has a Patch: no
Platform: All

Description

hrev24528, VMware

This is somewhat vague... While playing with various build tools, I had several cases where a file would suddenly contain garbage, e.g. a large text file containing one or more chunks of binary data. I believe all of the affected files were created or edited during the session, but had been OK in the meantime.

I'll try to add more info the next time I encounter the issue.

Attachments (4)

haiku.h (20.5 KB) - added by bonefish 11 years ago.
correct file
haiku.h.zip (798 bytes) - added by bonefish 11 years ago.
zipped file containing garbage
config.guess (43.9 KB) - added by bonefish 11 years ago.
correct file (2)
config.guess.zip (13.4 KB) - added by bonefish 11 years ago.
zipped file containing garbage (2)


Change History (16)

comment:1 Changed 11 years ago by axeld

Did you ever run low on memory while working? Were these newly created files? Because if you didn't run low on memory, the files would have been kept in memory the whole time, and therefore something must have actively clobbered them.

If they were new files and memory was low, perhaps something marked the pages as clean, so they were never written back to disk. Does the garbage look like something that might have been on disk previously?
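The write-back rule behind this hypothesis can be shown with a toy model. This is a hypothetical sketch, not Haiku's file cache: only pages flagged dirty are written to disk before eviction, so a page that loses its dirty flag by mistake is dropped and the disk keeps whatever it held before. The 4096-byte page size and all names are assumptions for illustration.

```python
class ToyWriteBackCache:
    """Hypothetical write-back cache: only dirty pages are written back."""

    PAGE_SIZE = 4096  # assumed page size (x86)

    def __init__(self, disk: bytearray):
        self.disk = disk    # backing store, modified only on write-back
        self.pages = {}     # page index -> cached page contents
        self.dirty = set()  # indices of pages that still need write-back

    def write(self, offset: int, data: bytes) -> None:
        index, in_page = divmod(offset, self.PAGE_SIZE)
        if in_page + len(data) > self.PAGE_SIZE:
            raise ValueError("sketch only supports writes within one page")
        start = index * self.PAGE_SIZE
        page = self.pages.setdefault(
            index, bytearray(self.disk[start:start + self.PAGE_SIZE]))
        page[in_page:in_page + len(data)] = data
        self.dirty.add(index)

    def mark_clean_by_mistake(self, index: int) -> None:
        # Models the suspected bug: a modified page loses its dirty flag.
        self.dirty.discard(index)

    def evict_all(self) -> None:
        # Under memory pressure: write back dirty pages, then drop all pages.
        # Pages wrongly marked clean are dropped without being written.
        for index in sorted(self.dirty):
            start = index * self.PAGE_SIZE
            self.disk[start:start + self.PAGE_SIZE] = self.pages[index]
        self.dirty.clear()
        self.pages.clear()
```

If page 1 is written, then marked clean by mistake and evicted, the disk still holds its old bytes at that offset, i.e. "something that might have been on disk previously".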

comment:2 in reply to:  1 Changed 11 years ago by bonefish

Replying to axeld:

> Did you ever run low on memory while working?

After working for a while, the ProcessController memory bar is usually full and red, so it could very well be that this was the case in all instances, too.

> Were these newly created files?

The recent occurrences definitely were. I recall several times when a Makefile created by the binutils/gcc build system suddenly contained binary data. In one case a header file from the texinfo sources was broken. I hadn't edited it before, but I think I had unzipped the sources earlier in the same session, so that file was newly created in the session, too.

I definitely had one occurrence where a Perl source file that was not new was garbled. This was before your recent BFS journal-related fixes, though, and I may have had to terminate an earlier session without syncing. So this might not be related.

> Because if you didn't run low on memory, the files would have been kept in memory the whole time, and therefore something must have actively clobbered them.

> If they were new files and memory was low, perhaps something marked the pages as clean, so they were never written back to disk. Does the garbage look like something that might have been on disk previously?

I didn't examine that more closely, nor whether the garbage was page-sized. I'll do that the next time I encounter the problem.

I guess a small test suite that strains the FS, file cache, and VM would be a good idea. There still seem to be a few issues in those areas, and when you encounter them in the "wild", you are usually doing something else and aren't really motivated to examine the problem.
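A minimal sketch of such a test, not an existing Haiku test: it repeatedly creates, rewrites, and deletes files with self-describing text content, then verifies every surviving file and reports the first bad offset. File names, sizes, and the 30% deletion rate are arbitrary assumptions. Reading back right after writing may be served from the cache, so to exercise write-back the total data should exceed RAM, or verification should be repeated after a reboot.

```python
import os
import random
import tempfile


def expected_content(tag: int, size: int) -> bytes:
    """Deterministic text in which every line names its file tag and line
    number, so a misplaced or foreign block is easy to recognise."""
    lines = []
    total = 0
    line_no = 0
    while total < size:
        line = f"file {tag:08d} line {line_no:08d}\n".encode("ascii")
        lines.append(line)
        total += len(line)
        line_no += 1
    return b"".join(lines)[:size]


def first_mismatch(actual: bytes, expected: bytes):
    """Offset of the first differing byte, or None if the contents match."""
    if actual == expected:
        return None
    for offset, (a, b) in enumerate(zip(actual, expected)):
        if a != b:
            return offset
    return min(len(actual), len(expected))  # one is a prefix of the other


def run_stress(directory: str, num_files: int = 50, size: int = 256 * 1024,
               rounds: int = 3, seed: int = 1):
    """Create, rewrite and delete files for several rounds, verifying all
    surviving files after each round. Returns (path, offset) pairs."""
    rng = random.Random(seed)
    live = {}  # path -> bytes the file is expected to contain
    problems = []
    for round_no in range(rounds):
        for i in range(num_files):
            path = os.path.join(directory, f"stress_{i:04d}.txt")
            if path in live and rng.random() < 0.3:
                os.remove(path)
                del live[path]
                continue
            data = expected_content(round_no * num_files + i, size)
            with open(path, "wb") as f:
                f.write(data)
            live[path] = data
        for path, data in live.items():
            with open(path, "rb") as f:
                offset = first_mismatch(f.read(), data)
            if offset is not None:
                problems.append((path, offset))
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        print(run_stress(directory) or "no corruption detected")
```

Because every line names its file and line number, a foreign chunk found at a reported offset often reveals which file its data came from.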

Changed 11 years ago by bonefish

Attachment: haiku.h added

correct file

Changed 11 years ago by bonefish

Attachment: haiku.h.zip added

zipped file containing garbage

comment:3 Changed 11 years ago by bonefish

I just had another garbage file, though under different circumstances. I edited the already existing file "haiku.h" (a single line only, increasing the file size by 11 bytes) and did a regular "shutdown -r". There was still plenty of memory available. When I tried to build gcc in the next session, the file contained only garbage (cf. haiku.h.zip). Again, there was still plenty of memory available.

I don't know if it means anything, but the syslog of the new session says:

bfs: Insert:1306: Name in use
bfs: Insert:1306: Name in use
bfs: Insert:1306: Name in use
Last message repeated 4 times.

Changed 11 years ago by bonefish

Attachment: config.guess added

correct file (2)

Changed 11 years ago by bonefish

Attachment: config.guess.zip added

zipped file containing garbage (2)

comment:4 Changed 11 years ago by bonefish

Another instance of the problem, this time more like the ones I've seen most of the time. Attached are config.guess as it should look and its garbled version. The garbage starts at offset 1024 and ends at offset 3 * 1024, i.e. it is two FS blocks long but only half a page. So this looks more like a BFS problem than a file cache/VM problem.
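The arithmetic used here (2048 bytes is two 1024-byte blocks but only half of a 4096-byte page, and the range starts and ends on block but not page boundaries) can be automated when comparing a good/garbled pair. A minimal sketch; the 4096-byte page size and the 16-byte merge threshold are assumptions, the latter because random garbage can coincidentally match the original byte here and there.

```python
BLOCK_SIZE = 1024  # BFS block size of the affected volume, per the ticket
PAGE_SIZE = 4096   # assumed VM page size (x86)


def differing_ranges(good: bytes, bad: bytes, min_gap: int = 16):
    """Return merged [start, end) byte ranges where the two files differ.
    Ranges separated by fewer than min_gap equal bytes are merged."""
    length = max(len(good), len(bad))
    raw = []
    start = None
    for i in range(length):
        same = i < len(good) and i < len(bad) and good[i] == bad[i]
        if not same and start is None:
            start = i
        elif same and start is not None:
            raw.append((start, i))
            start = None
    if start is not None:
        raw.append((start, length))
    merged = []
    for s, e in raw:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged


def describe(start: int, end: int) -> dict:
    """Express a damaged range in FS blocks and VM pages."""
    length = end - start
    return {
        "length": length,
        "blocks": length / BLOCK_SIZE,
        "pages": length / PAGE_SIZE,
        "block_aligned": start % BLOCK_SIZE == 0 and end % BLOCK_SIZE == 0,
        "page_aligned": start % PAGE_SIZE == 0 and end % PAGE_SIZE == 0,
    }
```

For a range like the one in config.guess, `describe(1024, 3072)` yields 2.0 blocks and 0.5 pages, block-aligned but not page-aligned.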

In the Haiku session I removed the original config.guess, which predated the session, and copied another file to the same location. After some configuring and making, I hit the problem. I can't tell whether the file was broken right after being copied or got damaged later.

comment:5 Changed 11 years ago by axeld

I hope you haven't deleted that file yet? It might be helpful to add a dump of its inode (i.e. how the data stream is laid out). The super block might be helpful, too.
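To relate damaged file offsets to disk blocks, the inode's data stream has to be walked. A simplified sketch, assuming the BFS convention that a block run is (allocation group, start, length) and that its first absolute block is `(allocation_group << ag_shift) + start`, with `ag_shift` taken from the super block. Only the direct runs are handled; indirect and double-indirect ranges are not modelled, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BlockRun:
    allocation_group: int
    start: int    # first block inside the allocation group
    length: int   # number of consecutive blocks


def run_start_block(run: BlockRun, ag_shift: int) -> int:
    # Assumed BFS convention: absolute block = (group << ag_shift) + start.
    return (run.allocation_group << ag_shift) + run.start


def offset_to_block(direct: List[BlockRun], offset: int, block_size: int,
                    ag_shift: int) -> Optional[int]:
    """Absolute disk block holding the given file offset, using only the
    direct runs; None if the offset lies beyond them."""
    file_block = offset // block_size
    for run in direct:
        if file_block < run.length:
            return run_start_block(run, ag_shift) + file_block
        file_block -= run.length
    return None


def blocks_for_range(direct: List[BlockRun], start: int, end: int,
                     block_size: int, ag_shift: int) -> List[Optional[int]]:
    """Disk blocks backing the byte range [start, end) of the file."""
    first = start - start % block_size
    return [offset_to_block(direct, off, block_size, ag_shift)
            for off in range(first, end, block_size)]
```

For a file garbled at [1024, 3072), the blocks returned would be the ones to look up in tracing output or in other inodes' data streams.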

I have the feeling that it might have the same cause as #1914: something freed the blocks that were owned by that file.

Did you have BFS tracing enabled? Can you find out when and how those blocks were freed, by any chance?

comment:6 in reply to:  5 Changed 11 years ago by bonefish

Replying to axeld:

> I hope you haven't deleted that file yet?

Of course I did. I needed to get something done. :-)

> It might be helpful to add a dump of its inode (i.e. how the data stream is laid out). The super block might be helpful, too.

> I have the feeling that it might have the same cause as #1914: something freed the blocks that were owned by that file.

> Did you have BFS tracing enabled? Can you find out when and how those blocks were freed, by any chance?

Nope. I suppose I'd better enable it now. Not sure if it will help, though. Those build processes usually run so much stuff that even a 200 MB tracing buffer doesn't last very long.

comment:7 Changed 11 years ago by axeld

Should be fixed with hrev24607. If it doesn't happen for you anymore, please close this ticket.

comment:8 in reply to:  7 Changed 11 years ago by bonefish

Replying to axeld:

> Should be fixed with hrev24607. If it doesn't happen for you anymore, please close this ticket.

I hit the problem again with hrev24614. I suppose I have to reinitialize partitions that I used with an earlier BFS version, right?

comment:9 Changed 11 years ago by axeld

You'd better, unless you can run chkbfs on that partition and it doesn't report any block as used twice. BTW, how would I go about implementing chkbfs on Haiku using the device API?
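The check mentioned here, finding blocks that are used twice, comes down to building an ownership map over all inodes. A minimal sketch, assuming each inode's data has already been resolved into (first block, block count) extents and the allocation bitmap into a set of used block numbers; a real chkbfs would walk the on-disk structures and would not keep a per-block dictionary for a whole volume.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Extent = Tuple[int, int]  # (first absolute block, number of blocks)


def find_block_conflicts(inodes: Dict[int, List[Extent]],
                         allocated: Set[int]):
    """Return (blocks claimed by more than one inode, blocks that some inode
    uses but the allocation bitmap marks as free)."""
    owners = defaultdict(list)  # block number -> inode ids referencing it
    for inode_id, extents in inodes.items():
        for first, count in extents:
            for block in range(first, first + count):
                owners[block].append(inode_id)
    used_twice = {block: ids for block, ids in owners.items() if len(ids) > 1}
    used_but_free = sorted(block for block in owners if block not in allocated)
    return used_twice, used_but_free
```

A block in the second list is what "something freed the blocks that were owned by that file" would leave behind; once it is handed out again, it shows up in the first list.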

comment:10 in reply to:  9 Changed 11 years ago by bonefish

Replying to axeld:

> BTW, how would I go about implementing chkbfs on Haiku using the device API?

The most important part is obviously implementing the FS repair() hook. :-) I don't know whether everything from BPartition::Repair() through the BFS userland add-on (I think that isn't implemented yet) to the syscall is wired correctly. The syscall definitely needs a review, and I also think we should consider making BPartition::Repair() immediate, just like Mount(). I don't think one ever wants to edit a few partitions and schedule a repair on one of them at the same time (probably the same goes for Defragment()).

What I believe is generally missing is a back channel from the repair() hook to the userland app. At the moment the repair can't be interactive at all ("Inode 19412 is toast. Remove it? [y/n]"). That's not so easy to implement in a generic way, though. The userland add-on would need to wrap all communication with the kernel add-on and probably provide both a GUI and a non-GUI alternative. For the communication itself a port could be used (obviously two threads are needed), which would be passed via the driver-settings-style parameters argument.

BPartition::Repair() could then work like this: it takes an additional BDiskDeviceOperationCallback (or, if we need it specialized, a BPartitionRepairCallback, though *Callback might not fit that well), which is the interface for the userland add-on to interact with the application. I.e. through it the application could provide a GUI, and the add-on could call hooks for events that require interaction. Repair() would first check with the userland add-on whether repairing is generally OK and ask for additional parameters (driver settings) to pass to the kernel. It would then spawn a thread and send it into the userland add-on, where it would handle the communication with the kernel add-on and the application's callback object until the repair operation is done. Afterwards, Repair() invokes the syscall.
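The proposed flow can be modelled in a few lines. This is a hypothetical, language-neutral sketch in Python, not Haiku API: two `queue.Queue` objects stand in for the port (one per direction), `RepairCallback` stands in for the suggested BPartitionRepairCallback, `fake_repair_hook` plays the kernel repair() hook, and the spawned thread plays the userland add-on relaying questions to the application while the calling thread is blocked in the "syscall".

```python
import queue
import threading
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Question:
    text: str


@dataclass
class Finished:
    repaired: int


class RepairCallback:
    """Hypothetical application-side interface (cf. BPartitionRepairCallback)."""

    def confirm_repair(self) -> bool:
        return True

    def ask(self, question: str) -> bool:
        raise NotImplementedError


def fake_repair_hook(to_user: queue.Queue, from_user: queue.Queue,
                     damaged_inodes: List[int]) -> int:
    """Plays the kernel repair() hook: asks before each destructive step."""
    repaired = 0
    for inode in damaged_inodes:
        to_user.put(Question(f"Inode {inode} is toast. Remove it?"))
        if from_user.get():
            repaired += 1  # a real hook would remove the inode here
    to_user.put(Finished(repaired))
    return repaired


def relay(to_user: queue.Queue, from_user: queue.Queue,
          callback: RepairCallback, transcript: List[str]) -> None:
    """Runs in the spawned thread: forwards questions to the application."""
    while True:
        message = to_user.get()
        if isinstance(message, Finished):
            return
        transcript.append(message.text)
        from_user.put(callback.ask(message.text))


def repair(callback: RepairCallback,
           damaged_inodes: List[int]) -> Optional[Tuple[int, List[str]]]:
    if not callback.confirm_repair():
        return None  # the application declined the repair
    to_user, from_user = queue.Queue(), queue.Queue()
    transcript: List[str] = []
    helper = threading.Thread(target=relay,
                              args=(to_user, from_user, callback, transcript))
    helper.start()
    # Stands in for the blocking repair syscall made by the calling thread.
    repaired = fake_repair_hook(to_user, from_user, damaged_inodes)
    helper.join()
    return repaired, transcript
```

Driver-settings parameters and the GUI/non-GUI split are left out; the point is only that the hook blocks on each reply while a second thread talks to the application.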

Anyway, this is all pretty off-topic for this ticket. :-)

comment:11 Changed 11 years ago by stippi

>>> Should be fixed with hrev24607. If it doesn't happen for you anymore, please close this ticket.

>> I hit the problem again with hrev24614. I suppose I have to reinitialize partitions that I used with an earlier BFS version, right?

> You'd better

Can this be closed then? :-)

comment:12 Changed 11 years ago by stippi

Resolution: fixed
Status: new → closed

Talked to Ingo on the phone; this one had just been forgotten. Fixed in hrev24607.
