Opened 11 years ago

Closed 11 years ago

Last modified 11 years ago

#9289 closed bug (invalid)

Corrupted partition on dual boot system

Reported by: andret
Owned by: korli
Priority: high
Milestone: R1
Component: File Systems/ext2
Version: R1/alpha4.1
Keywords:
Cc:
Blocked By:
Blocking:
Platform: All

Description

Installed alpha 4.1 on a machine with Grub2 and Linux already installed.

The install process went fine and I could log on to both OSes. However, after a Haiku crash I was unable to access the Linux partition, which seems to be corrupt (or is it the MBR?).

Attachments (1)

syslog.zip (326.6 KB) - added by andret 11 years ago.
syslog of borked system


Change History (27)

comment:1 by pulkomandy, 11 years ago

Is it a plain MBR or GPT? What's the hard drive size?

in reply to:  1 comment:2 by andret, 11 years ago

Replying to pulkomandy:

Is it a plain MBR or GPT?

Not sure, how to tell?

What's the hard drive size?

1TB
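One way to answer the "how to tell?" question is to look at the first two sectors of the disk. The sketch below assumes 512-byte logical sectors and relies on two well-known on-disk signatures: the `0x55 0xAA` boot signature at the end of sector 0 and the ASCII string `EFI PART` at the start of sector 1. The function names and the device path in the usage note are illustrative, not taken from this ticket.

```python
SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def partition_scheme(first_sectors: bytes) -> str:
    """Classify the first two sectors of a disk as 'gpt', 'mbr', or 'unknown'."""
    if len(first_sectors) < 2 * SECTOR_SIZE:
        raise ValueError("need at least the first two sectors")
    sector0 = first_sectors[:SECTOR_SIZE]
    sector1 = first_sectors[SECTOR_SIZE:2 * SECTOR_SIZE]
    # A GPT header lives at LBA 1 and starts with the signature "EFI PART".
    if sector1[:8] == b"EFI PART":
        return "gpt"
    # A classic MBR ends with the two-byte boot signature 0x55 0xAA.
    if sector0[510:512] == b"\x55\xaa":
        return "mbr"
    return "unknown"


def read_first_sectors(device_path: str) -> bytes:
    """Read the first two sectors from a raw device or a disk image."""
    with open(device_path, "rb") as disk:
        return disk.read(2 * SECTOR_SIZE)
```

On Linux, something like `partition_scheme(read_first_sectors("/dev/sda"))` run with sufficient privileges would report the scheme; the path is only an example and differs per system.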

comment:3 by korli, 11 years ago

Did you mount the Linux partition from Haiku before the crash?

comment:4 by diver, 11 years ago

Component: changed from File Systems to File Systems/ext2
Owner: changed from nobody to korli

Had similar ext4 corruption recently. After mounting it in Haiku, Ubuntu stopped booting with fs errors. Tried to fix it using fsck, and even though that seemed to fix booting, ext4 eventually went into read-only mode (triggered by updatedb in Ubuntu) and needed to be fixed again using a live USB and fsck. Had to reformat to fix it completely :-(

comment:5 by Randell, 11 years ago

I'm seeing this problem without any sort of mounting.

Unable to reproduce on releases previous to alpha4.1, so it was probably introduced recently.

Seagate 1 TB disk, 12 GB of memory, had two other partitions (one Linux, one FreeBSD, both now gone).

I seem to get this when putting Haiku under load (using lots of cpu, io and memory), but happened once while idling (seemingly at least.)

Since I can reproduce fairly often, please let me know how to best debug this, at least how I can provide more information.

comment:6 by pulkomandy, 11 years ago

What do you mean by partitions gone? Is the MBR destroyed? The superblock on these filesystems? Some other data?

If you can reproduce it that easily, I guess one way to see more clearly what happened would be to clone your hard disk, have Haiku corrupt one of the clones, and compare it with the other, to see where the corruption happened and what got written there.

If there's a visible crash of Haiku when the corruption happens (KDL or an application crashing), a backtrace would be useful. Type "bt" and take a picture of the screen.
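The clone-and-compare idea can be sketched as a block-by-block comparison of two disk images. This is a minimal illustration, assuming both clones have been dumped to image files; the 4096-byte block size and the function name are arbitrary choices, not something specified in this ticket.

```python
BLOCK_SIZE = 4096  # assumption: compare in 4 KiB blocks


def differing_blocks(clean_path, suspect_path, block_size=BLOCK_SIZE):
    """Yield the index of every block whose contents differ between two images."""
    with open(clean_path, "rb") as clean, open(suspect_path, "rb") as suspect:
        index = 0
        while True:
            clean_block = clean.read(block_size)
            suspect_block = suspect.read(block_size)
            if not clean_block and not suspect_block:
                break  # both images are exhausted
            if clean_block != suspect_block:
                yield index
            index += 1
```

Multiplying a reported index by the block size gives the byte offset, which can then be matched against the partition layout to see whether the MBR, a superblock, or ordinary data was hit.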

comment:7 by Randell, 11 years ago

Excellent idea cloning, crashing and comparing. I'll look into it straight away.

Not sure about the other people, but I for one can't type 'bt' after the crash because Haiku triple faults into a reboot (from the looks of it)


comment:8 by Randell, 11 years ago

That was more work than expected, since there were a lot of legit diffs between the clone and the crashed instance. But basically I see 4K blocks of all zeros in both the MBR and the actual partitions. I don't really see a pattern in the placement of these 4K blocks.

I would love to have others look into this data, but it's a 1 TB disk so I'm not sure how to do it practically. Ideas?
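Sharing a full 1 TB image is impractical, but a list of offsets is small. A rough sketch, assuming a known-good image and a corrupted image are both available as files: it reports only the blocks that are entirely zero in the corrupted image while holding different data in the good one, which filters out the legitimate diffs. The function name and the 4096-byte block size are illustrative assumptions.

```python
def zeroed_block_offsets(clean_path, suspect_path, block_size=4096):
    """Return byte offsets of blocks that are all zero only in the suspect image."""
    offsets = []
    with open(clean_path, "rb") as clean, open(suspect_path, "rb") as suspect:
        index = 0
        while True:
            clean_block = clean.read(block_size)
            suspect_block = suspect.read(block_size)
            if not suspect_block:
                break  # end of the suspect image
            is_all_zero = suspect_block == bytes(len(suspect_block))
            # Skip blocks that were already zero in the known-good image.
            if is_all_zero and clean_block != suspect_block:
                offsets.append(index * block_size)
            index += 1
    return offsets
```

The resulting offset list can be attached to a ticket as plain text, so others can look for a pattern without needing the disk itself.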

comment:9 by Randell, 11 years ago

Something is very fishy with the ata driver, maybe the LBA computations? With 100, 320 and 750 GB disks, I'm unable to reproduce. Anything larger and the corruption kicks in very quickly. Maybe the ata code maintainer should look into it.
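A quick way to sanity-check the LBA hypothesis is to compare the disk sizes against common sector-addressing limits. The arithmetic below assumes 512-byte sectors and decimal gigabytes as printed on drive labels; the helper names are illustrative. It covers ATA 28-bit and 48-bit LBA plus a 32-bit sector count, the usual suspects for size-dependent bugs.

```python
SECTOR_SIZE = 512  # assumption: 512-byte sectors
GB = 10**9         # decimal gigabyte, as used on drive labels


def lba_limit_bytes(lba_bits, sector_size=SECTOR_SIZE):
    """Largest capacity addressable with the given number of LBA bits."""
    return (1 << lba_bits) * sector_size


def fits(disk_bytes, lba_bits, sector_size=SECTOR_SIZE):
    """True if every sector of the disk is addressable with lba_bits bits."""
    sectors = -(-disk_bytes // sector_size)  # ceiling division
    return sectors <= (1 << lba_bits)


if __name__ == "__main__":
    for size_gb in (100, 320, 750, 1000, 2000):
        flags = {bits: fits(size_gb * GB, bits) for bits in (28, 32, 48)}
        print(f"{size_gb:>5} GB  fits in 28/32/48-bit LBA: {flags}")
```

Under these assumptions the 28-bit limit sits at about 137 GB and the 32-bit limit at about 2.2 TB, so neither boundary falls between 750 GB and 1 TB.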

in reply to:  3 comment:10 by andret, 11 years ago

Replying to korli:

Did you mount the Linux partition from Haiku before the crash?

No, I have never mounted to or from Haiku, alas.

comment:11 by korli, 11 years ago

Could you try with some nightly builds between releases on http://haiku-files.org/unsupported-builds/x86-gcc2/ ? There are about 2500 revisions between alpha 3 and alpha 4. It could help to narrow it down, starting at hrev44026 for instance.

You're on real hardware and smp, right?

in reply to:  11 comment:12 by Randell, 11 years ago

Replying to korli:

Could you try with some nightly builds between releases on http://haiku-files.org/unsupported-builds/x86-gcc2/ ? There are about 2500 revisions between alpha 3 and alpha 4. It could help to narrow it down, starting at hrev44026 for instance.

You're on real hardware and smp, right?

Yes I am, 4-way AMD and i7, respectively. I have verified the problem on two real machines.

I mentioned earlier that I couldn't get corruption on smaller disks, but it seems only the probability drops, as I have now also been hit on the 750 GB SATA disk.

I can try to bisect, but it's extremely time consuming. I did try the first, middle and last revisions between alpha 3 and 4, and the middle and last revisions caused corruption. So as far as I can tell, the bug was introduced sometime before or at the middle rev (or I was just unlucky reproducing the bug at the earlier revs).
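The bisection described here amounts to a binary search over revision numbers. The sketch below is a generic illustration: `is_bad` is a hypothetical callback standing in for "install that nightly and try to reproduce", and the revision range in the usage note is made up from the rough figures quoted in this thread.

```python
def first_bad_revision(good, bad, is_bad):
    """Binary-search for the first bad revision.

    `good` must be a revision known to work and `bad` one known to fail.
    Returns (first_bad, number_of_builds_tested).
    """
    tested = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tested += 1
        if is_bad(mid):
            bad = mid    # the regression is at mid or earlier
        else:
            good = mid   # the regression is after mid
    return bad, tested
```

For a range of about 2500 revisions, for example `first_bad_revision(44026, 46526, is_bad)`, this needs at most 12 test builds. With an intermittent bug a "good" verdict can be a false negative, so each candidate would need several reproduction attempts before being trusted.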

comment:13 by axeld, 11 years ago

Thanks for going through all this, Randell! Can you please give the exact revisions you tested with, and whether they showed the problem? This is an extremely critical bug that hasn't been seen anywhere yet.

You are using the "ata" driver on both systems, not "ahci", right? If you're not sure, the device path to the volumes would help finding out.

in reply to:  13 comment:14 by Randell, 11 years ago

Replying to axeld:

Thanks for going through all this, Randell! Can you please give the exact revisions you tested with, and whether they showed the problem?

Bisecting so far points to a commit on 2012-09-14, probably 3c5216179e02f3b711bbac9c021f821ad3628cc7 (hrev44639), but I need to verify this some more once I get time.

comment:15 by umccullough, 11 years ago

I think it would be pretty helpful to have a clean boot syslog from the machine that causes problems, as well as the listdev output.

That way we can identify specific hardware combinations that could be problematic.

You can find info on how to obtain them here: http://dev.haiku-os.org/wiki/ReportingBugs

comment:16 by andret, 11 years ago

It's hrev44639 that introduces it for me too. Switching from ahci to ata might fix it?


comment:17 by axeld, 11 years ago

Can you please provide a syslog from a boot in Haiku? You can find it under /var/log/syslog.

by andret, 11 years ago

Attachment: syslog.zip added

syslog of borked system

comment:18 by andret, 11 years ago

axeld, syslog attached

comment:19 by umccullough, 11 years ago

Please upload an uncompressed syslog, this one doesn't seem to be a valid zipfile.


in reply to:  19 ; comment:20 by andret, 11 years ago

Replying to umccullough:

Please upload an uncompressed syslog, this one doesn't seem to be a valid zipfile.

It was too big to upload uncompressed. It's 7zip ultra, winx should recognize it too.
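A file named `.zip` that was actually written in 7-Zip's own format will fail to open as a zip archive. One way to check what a file really contains is to look at its leading magic bytes. This is a small sketch using Python's standard `zipfile` module and the well-known 7z signature; the function name is an illustrative choice.

```python
import zipfile

# The six-byte signature at the start of every 7z archive.
SEVEN_ZIP_MAGIC = b"7z\xbc\xaf\x27\x1c"


def archive_kind(path):
    """Return '7z', 'zip', or 'unknown' based on the file's contents."""
    with open(path, "rb") as archive:
        head = archive.read(len(SEVEN_ZIP_MAGIC))
    if head == SEVEN_ZIP_MAGIC:
        return "7z"
    # is_zipfile() checks for the zip end-of-central-directory record.
    if zipfile.is_zipfile(path):
        return "zip"
    return "unknown"
```

Calling `archive_kind("syslog.zip")` on the downloaded attachment would show whether the extension matches the actual container format.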

in reply to:  20 ; comment:21 by umccullough, 11 years ago

Replying to andret:

It was too big to upload uncompressed. It's 7zip ultra, winx should recognize it too.

I seriously doubt a clean boot syslog was > 5 MB, and I did use 7zip on Windows to attempt to open it.

Plenty of others have attached syslogs and listdev output without issue.

in reply to:  21 comment:22 by andret, 11 years ago

Replying to umccullough:

Replying to andret:

It was too big to upload uncompressed. It's 7zip ultra, winx should recognize it too.

I seriously doubt a clean boot syslog was > 5 MB, and I did use 7zip on Windows to attempt to open it.

Plenty of others have attached syslogs and listdev output without issue.

I'm sorry to hear about your doubts. It wasn't a clean boot syslog, is that what you want?

comment:23 by JamesVanZel, 11 years ago

Hi. I was about to report a similar issue, but found this one.

A difference is that I am not using a dual boot system, it's only Haiku installed. Like other reports, I do have a large hard drive and a lot of memory, which seems to be a bad combination?

I had no problems installing Haiku and it ran fine for a while. After a rebooting crash, the OS was no longer found by the BIOS.

I'm unfortunately not technical enough to try out older "revisions", I was merely running the latest officially released alpha version.

My disk is a 2 TB Seagate Barracuda.

This is just a test machine so it was no problem losing data, and after reinstalling Haiku I have had no problems so far. I will comment again if I see a pattern.


comment:24 by andret, 11 years ago

umccullough on IRC suggested we all peruse the logs and jerk ourselves off. Can someone elaborate?

comment:25 by axeld, 11 years ago

Resolution: set to invalid
Status: changed from new to closed

I guess we played long enough here. Thanks for the hoax, although I'm sure you could do better :-)

in reply to:  25 comment:26 by andret, 11 years ago

Replying to axeld:

I guess we played long enough here. Thanks for the hoax, although I'm sure you could do better :-)

I'll try better next month. Thanks for the laughs.
