Opened 11 months ago

Last modified 4 weeks ago

#18454 new bug

ASSERT FAILED: address.minimum + address_length - 1 == address.maximum

Reported by: Mas4hmad Owned by: nobody
Priority: blocker Milestone: R1/beta5
Component: Drivers Version: R1/Development
Keywords: boot-failure Cc: PulkoMandy, X512
Blocked By: Blocking: #18536, #18751, #18764, #18856
Platform: All

Description (last modified by Mas4hmad)

As suggested by korli, I am creating a new ticket, similar to #17369.

I downloaded a Haiku nightly image and tried to boot it on my device, but it gets stuck at the 'atom' image on the boot screen (no debug output enabled yet).

When I open the debug screen, it says PackageVolumeInfo::_InitStates(): failed to parse activated-packages: No such file or directory in the middle of the log and PackageVolumeInfo::LoadOldStates(): failed to open administrative directory: No such file or directory at the end of the log.

Another debug log is shown in KDL. It says PANIC: ASSERT FAILED and PANIC: did not find any boot partition.

Device info:

Machine: Lenovo ideapad flex 10

CPU: intel baytrail 64 bit. (UEFI compatible)

RAM: 2gb

ROM: 240gb SSD

boot media: haiku OS nightly anyboot hrev 57085 x86 gcc2 (32bit)

Attachments (28)

page1.jpg (2.1 MB ) - added by Mas4hmad 11 months ago.
page2.jpg (2.0 MB ) - added by Mas4hmad 11 months ago.
page3.jpg (2.4 MB ) - added by Mas4hmad 11 months ago.
page4.jpg (2.1 MB ) - added by Mas4hmad 11 months ago.
page5.jpg (2.3 MB ) - added by Mas4hmad 11 months ago.
page6.jpg (2.0 MB ) - added by Mas4hmad 11 months ago.
page7.jpg (2.3 MB ) - added by Mas4hmad 11 months ago.
page8.jpg (2.3 MB ) - added by Mas4hmad 11 months ago.
0_kdl.jpg (3.0 MB ) - added by Mas4hmad 11 months ago.
1_ints.jpg (3.1 MB ) - added by Mas4hmad 11 months ago.
2_co.jpg (2.4 MB ) - added by Mas4hmad 11 months ago.
3_bt.jpg (3.5 MB ) - added by Mas4hmad 11 months ago.
4_ints.jpg (2.5 MB ) - added by Mas4hmad 11 months ago.
IMG_20230619_005543.jpg (1.2 MB ) - added by Mas4hmad 11 months ago.
Without ACPI
IMG_20230619_012202.jpg (2.0 MB ) - added by Mas4hmad 11 months ago.
Without acpi
IMG_20230619_012512.jpg (1.8 MB ) - added by Mas4hmad 11 months ago.
With acpi +ints
syslog (437.3 KB ) - added by Mas4hmad 11 months ago.
sys log according issue
syslog.2 (290.7 KB ) - added by Mas4hmad 6 months ago.
syslog about now
photo1700220898.jpeg (420.4 KB ) - added by jackburton 6 months ago.
syslog with PCI ranges from ACPI
photo1700737545.jpeg (291.9 KB ) - added by jackburton 5 months ago.
Linux PCI ranges from dmesg
linux_dmesgpci (5.0 KB ) - added by Mas4hmad 5 months ago.
acpidump_lenoflex10.zip (68.2 KB ) - added by Mas4hmad 5 months ago.
syslog.3 (140.2 KB ) - added by Mas4hmad 5 months ago.
the log after grabbing latest iso nightly build from pulkomandy
syslog.4 (155.2 KB ) - added by Mas4hmad 5 months ago.
syslog.5 (110.1 KB ) - added by Mas4hmad 5 months ago.
syslog after rebasing from https://review.haiku-os.org/c/haiku/+/7111
syslog.6 (113.4 KB ) - added by Mas4hmad 5 months ago.
syslog after rebasing from https://review.haiku-os.org/c/haiku/+/7111
syslog.7 (162.2 KB ) - added by Mas4hmad 5 months ago.
syslog after rebasing from https://review.haiku-os.org/c/haiku/+/7111, obtained from lenovo thinkpad x1 carbon 2014 for comparison.
syslog.8 (106.6 KB ) - added by Mas4hmad 4 months ago.
hrev57514. for comparison

Change History (79)

comment:1 by Starcrasher, 11 months ago

I fear that those end lines are only consequences of what happens before and won't help much to understand the problem. Try to play with boot options to get a complete syslog. One way to do that is to activate 'Enable on screen debug output' and take pictures of each page on screen. Then add the pics as attachments.

comment:2 by Mas4hmad, 11 months ago

If I check 'enable on screen debug output' and 'continue booting', it shows the normal boot screen and gets stuck before the 'atom' logo is colored, and I can't do anything further.

So I used 'display recent debug log' instead to grab the log.

Last edited 11 months ago by Mas4hmad

by Mas4hmad, 11 months ago

Attachment: page1.jpg added

by Mas4hmad, 11 months ago

Attachment: page2.jpg added

by Mas4hmad, 11 months ago

Attachment: page3.jpg added

by Mas4hmad, 11 months ago

Attachment: page4.jpg added

by Mas4hmad, 11 months ago

Attachment: page5.jpg added

by Mas4hmad, 11 months ago

Attachment: page6.jpg added

by Mas4hmad, 11 months ago

Attachment: page7.jpg added

by Mas4hmad, 11 months ago

Attachment: page8.jpg added

comment:3 by waddlesplash, 11 months ago

Those error messages are harmless and not a problem here.

Please try booting with "enable on-screen debug output" and "disable on-screen paging." Then take a picture when the logs stop (or seem to stall.)

comment:4 by Mas4hmad, 11 months ago

It seems my device is forced to use the graphical boot and somehow can't show the on-screen debug output.

Even though I followed the instructions above, I only got the graphical boot screen without any indicator of progress.

I will try again when I have a serial cable.

Maybe that is the only way to get a useful debug log (I think).

by Mas4hmad, 11 months ago

Attachment: 0_kdl.jpg added

by Mas4hmad, 11 months ago

Attachment: 1_ints.jpg added

by Mas4hmad, 11 months ago

Attachment: 2_co.jpg added

by Mas4hmad, 11 months ago

Attachment: 3_bt.jpg added

by Mas4hmad, 11 months ago

Attachment: 4_ints.jpg added

comment:5 by Mas4hmad, 11 months ago

Somehow the screen shows KDL after about half a minute (in 0_kdl.jpg), and I captured as much as I could. It seems to be an "assert failed".

After that, another KDL shows a 'panic' (in 2_co.jpg).

comment:6 by Mas4hmad, 11 months ago

Description: modified

comment:7 by waddlesplash, 11 months ago

Component: - General → Drivers
Keywords: boot-failure added
Platform: x86 → All

The first panic is strange, it comes from the ACPI probing logic for PCI. You may want to try booting with ACPI disabled in bootloader options.

The second panic is a more standard one and could have any number of causes; it may be related to the first one, though.

by Mas4hmad, 11 months ago

Attachment: IMG_20230619_005543.jpg added

Without ACPI

comment:8 by Mas4hmad, 11 months ago

The debug log I gathered is from booting the device with Ventoy (which is known to boot, although operating systems other than Linux and Windows do not find any boot media partition).

I will try booting with Rufus (dd image mode) and look further.

by Mas4hmad, 11 months ago

Attachment: IMG_20230619_012202.jpg added

Without acpi

comment:9 by waddlesplash, 11 months ago

I think it's known that Haiku does not boot correctly with Ventoy. I see you have now booted successfully; can you replicate that success also with ACPI?

by Mas4hmad, 11 months ago

Attachment: IMG_20230619_012512.jpg added

With acpi +ints

in reply to:  9 comment:10 by Mas4hmad, 11 months ago

Replying to waddlesplash:

Can you replicate that success also with ACPI?

Yes. It can boot (with the assert), and (maybe) that will be a different issue.

I thought that Haiku was stuck at boot for a while, until I realized that several Linux distros (e.g. Alpine) behave the same on x86.

Alpine Linux takes about 20 minutes to boot on x86.

comment:11 by waddlesplash, 11 months ago

Cc: PulkoMandy X512 added
Summary: Nightly Haiku OS boot stuck → ASSERT FAILED: address.minimum + address_length - 1 == address.maximum

The assertion is a real issue. PulkoMandy was the one who edited this code and added some asserts, let's see if he or X512 has any ideas.

by Mas4hmad, 11 months ago

Attachment: syslog added

sys log according issue

comment:12 by korli, 11 months ago

Apparently when minAddress_fixed or maxAddress_fixed is equal to ACPI_ADDRESS_FIXED, the assert doesn't apply, see here

comment:13 by pulkomandy, 11 months ago

The commit that added the assert: https://cgit.haiku-os.org/haiku/commit/src/add-ons/kernel?id=8be0a59e7780e1fc30d405702f3c071944cd7db9

I see FreeBSD also checks for length = 0, but we need to add these two other checks.

Basically, if we have 2 of the 3 variables (min, max, length), we can compute the third (with inclusive bounds, as in the assert):

max = min + length - 1
length = max - min + 1
min = max - length + 1
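
The relations above can be sketched as a small helper. This is a Python sketch for illustration only, not Haiku's driver code, and the name complete_range is hypothetical. It assumes inclusive bounds, as in the assertion in the ticket title (minimum + length - 1 == maximum), which is why each relation carries a +1 or -1 term.

```python
from typing import Optional, Tuple


def complete_range(minimum: Optional[int] = None,
                   maximum: Optional[int] = None,
                   length: Optional[int] = None) -> Tuple[int, int, int]:
    """Return (minimum, maximum, length), deriving whichever value is None.

    Bounds are inclusive, so length == maximum - minimum + 1.
    Hypothetical helper for illustration; not part of Haiku or ACPICA.
    """
    missing = [v is None for v in (minimum, maximum, length)].count(True)
    if missing > 1:
        raise ValueError("need at least two of minimum, maximum, length")
    if maximum is None:
        maximum = minimum + length - 1
    elif minimum is None:
        minimum = maximum - length + 1
    elif length is None:
        length = maximum - minimum + 1
    return minimum, maximum, length
```

For example, complete_range(minimum=0xA0000, length=0x20000) yields a maximum of 0xBFFFF. When all three values are supplied, they are returned unchanged and no consistency check is made.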

comment:14 by jackburton, 8 months ago

Same error on an HP ProBook 4510s. I'm able to boot (although sometimes it hangs) by disabling ACPI.

comment:15 by waddlesplash, 8 months ago

Milestone: Unscheduled → R1/beta5

comment:16 by pulkomandy, 6 months ago

Can you test with https://review.haiku-os.org/c/haiku/+/7111 and provide an updated syslog from that version?

in reply to:  16 comment:17 by jackburton, 6 months ago

Replying to pulkomandy:

Can you test with https://review.haiku-os.org/c/haiku/+/7111 and provide an updated syslog from that version?

Unfortunately I can't build a full haiku image at the moment

comment:18 by Starcrasher, 6 months ago

You don't need to build it yourself, you can find test build isos here : https://haiku.movingborders.es/testbuild/

by Mas4hmad, 6 months ago

Attachment: syslog.2 added

syslog about now

comment:19 by jackburton, 6 months ago

I tried this one:

https://haiku.movingborders.es/testbuild/release/master/hrev57382

but for some reason I can't see the output I'm supposed to see: the syslog ends here:

"initialize PCI controller from ACPI"

and then kernel debugger.

N.B: I look at the syslog from inside the kernel debugger.

comment:20 by pulkomandy, 6 months ago

You need to get the build from the specific changeset, not from master, since it is not merged: https://haiku.movingborders.es/testbuild/I23d87da32779d22324f944b5b359390f523ec7a7/1/hrev57382/

This link is provided in the changeset comments by the build review so you can find it easily for each change

by jackburton, 6 months ago

Attachment: photo1700220898.jpeg added

syslog with PCI ranges from ACPI

in reply to:  20 comment:21 by jackburton, 6 months ago

Replying to pulkomandy:

You need to get the build from the specific changeset, not from master, since it is not merged: https://haiku.movingborders.es/testbuild/I23d87da32779d22324f944b5b359390f523ec7a7/1/hrev57382/

Thanks. Done!

comment:22 by Mas4hmad, 6 months ago

Does this PCI range need to be compared with another system, e.g. Linux?

If so, I can help obtain it from Linux and upload it here for comparison with Haiku.

comment:23 by pulkomandy, 5 months ago

So, the last range in that log is:

fff00000 to fbffffff

This doesn't make sense, the minimum is greater than the maximum. Also that would be a length of 0x3ff0001, but that is not at all what we get, instead we have fc100001.

I think at this point, all we can do is turn this ASSERT into an error log and ignore the range, since nothing about this range makes sense.
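
The idea of replacing the assertion with a logged error can be sketched as follows. This is a Python illustration under stated assumptions, not the patch that was later merged: the names accept_range and filter_ranges are hypothetical, and the sample values are the ones quoted in this comment.

```python
import logging

log = logging.getLogger("pci_ranges")


def accept_range(minimum: int, maximum: int, length: int) -> bool:
    """Return True if the range is self-consistent; otherwise log and return False."""
    if maximum < minimum:
        log.error("ignoring range %#x..%#x: minimum is above maximum",
                  minimum, maximum)
        return False
    if minimum + length - 1 != maximum:
        log.error("ignoring range %#x..%#x: length %#x does not match bounds",
                  minimum, maximum, length)
        return False
    return True


def filter_ranges(ranges):
    """Keep only the (minimum, maximum, length) tuples that pass accept_range."""
    return [r for r in ranges if accept_range(*r)]


# The range discussed above is rejected; a well-formed one is kept.
kept = filter_ranges([
    (0xFFF00000, 0xFBFFFFFF, 0xFC100001),
    (0x000A0000, 0x000BFFFF, 0x00020000),
])
```

The point of the design is that one malformed entry no longer stops enumeration: it is reported and skipped, and the remaining ranges are still processed.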

comment:24 by waddlesplash, 5 months ago

That's the last range in the list. Is it possible that somehow we read past the end of an array incorrectly?

comment:25 by pulkomandy, 5 months ago

It's the last printed one because we hit the assert and the enumeration stops, I think, not because it is the last in the list. But it's possible that we're not making enough checks on what type of entry it is (we just check IO vs memory but maybe there are other things to look for)

Also, we don't directly "loop" over the resources ourselves, we just call ACPI "walk_resources" on the _CRS table. I guess a dump of that table would be useful to see how it looks? Using a tool like acpidump (on Linux) or something like it. So it seems unlikely that the end of the table detection is a problem.

by jackburton, 5 months ago

Attachment: photo1700737545.jpeg added

Linux PCI ranges from dmesg

comment:26 by jackburton, 5 months ago

Uploaded pictures of pci ranges from linux

comment:27 by waddlesplash, 5 months ago

The Linux PCI ranges will be already processed from the ACPI ranges (you can see that all the adjoining ranges have been merged already.) So can you try to get a dump of the ACPI table directly?

comment:28 by jackburton, 5 months ago

Will try. At the moment I only have access to a system rescue CD live distro which doesn't have acpidump. Will try to get a full distro into this PC.

comment:29 by jackburton, 5 months ago

Oh great: the laptop display just died, so I can't try anything.

in reply to:  29 comment:30 by korli, 5 months ago

Replying to jackburton:

Oh great: the laptop display just died, so I can't try anything.

Do you have an external display at hand?

comment:31 by jackburton, 5 months ago

Just tried: it seems there's something else, since it doesn't work with external display, either. Maybe Mas4hmad could give this info, since he seems to have a similar laptop ?

by Mas4hmad, 5 months ago

Attachment: linux_dmesgpci added

by Mas4hmad, 5 months ago

Attachment: acpidump_lenoflex10.zip added

comment:32 by Mas4hmad, 5 months ago

I don't have the unusual case where the minimum address is greater than the maximum address. jackburton's log has many more zero-length PCI ranges than mine, and has a negative range.

We likely have to look at the data table from Linux, as waddlesplash said, so we can compare the data.

I will try to dump ACPI from Haiku now (once it boots to the desktop successfully; it needs a quarter of an hour to do that T.T), but I'm afraid I can't: AFAIR the issue at https://github.com/haikuports/haikuports/issues/4005 isn't closed yet.

Last edited 5 months ago by Mas4hmad

comment:33 by pulkomandy, 5 months ago

I don't need the ACPI dump to be done from Haiku, it would be the same as from Linux.

However it would be useful if you can test the Haiku build from here: https://haiku.movingborders.es/testbuild/I23d87da32779d22324f944b5b359390f523ec7a7/2/hrev57390/ and take a picture of the end of the syslog after it hits the assert (I know you already attached pictures, but this one has extra logs)

comment:34 by pulkomandy, 5 months ago

Actually, I see you already posted that (syslog.2).

So, here is what we get in Haiku:

401	KERN: PCI: range from ACPI [0(1),6f(1)] with length 70
402	KERN: PCI: range from ACPI [78(1),cf7(1)] with length c80
403	KERN: PCI: range from ACPI [d00(1),ffff(1)] with length f300
404	KERN: PCI: range from ACPI [a0000(1),bffff(1)] with length 20000
405	KERN: PCI: range from ACPI [c0000(1),dffff(1)] with length 20000
406	KERN: PCI: range from ACPI [e0000(1),fffff(1)] with length 20000
407	KERN: PCI: range from ACPI [0(1),0(1)] with length 0
408	KERN: PCI: range from ACPI [ffffffff(1),fffffffe(1)] with length 0
409	KERN: PCI: range from ACPI [80000000(1),907ffffe(1)] with length 10800000

This is what I see in dsdt.dsl in the acpidump:

                WordIO (ResourceProducer, MinFixed, MaxFixed, PosDecode, EntireRange,
                    0x0000,             // Granularity
                    0x0000,             // Range Minimum
                    0x006F,             // Range Maximum
                    0x0000,             // Translation Offset
                    0x0070,             // Length
                    ,, , TypeStatic, DenseTranslation)
                WordIO (ResourceProducer, MinFixed, MaxFixed, PosDecode, EntireRange,
                    0x0000,             // Granularity
                    0x0078,             // Range Minimum
                    0x0CF7,             // Range Maximum
                    0x0000,             // Translation Offset
                    0x0C80,             // Length
                    ,, , TypeStatic, DenseTranslation)
                WordIO (ResourceProducer, MinFixed, MaxFixed, PosDecode, EntireRange,
                    0x0000,             // Granularity
                    0x0D00,             // Range Minimum
                    0xFFFF,             // Range Maximum
                    0x0000,             // Translation Offset
                    0xF300,             // Length
                    ,, , TypeStatic, DenseTranslation)
                DWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, Cacheable, ReadWrite,
                    0x00000000,         // Granularity
                    0x000A0000,         // Range Minimum
                    0x000BFFFF,         // Range Maximum
                    0x00000000,         // Translation Offset
                    0x00020000,         // Length
                    ,, , AddressRangeMemory, TypeStatic)
                DWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, Cacheable, ReadWrite,
                    0x00000000,         // Granularity
                    0x000C0000,         // Range Minimum
                    0x000DFFFF,         // Range Maximum
                    0x00000000,         // Translation Offset
                    0x00020000,         // Length
                    ,, , AddressRangeMemory, TypeStatic)
                DWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, Cacheable, ReadWrite,
                    0x00000000,         // Granularity
                    0x000E0000,         // Range Minimum
                    0x000FFFFF,         // Range Maximum
                    0x00000000,         // Translation Offset
                    0x00020000,         // Length
                    ,, , AddressRangeMemory, TypeStatic)
                DWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, Cacheable, ReadWrite,
                    0x00000000,         // Granularity
                    0x7A000000,         // Range Minimum
                    0x7A3FFFFF,         // Range Maximum
                    0x00000000,         // Translation Offset
                    0x00400000,         // Length
                    ,, _Y00, AddressRangeMemory, TypeStatic)
                DWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, Cacheable, ReadWrite,
                    0x00000000,         // Granularity
                    0x7C000000,         // Range Minimum
                    0x7FFFFFFF,         // Range Maximum
                    0x00000000,         // Translation Offset
                    0x04000000,         // Length
                    ,, _Y02, AddressRangeMemory, TypeStatic)
                DWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, Cacheable, ReadWrite,
                    0x00000000,         // Granularity
                    0x80000000,         // Range Minimum
                    0xDFFFFFFF,         // Range Maximum
                    0x00000000,         // Translation Offset
                    0x60000000,         // Length
                    ,, _Y01, AddressRangeMemory, TypeStatic)

The first 6 entries match up, but then it's all wrong. It seems that maybe a buffer is only large enough for 8 entries, and when the table is too big, things just get confused?
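
The comparison above can be reproduced mechanically. The following is a minimal Python sketch, assuming the log format shown in this comment. The names parse_ranges and first_mismatch are hypothetical, the DSDT values are copied by hand from the dump above, and the (1) suffix in the log is ignored because its meaning is not stated here.

```python
import re

# Matches "[min(n),max(n)] with length len"; all numbers are hexadecimal.
LINE = re.compile(
    r"\[([0-9a-f]+)\(\d+\),([0-9a-f]+)\(\d+\)\] with length ([0-9a-f]+)")


def parse_ranges(text):
    """Extract (minimum, maximum, length) tuples from 'range from ACPI' lines."""
    return [tuple(int(g, 16) for g in m.groups()) for m in LINE.finditer(text)]


def first_mismatch(logged, expected):
    """Return the index of the first differing entry, or None if all agree."""
    for i, (a, b) in enumerate(zip(logged, expected)):
        if a != b:
            return i
    return None


SYSLOG = """\
KERN: PCI: range from ACPI [0(1),6f(1)] with length 70
KERN: PCI: range from ACPI [78(1),cf7(1)] with length c80
KERN: PCI: range from ACPI [d00(1),ffff(1)] with length f300
KERN: PCI: range from ACPI [a0000(1),bffff(1)] with length 20000
KERN: PCI: range from ACPI [c0000(1),dffff(1)] with length 20000
KERN: PCI: range from ACPI [e0000(1),fffff(1)] with length 20000
KERN: PCI: range from ACPI [0(1),0(1)] with length 0
KERN: PCI: range from ACPI [ffffffff(1),fffffffe(1)] with length 0
KERN: PCI: range from ACPI [80000000(1),907ffffe(1)] with length 10800000
"""

# (Range Minimum, Range Maximum, Length) per descriptor, in dump order.
DSDT = [
    (0x0000, 0x006F, 0x0070),
    (0x0078, 0x0CF7, 0x0C80),
    (0x0D00, 0xFFFF, 0xF300),
    (0x000A0000, 0x000BFFFF, 0x00020000),
    (0x000C0000, 0x000DFFFF, 0x00020000),
    (0x000E0000, 0x000FFFFF, 0x00020000),
    (0x7A000000, 0x7A3FFFFF, 0x00400000),
    (0x7C000000, 0x7FFFFFFF, 0x04000000),
    (0x80000000, 0xDFFFFFFF, 0x60000000),
]

print(first_mismatch(parse_ranges(SYSLOG), DSDT))  # 6
```

Index 6 is the seventh entry, so the first six logged ranges agree with the static values in the dump and the divergence starts at the first descriptor that carries a _Y00-style name.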

comment:36 by waddlesplash, 5 months ago

Hmm, that's just for storing ranges though, not reading them, and the way it's accessed it looks like if there's more than 6 entries we will just overwrite one or more of them?

by Mas4hmad, 5 months ago

Attachment: syslog.3 added

the log after grabbing latest iso nightly build from pulkomandy

by Mas4hmad, 5 months ago

Attachment: syslog.4 added

by Mas4hmad, 5 months ago

Attachment: syslog.5 added

comment:37 by pulkomandy, 5 months ago

In the ACPI dump, I see that the non-working ranges have _Y00, _Y01 and _Y02 in their flags. What are these? Could they prevent decoding with our code?

by Mas4hmad, 5 months ago

Attachment: syslog.6 added

by Mas4hmad, 5 months ago

Attachment: syslog.7 added

syslog after rebasing from https://review.haiku-os.org/c/haiku/+/7111, obtained from lenovo thinkpad x1 carbon 2014 for comparison.

in reply to:  37 comment:38 by Mas4hmad, 5 months ago

As far as I understand, it would be the 14th parameter of the PCI resource table (the IdeaPad Flex has RES0 and the ThinkPad X1 Carbon has _CRS).

Replying to pulkomandy:

In the ACPI dump, I see that the non-working ranges have _Y00, _Y01 and _Y02 in their flags. What are these? Could they prevent decoding with our code?

Yes, it could be. In syslog.7 (from the Lenovo ThinkPad X1 Carbon) we can see that at line 456 the entry with the _Y17 parameter has length 0, at line 457 the _Y18 entry has the same length, and so on until line 467 with the _Y22 parameter.

453	KERN: PCI: range from ACPI [0(1),cf7(1)] with length cf8
454	KERN: PCI: range from ACPI [d00(1),ffff(1)] with length f300
455	KERN: PCI: range from ACPI [a0000(1),bffff(1)] with length 20000
456	KERN: PCI: range from ACPI [c0000(1),c3fff(1)] with length 0
457	KERN: PCI: range from ACPI [c4000(1),c7fff(1)] with length 0
458	KERN: PCI: range from ACPI [c8000(1),cbfff(1)] with length 0
459	KERN: PCI: range from ACPI [cc000(1),cffff(1)] with length 0
460	KERN: PCI: range from ACPI [d0000(1),d3fff(1)] with length 0
461	KERN: PCI: range from ACPI [d4000(1),d7fff(1)] with length 0
462	KERN: PCI: range from ACPI [d8000(1),dbfff(1)] with length 0
463	KERN: PCI: range from ACPI [dc000(1),dffff(1)] with length 0
464	KERN: PCI: range from ACPI [e0000(1),e3fff(1)] with length 0
465	KERN: PCI: range from ACPI [e4000(1),e7fff(1)] with length 0
466	KERN: PCI: range from ACPI [e8000(1),ebfff(1)] with length 0
467	KERN: PCI: range from ACPI [ec000(1),effff(1)] with length 0
468	KERN: PCI: range from ACPI [fff00000(1),febfffff(1)] with length fed00000
469	KERN: PCI: range from ACPI [fed40000(1),fed4bfff(1)] with length c000

At line 468 I have the _Y23 parameter, but it jumps to the correct maximum address (with a wrong range minimum and length).

But line 456 is odd: there I got the right PCI range and address even though it has the _Y24 parameter.

Could PCI ranges with that parameter be treated separately? (As we can see from the Linux dmesgpci, the PCI ranges with that parameter don't appear there the way ranges without it do. I mean, Linux doesn't seem to treat PCI ranges with that parameter like the other ones.)

On the ThinkPad X1 Carbon it does not hit the ASSERT the way it does on the IdeaPad Flex 10.

Last edited 5 months ago by Mas4hmad

comment:39 by waddlesplash, 4 months ago

Blocking: 18536 added

comment:40 by waddlesplash, 4 months ago

Please retest after hrev57511; this merges that patch, but in the meantime there were some fixes to ACPI and an upgrade of ACPICA so perhaps something might differ.

comment:41 by waddlesplash, 4 months ago

Hmm, there appears to be a "producer/consumer" flag here that FreeBSD checks, possibly ignoring the value depending: https://xref.landonf.org/source/xref/freebsd-current/sys/dev/acpica/acpi_resource.c#387

Do we need to do the same?

comment:42 by waddlesplash, 4 months ago

Ah: https://docs.kernel.org/PCI/acpi-info.html

ACPI defines a Consumer/Producer bit to distinguish the bridge registers ("Consumer") from the bridge apertures ("Producer") [4, 5], but early BIOSes didn't use that bit correctly. The result is that the current ACPI spec defines Consumer/Producer only for the Extended Address Space descriptors

by Mas4hmad, 4 months ago

Attachment: syslog.8 added

hrev57514. for comparison

comment:43 by waddlesplash, 4 months ago

Presumably the range in question:

KERN: PCI: range from ACPI [80000000(1),907ffffe(1)] with length 10800000

The maximum here differs from the minimum+length by 0x2. That seems suspicious?
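
The arithmetic behind that observation, worked through in Python with the values from the log line above (a sketch for illustration; the variable names are arbitrary):

```python
minimum, maximum, length = 0x80000000, 0x907FFFFE, 0x10800000

end_exclusive = minimum + length      # one past the last address: 0x90800000
end_inclusive = minimum + length - 1  # last address the assertion expects: 0x907fffff

print(hex(end_exclusive - maximum))   # 0x2
print(hex(end_inclusive - maximum))   # 0x1
print(hex(maximum - minimum + 1))     # 0x107fffff, the length the bounds imply
```

So the reported maximum is 0x2 below minimum + length and 0x1 below the inclusive end that the assertion checks, which is enough for the equality to fail.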

comment:44 by waddlesplash, 3 months ago

Blocking: 18751 added

comment:45 by waddlesplash, 3 months ago

Blocking: 18764 added

comment:46 by waddlesplash, 3 months ago

Priority: normal → blocker

More instances of this have shown up. Increasing priority as it's a boot-failure regression.

in reply to:  43 comment:47 by korli, 3 months ago

Replying to waddlesplash:

The maximum here differs from the minimum+length by 0x2. That seems suspicious?

from https://linux-hardware.org/?probe=ac4be1ce4d&log=dmesg the range looks legit:

[    0.556923] pci_bus 0000:00: root bus resource [mem 0x80000000-0x907ffffe window]

comment:48 by waddlesplash, 7 weeks ago

Right, but the length doesn't seem to match it. So what's going on there?

comment:49 by waddlesplash, 6 weeks ago

Blocking: 18856 added

comment:50 by pulkomandy, 5 weeks ago

Hello,

I have made a new fix, removing the assert and trying to handle anything the ACPI tables may have:

https://review.haiku-os.org/c/haiku/+/7581

Let me know if that works on the different machines.

comment:51 by waddlesplash, 4 weeks ago

Patch merged in hrev57681.
