Opened 2 years ago

Last modified 6 months ago

#17377 new bug

radeon_hd crashes in connector_probe()

Reported by: smallstepforman
Owned by: kallisti5
Priority: high
Milestone: Unscheduled
Component: Drivers/Graphics/radeon_hd
Version: R1/beta3
Keywords:
Cc:
Blocked By:
Blocking: #17452, #17473, #17853, #18080, #18505
Platform: All

Description

The last working version of Haiku x64 nightly was hrev55399 (2021 Sep 8). The next version, Haiku x64 nightly hrev55614 (2021 Nov 6), is broken.

KDL during boot, after rocket icon.

libstdc++.so.6.0.29 unsupported compilation unit version: 5. This may be due to the switch to gcc 11.

Error on Ryzen 3700, nvme, MSI Tomahawk x570, 32GB RAM, video AMD 5600XT. However, on a MacBook Pro 11.3 (2014), i7 4th gen, nVidia 750M, 16GB, it boots successfully.

Attaching syslog.old (boot from hrev55614, broken) and syslog (boot hrev55399, working)

Attachments (21)

syslog.old (512.0 KB ) - added by smallstepforman 2 years ago.
syslog (279.9 KB ) - added by smallstepforman 2 years ago.
syslog.2 (115.7 KB ) - added by smallstepforman 2 years ago.
USB boot latest from Alex
app_server-1054-debug-12-12-2021-23-07-13.report (23.4 KB ) - added by smallstepforman 2 years ago.
Crash report hrev55720
syslog.3 (263.0 KB ) - added by smallstepforman 2 years ago.
hrev55763 syslog
0001-radeon_hd-Fix-memory-detection-on-newer-cards.patch (2.3 KB ) - added by kallisti5 2 years ago.
fix patch v1
app_server-1050-debug-11-01-2022-08-53-49.report (23.8 KB ) - added by smallstepforman 2 years ago.
app_server crash report
app_server-1050-debug-11-01-2022-08-54-17.core.zip (4.9 MB ) - added by smallstepforman 2 years ago.
app_server core file (zip)
radeon_hd_bios_1002_731f_0.bin (128.0 KB ) - added by smallstepforman 2 years ago.
Radeon BIOS
syslog_06Jan2022 (114.3 KB ) - added by smallstepforman 2 years ago.
Syslog 06Jan2022
radeon_hd_bios_1002_1638_0.bin (54.0 KB ) - added by smallstepforman 2 years ago.
Radeon RX6600M bios
syslog_RX6600M (466.2 KB ) - added by smallstepforman 2 years ago.
Syslog_RX6600M
syslog_11Jan2022 (232.4 KB ) - added by smallstepforman 2 years ago.
Syslog 11 Jan 2022 (5600XT desktop)
syslog_12Jan2022 (344.6 KB ) - added by smallstepforman 2 years ago.
Syslog_12Jan2022 5600XT
syslog.2.old (512.0 KB ) - added by smallstepforman 2 years ago.
Syslog hrev55954 Radeon 6600M
syslog.3.old (512.1 KB ) - added by smallstepforman 2 years ago.
syslog for hrev55954 Radeon 5600XT
syslog.4.old (512.0 KB ) - added by smallstepforman 2 years ago.
Hopefully correct syslog from hrev55954 (Radeon 5600XT)
syslog.5.old (512.0 KB ) - added by smallstepforman 2 years ago.
Second attempt syslog hrev55954 Radeon RX6600M
syslog.4 (431.3 KB ) - added by smallstepforman 18 months ago.
Syslog hrev56536
syslog.5 (161.1 KB ) - added by smallstepforman 18 months ago.
hrev56536 desktop 5600XT
syslog_hrev56554 (109.0 KB ) - added by smallstepforman 18 months ago.
syslog_hrev56554

Change History (75)

by smallstepforman, 2 years ago

Attachment: syslog.old added

by smallstepforman, 2 years ago

Attachment: syslog added

comment:1 by waddlesplash, 2 years ago

The panic message is not actually in either syslog. Please attach a photo.

comment:2 by waddlesplash, 2 years ago

"Unsupported compilation unit version" is a Debugger warning; it seems that the DWARF5 patch was lost in HaikuPorts for GCC 11. So, this may not be a KDL but an app_server crash or something like it.

comment:3 by smallstepforman, 2 years ago

I also get a libgcc_s.so.1 - unsupported compilation unit version: 5 error.

comment:4 by smallstepforman, 2 years ago

Aha, fail safe video mode allows the system to boot. I have an AMD 5600XT (Navi 1) GPU. Possibly the error is with the updates to the Radeon HD driver. I've confirmed this with 4 reboots (2x fail safe video and it boots, 2x without and it KDL)

comment:5 by smallstepforman, 2 years ago

Some more info about the card.
Device: Navi 10 (Radeon RX 5600 XT)
device id: 0x731f
device/type 0x3
device/vendor 0x1002
pci/bus 47

comment:6 by smallstepforman, 2 years ago

With the radeon_hd driver blacklisted, the system boots normally. I've isolated the culprit. Can we change the title?

comment:7 by waddlesplash, 2 years ago

Component: changed from System/Kernel to Drivers/Graphics/radeon_hd
Owner: changed from nobody to kallisti5
Platform: changed from x86-64 to All
Priority: changed from critical to high
Summary: changed from "libstdc++.so.6.0.29 unsupported compilation unit version: 5 KDL" to "radeon_hd causes app_server crash on Navi"

The "unsupported compilation unit" problems are tracked here: https://github.com/haikuports/haikuports/issues/6367

I think kallisti5 just blindly added this family of cards to the driver to see what would happen. It's not very surprising that it doesn't work. Probably they should be removed, then :)

comment:8 by kallisti5, 2 years ago

> I think kallisti5 just blindly added this family of cards to the driver to see what would happen. It's not very surprising that it doesn't work. Probably they should be removed, then :)

Yup. I can't get my hands on any new video cards for a reasonable price, so decided to outsource testing to users lol. AtomBIOS does let us generally "cram in new PCIID's" without any additional changes... however the exception is when things change beyond what our AtomBIOS support code can handle.

Speaking of which...

KERN: radeon_hd: gpio_i2c_populate: could't read GPIO_I2C_Info table from AtomBIOS index 10!
KERN: radeon_hd: connector_probe: found 136 potential display paths.
KERN: radeon_hd: connector_probe: Path #2: skipping unknown connector.
KERN: radeon_hd: connector_probe: TODO: Found router object?
KERN: Last message repeated 862 times.
KERN: vm_soft_fault: va 0xa277e82000 not covered by area in address space
KERN: vm_page_fault: vm_soft_fault returned error 'Bad address' on fault at 0xa277e82000, ip 0x1ee99bedeca, write 0, user 1, thread 0x41d
KERN: debug_server: Thread 1053 entered the debugger: Segment violation
KERN: stack trace, current PC 0x1ee99bedeca  _Z15connector_probev + 0x1ba:
KERN:   (0x7f8733398b50)  0x1ee99bea404  radeon_init_accelerant + 0x2b4
KERN:   (0x7f8733398bb0)  0x2112577b55c  _ZN21AccelerantHWInterface15_OpenAccelerantEi + 0x21c
KERN:   (0x7f87333990d0)  0x2112577b61f  _ZN21AccelerantHWInterface10InitializeEv + 0x6f
KERN:   (0x7f87333990f0)  0x211257341be  _ZN13ScreenManager15_AddHWInterfaceEP11HWInterface + 0x5e
KERN:   (0x7f8733399140)  0x211257344f9  _ZN13ScreenManagerC1Ev + 0x69
KERN:   (0x7f87333991c0)  0x2112571868e  _ZN9AppServerC2EPi + 0xfe
KERN:   (0x7f8733399200)  0x21125716f90  main + 0x30
KERN:   (0x7f8733399230)  0x21125717a2e  _start + 0x3e
KERN:   (0x7f8733399260)  0x104c86103b5  runtime_loader + 0x105

Ironically, this is *really* valuable. I might be able to fix that :-D

comment:9 by kallisti5, 2 years ago

ok. I have a patch here for this one: https://review.haiku-os.org/c/haiku/+/4697

smallstepforman, could you try this test image from one of these locations and let us know your results?

I *really* need the log output from it to tell how it works :-)

by smallstepforman, 2 years ago

Attachment: syslog.2 added

USB boot latest from Alex

comment:10 by smallstepforman, 2 years ago

Hi Alex. See syslog. Unfortunately, that version also goes straight to KDL after rocket icon.

comment:11 by kallisti5, 2 years ago

yeesh. ok. Thanks for testing.

Could you grab this file and attach it to the ticket? /boot/system/cache/tmp/radeon_hd_bios_1002_731f_0.bin

I need to investigate your atombios's GPIO tables to see why we're not parsing it correctly.

comment:12 by smallstepforman, 2 years ago

Hi Alex. I'm unable to see the /boot/system/cache/tmp/radeon_hd_bios_1002_731f_0.bin file. When I boot from an older working state, that file doesn't appear, and obviously I cannot boot the latest nightly or from the USB disk image unless I disable radeon_hd driver.

Kind of stuck here.

BTW - yesterday afternoon my 11-year-old son fractured his finger on a trampoline, so we had to go to hospital, and today I will be trying to get a hand surgeon to see him as soon as possible. There may be some delays responding to workarounds during the next couple of days. All should be OK later on during the week.

comment:13 by diver, 2 years ago

Blocking: 17452 added

comment:14 by smallstepforman, 2 years ago

Hi Alex. Still KDL with hrev55720. Attaching crash report in a second ...

by smallstepforman, 2 years ago

Crash report hrev55720

comment:15 by diver, 2 years ago

Summary: changed from "radeon_hd causes app_server crash on Navi" to "radeon_hd crashes in connector_probe()"

comment:16 by diver, 2 years ago

Blocking: 17473 added

comment:17 by korli, 2 years ago

I had a crash in connector_probe() because the rom area wasn't big enough. Maybe check with hrev55760 if the ACPI table is found. If it doesn't help, then the rom area should be made bigger for the PCI bar method 2.
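A minimal sketch of the failure mode korli describes, assuming the driver follows offsets stored inside the mapped ROM image: if the mapped area is smaller than the tables it describes, an offset can point past the end of the mapping and the read faults. The `rom_image` type and the `rom_range_ok` and `rom_read16` helpers below are hypothetical names for illustration and are not taken from the radeon_hd source.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a mapped ROM image: base pointer plus mapped size. */
typedef struct {
	const uint8_t* base;
	size_t size;
} rom_image;

/* Returns 1 if [offset, offset + length) lies fully inside the mapping.
 * Written so that offset + length cannot overflow. */
static int
rom_range_ok(const rom_image* rom, size_t offset, size_t length)
{
	return offset <= rom->size && length <= rom->size - offset;
}

/* Reads a little-endian 16-bit value, refusing out-of-range offsets
 * instead of touching memory outside the mapped area. */
static int
rom_read16(const rom_image* rom, size_t offset, uint16_t* value)
{
	if (!rom_range_ok(rom, offset, sizeof(uint16_t)))
		return -1;
	*value = (uint16_t)(rom->base[offset] | (rom->base[offset + 1] << 8));
	return 0;
}
```

With a check like this, a table walk that runs off the end of a too-small ROM area would return an error rather than raise a segment violation.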

comment:18 by smallstepforman, 2 years ago

Same issue with hrev55763

KDL in connector_probe() + 0x1d6
bt: radeon_init_accelerant + 0x279
bt: AccelerantHWInterface::_OpenAccelerant(int) + 0x222

by smallstepforman, 2 years ago

Attachment: syslog.3 added

hrev55763 syslog

comment:19 by kallisti5, 2 years ago

KERN: radeon_hd: mapAtomBIOSACPI: seeking AtomBIOS from ACPI
KERN: radeon_hd: radeon_hd_getbios: AtomBIOS not found using active method 0 at 0x0

Doesn't appear to have the ACPI atombios.

*however* related to what you said @korli...

KERN: radeon_hd: radeon_hd_init: Error: found 0MB video ram, using PCI bar size...
KERN: radeon_hd: radeon_hd_init: mapping a frame buffer of 256MB out of 0MB video ram

Let me check the code around detecting video RAM. Maybe something specific to this card generation is missing there.

by kallisti5, 2 years ago

fix patch v1

comment:20 by kallisti5, 2 years ago

Give the patch above a try.

The Linux kernel transitioned from the radeon driver to amdgpu around Tahiti. The amdgpu driver does everything differently, which makes following the two code paths a little rough.

*IF* the patch doesn't work, please grab the logs (especially the logs around "mapping a frame buffer of XXX MB").

After grabbing the first set of logs, adjust the following line:

#define SI_CONFIG_MEMSIZE_TAHITI                                       0x0de3  // or 0x378c?

to

#define SI_CONFIG_MEMSIZE_TAHITI                                       0x378c

The end goal is for the following syslog messages to report the correct amount of video RAM on your card:

KERN: radeon_hd: radeon_hd_init: Error: found 0MB video ram, using PCI bar size...
KERN: radeon_hd: radeon_hd_init: mapping a frame buffer of 256MB out of 0MB video ram

If the found 0MB of video ram goes away, it means we found the potentially correct solution that needs to be tested on more cards.

Sorry for all the manual work. I don't have anything newer than a RX 480 to test on locally.

comment:21 by kallisti5, 2 years ago

Oh.. and the two different values come from Linux reading the MEMSIZE register as RREG32(SI_CONFIG_MEMSIZE_TAHITI).. but then I noticed that the amdgpu version of RREG32 multiplies the register by a seemingly random 4 :-|
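The factor of 4 may not be random. A plausible reading, stated here as an assumption rather than something confirmed in this ticket, is that the amdgpu value is an index counted in 32-bit registers, so multiplying by 4 turns it into a byte offset. The arithmetic does line up with the two candidate values above: 0x0de3 * 4 = 0x378c. The macro and function names below are illustrative only.

```c
#include <stdint.h>

/* Value from the comment above, treated here (by assumption) as an
 * index counted in 32-bit registers rather than a byte offset. */
#define SI_CONFIG_MEMSIZE_TAHITI_INDEX 0x0de3u

/* Converts a 32-bit register index to a byte offset: 4 bytes per register. */
static uint32_t
dword_index_to_byte_offset(uint32_t index)
{
	return index * 4u;
}
```

Under that assumption, 0x0de3 and 0x378c name the same register in two different units, and which one is right depends on how the driver's own register read helper addresses MMIO space.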

comment:22 by kallisti5, 2 years ago

I pushed this here as well for tracking: https://review.haiku-os.org/c/haiku/+/4848

The silly uint64->uint32 conversions aren't needed except for the case added.

comment:23 by kallisti5, 2 years ago

to compare, here are the relevant lines from my RX 480 without this patch.

KERN: radeon_hd: radeon_hd_init: card(0): Radeon Polaris 10 1002:67DF
KERN: radeon_hd: radeon_hd_init: shrinking frame buffer to PCI bar...
KERN: radeon_hd: radeon_hd_init: mapping a frame buffer of 256MB out of 8192MB video ram

Technically my Polaris is newer than Tahiti but maybe the availability of the referenced register becomes unreliable or something after Tahiti.. who knows.
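The log messages quoted in this thread suggest a size selection along the following lines. This is a sketch inferred from the messages only, not the driver's actual code, and `choose_mapped_mb` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper: choose how many MB of frame buffer to map,
 * given the detected video RAM size and the PCI BAR size. */
static uint32_t
choose_mapped_mb(uint32_t detected_vram_mb, uint32_t pci_bar_mb)
{
	if (detected_vram_mb == 0)
		return pci_bar_mb;	/* detection failed: "using PCI bar size" */
	if (detected_vram_mb > pci_bar_mb)
		return pci_bar_mb;	/* "shrinking frame buffer to PCI bar" */
	return detected_vram_mb;
}
```

With a 256MB BAR, both a failed detection (0MB) and a correct one (8192MB) map 256MB, so the mapped size alone does not show whether detection worked. The "out of NMB video ram" part of the message is the value to check.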

comment:24 by smallstepforman, 2 years ago

Hi Alex. I do not have a working Haiku build environment, is there any chance that you can upload a prebuilt image with the patch? I've also got access to a laptop with Radeon RX6600M which should allow me to test it on Navi2.

comment:27 by smallstepforman, 2 years ago

Hi Jerome and Alex. I downloaded the ISO from 6 Jan 2022. Same crash in connector_probe +0x1d6.

However, this time I have a few interesting files. Uploading the following attachments: app_server crash report, app_server core file, radeon_hd_bios_1002_731f_0.bin, syslog.

Stay tuned for upload.

by smallstepforman, 2 years ago

app_server crash report

by smallstepforman, 2 years ago

app_server core file (zip)

by smallstepforman, 2 years ago

Radeon BIOS

by smallstepforman, 2 years ago

Attachment: syslog_06Jan2022 added

Syslog 06Jan2022

by smallstepforman, 2 years ago

Radeon RX6600M bios

by smallstepforman, 2 years ago

Attachment: syslog_RX6600M added

Syslog_RX6600M

comment:28 by smallstepforman, 2 years ago

I've also tested the 06Jan2022 image with an HP Omen 16 laptop (AMD advantage edition), which has a mobile Radeon RX6600M (Navi 2 generation). During boot, after the rocket icon, I just get a black screen. With the Radeon_HD driver disabled, I can boot to the desktop. See attached syslog_RX6600M and radeon_hd_bios_Rx6600M for output from this laptop (Navi 2).

Just a reminder that the original issue (KDL) is on Radeon Navi 1 (5600XT).

comment:29 by korli, 2 years ago

Second image to test: https://haiku.movingborders.es/testbuild/Id4803d5c5a8bcab685f687c6af0292c945813ec6/1/hrev55774/x86_64/

The RX6600M syslog looks like #17516. Did this HP Omen 16 laptop ever work with radeon_hd?

comment:30 by smallstepforman, 2 years ago

Hi Jerome. This version (11Jan2022) no longer goes to KDL (an improvement), now I'm stuck on a black screen after the rocket icon. This is on the 5600XT (Navi1). The monitor attached to the DGPU goes into power save mode (no signal?).

I've also tried this version on the 6600M laptop (Navi2), which never had the KDL but also has a black screen after the rocket icon. With a laptop you cannot see a display indicator, so I'm assuming it has the same issue (no signal).

Attaching syslog_11Jan2022.

by smallstepforman, 2 years ago

Attachment: syslog_11Jan2022 added

Syslog 11 Jan 2022 (5600XT desktop)

comment:31 by smallstepforman, 2 years ago

Regarding the query: The RX6600M syslog looks like #17516. Did this HP Omen 16 laptop ever work with radeon_hd?

I wouldn't know, I just purchased the laptop a couple of weeks ago. It has another problem (14 minute boot time while scanning nvme drives), after that it boots with fail safe video.

comment:32 by korli, 2 years ago

Looks indeed better. Applied in hrev55780. Now we're back to #17516.

comment:33 by kallisti5, 2 years ago

so is this one resolved after hrev55780?

comment:34 by korli, 2 years ago

Third image to test: https://haiku.movingborders.es/testbuild/Iac1b5870951d2f982002de65453959cfc38e79cc/2/hrev55780/x86_64/

@kallisti5 it's the same state as #17516; it crashes a bit further along.

comment:35 by smallstepforman, 2 years ago

Hi Korli. No difference with version from 12Jan2022. No KDL, however black screen and monitor goes into power save mode. Attaching syslog.

by smallstepforman, 2 years ago

Attachment: syslog_12Jan2022 added

Syslog_12Jan2022 5600XT

comment:36 by korli, 2 years ago

@kallisti5 please check the syslog with https://review.haiku-os.org/c/haiku/+/4848

radeon_hd: radeon_hd_init: mapping a frame buffer of 256MB out of 4194303MB video ram

Doesn't look good.

comment:37 by smallstepforman, 2 years ago

Korli, good observation regarding the 4194303MB. That is one less than 2^32 / 1024 = 4,194,304, so it looks like a -1 somewhere. The Radeon 5600XT has 6GB of RAM.
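One way such a number can arise, assuming (this is not confirmed by the logs) that a 32-bit read returned all ones and the result was later divided by 1024: 0xFFFFFFFF / 1024 rounds down to 4194303. The helper names below are illustrative only.

```c
#include <stdint.h>

/* Integer division by 1024, as in a KiB-to-MiB style conversion.
 * Rounds down. */
static uint64_t
div_1024(uint64_t value)
{
	return value / 1024u;
}

/* The bit pattern of a 32-bit -1: all ones (0xFFFFFFFF). */
static uint32_t
all_ones_32(void)
{
	return (uint32_t)-1;
}
```

Here `div_1024(all_ones_32())` gives 4194303, while 2^32 / 1024 gives 4194304, which matches the off-by-one noted above.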

comment:38 by Cacodemon345, 2 years ago

AMD Radeon user here.

As of hrev55923, radeon_hd crashes in connector_probe when booting with CSM disabled. When booting with CSM enabled it boots successfully to the desktop.

comment:39 by waddlesplash, 2 years ago

Any change after hrev55947?

comment:40 by smallstepforman, 2 years ago

Hi Augustin. With the Radeon 6600M on hrev55954, it still boots to a black screen after the rocket icon. Attaching syslog. I still need to blacklist radeon_hd on this laptop.

by smallstepforman, 2 years ago

Attachment: syslog.2.old added

Syslog hrev55954 Radeon 6600M

by smallstepforman, 2 years ago

Attachment: syslog.3.old added

syslog for hrev55954 Radeon 5600XT

comment:41 by smallstepforman, 2 years ago

Also tested hrev55954 with Radeon 5600XT, and after the rocket icon it goes black, external monitor goes to power save mode. Need to blacklist radeon_hd driver.

comment:42 by kallisti5, 2 years ago

none of the syslogs included show the logs with that patch above.

I added in comments: "#define SI_CONFIG_MEMSIZE_TAHITI 0x0de3 or 0x378c?"

If it doesn't work with 0x0de3 and you get 4194303MB, could you try 0x378c there?

I can't get any Radeon XT cards here since they cost almost $1,000 USD still :-)

by smallstepforman, 2 years ago

Attachment: syslog.4.old added

Hopefully correct syslog from hrev55954 (Radeon 5600XT)

by smallstepforman, 2 years ago

Attachment: syslog.5.old added

Second attempt syslog hrev55954 Radeon RX6600M

comment:43 by diver, 21 months ago

Blocking: 17853 added

comment:44 by kallisti5, 19 months ago

@smallstepforman could you try again with the latest nightly build on that RX 6600M ?

I noticed this error in your logs:

KERN: radeon_hd: detect_displays: ERROR: 0 attached monitors were found on display connectors. Injecting first connector as a last resort.

The logic changed here: the driver will now attempt to fall back to vesa/framebuffer if it hits this (and other) situations. It should make everything a lot easier to test at minimum :-)

https://cgit.haiku-os.org/haiku/commit/src/add-ons/accelerants/radeon_hd?id=65462c8c81eeb18ab5e946bfd8f10150fa27ac0b

comment:45 by smallstepforman, 18 months ago

I'm sorry to disappoint, but with hrev56536, not only will it stop at the rocket icon, but the laptop fans will ramp up indicating that something is pegging the CPU (infinite loop?). After 5 minutes of waiting, I had to reboot. Attaching syslog. I still have to blacklist radeon_hd to boot on this laptop (Navi R6600M). Later today I will also test on Radeon 5600XT.

by smallstepforman, 18 months ago

Attachment: syslog.4 added

Syslog hrev56536

comment:46 by smallstepforman, 18 months ago

For the Desktop 5600XT, I get KDL with hrev56536.

PANIC: acquire_spinlock(): Failed to acquire spinlock 0x.. for a long time (last caller: 0x000000000 value: 1)

syslog_sender(+0xe6).

Also causes fan to kick in. Spinning, spinning, and warming up my room :) Will attach syslog shortly.

by smallstepforman, 18 months ago

Attachment: syslog.5 added

hrev56536 desktop 5600XT

comment:47 by waddlesplash, 18 months ago

The panic does not show up in the syslog, please post a screenshot.

comment:48 by waddlesplash, 18 months ago

Also, please test with "Enable onscreen debug output" and "Disable SMP", you might see some message in the log.

> When booting with CSM enabled it boots successfully to the desktop.

That would potentially indicate a failure to load the AtomBIOS. Cacodemon345, can you test with a recent nightly?

comment:49 by waddlesplash, 18 months ago

I merged a few changes of relevance in hrev56553.

comment:50 by smallstepforman, 18 months ago

Attempt with hrev56554 (with new Radeon changes by Augustin). Test on Rx6600M (Navi 2)

Normal boot goes to KDL at rocket icon. When disabling SMP, I get to see a repeat of the following warnings:

radeon_hd: dp_aux_speak: dp_aux_channel flags not zero!
radeon_hd: WARNING: CHECK NEW DCE mmDC_GPIO_HPD_A value!
...
radeon_hd: dp_aux_transaction: IO Error. 7 attempts

The stack trace after KDL

comment:51 by smallstepforman, 18 months ago

Another on screen error:

radeon_hd: dp_aux_get_i2c_byte: aux_ch transaction failed!
radeon_hd: ddc2_dp_read_edid1: error reading EDID data at index 0, result = 0x80000001

And after 5 minutes (SMP disabled) it actually booted to desktop. Attaching syslog.

by smallstepforman, 18 months ago

Attachment: syslog_hrev56554 added

syslog_hrev56554

comment:52 by diver, 17 months ago

Blocking: 18080 added

comment:53 by diver, 10 months ago

Blocking: 18505 added

comment:54 by kallisti5, 6 months ago

Ok. I have a Radeon 6700 XT, and some monitors that won't do native resolution on EFI Framebuffer. It runs into a similar crash.

I guess I need to actually try and solve this one :-)

6700 XT - 1002:73df -  	{0x73df, 13, 5, RADEON_NAVI, CHIP_STD, "Radeon RX 6700 XT"}