Opened 3 years ago
Last modified 4 months ago
#17377 new bug
radeon_hd crashes in connector_probe()
Reported by: | smallstepforman | Owned by: | kallisti5 |
---|---|---|---|
Priority: | high | Milestone: | Unscheduled |
Component: | Drivers/Graphics/radeon_hd | Version: | R1/beta3 |
Keywords: | | Cc: | |
Blocked By: | | Blocking: | #17452, #17473, #17853, #18080, #18505 |
Platform: | All | | |
Description
The last working version of the Haiku x64 nightly was hrev55399 (2021 Sep 8). The next version, Haiku x64 nightly hrev55614 (2021 Nov 6), is broken.
KDL during boot, after the rocket icon.
libstdc++.so.6.0.29: unsupported compilation unit version: 5. This may be due to the switch to GCC 11.
Error on: Ryzen 3700, NVMe, MSI Tomahawk X570, 32 GB RAM, AMD 5600XT video. However, on a MacBook Pro 11.3 (2014) with a 4th-gen i7, nVidia 750M and 16 GB RAM, it boots successfully.
Attaching syslog.old (boot from hrev55614, broken) and syslog (boot from hrev55399, working).
Attachments (21)
Change History (76)
by , 3 years ago
Attachment: | syslog.old added |
---|
by , 3 years ago
comment:1 by , 3 years ago
comment:2 by , 3 years ago
"Unsupported compilation unit version" is a Debugger warning; it seems that the DWARF 5 patch was lost in HaikuPorts for GCC 11. So this may not be a KDL at all, but an app_server crash or something similar.
comment:3 by , 3 years ago
I also get a "libgcc_s.so.1: unsupported compilation unit version: 5" error.
comment:4 by , 3 years ago
Aha, fail-safe video mode allows the system to boot. I have an AMD 5600XT (Navi 1) GPU, so the problem may be in the recent updates to the Radeon HD driver. I've confirmed this with 4 reboots: 2x with fail-safe video (it boots) and 2x without (KDL).
comment:5 by , 3 years ago
Some more info about the card:
- Device: Navi 10 (Radeon RX 5600 XT)
- device id: 0x731f
- device/type: 0x3
- device/vendor: 0x1002
- pci/bus: 47
comment:6 by , 3 years ago
With the radeon_hd driver blacklisted, the system boots normally, so I've isolated the culprit. Can we change the title?
comment:7 by , 3 years ago
Component: | System/Kernel → Drivers/Graphics/radeon_hd |
---|---|
Owner: | changed from | to
Platform: | x86-64 → All |
Priority: | critical → high |
Summary: | libstdc++.so.6.0.29 unsupported compilation unit version: 5 KDL → radeon_hd causes app_server crash on Navi |
The "unsupported compilation unit" problems are tracked here: https://github.com/haikuports/haikuports/issues/6367
I think kallisti5 just blindly added this family of cards to the driver to see what would happen. It's not very surprising that it doesn't work. Probably they should be removed, then :)
comment:8 by , 3 years ago
> I think kallisti5 just blindly added this family of cards to the driver to see what would happen. It's not very surprising that it doesn't work. Probably they should be removed, then :)
Yup. I can't get my hands on any new video cards for a reasonable price, so I decided to outsource testing to users lol. AtomBIOS generally lets us "cram in new PCI IDs" without any additional changes... the exception is when things change beyond what our AtomBIOS support code can handle.
Speaking of which...
KERN: radeon_hd: gpio_i2c_populate: could't read GPIO_I2C_Info table from AtomBIOS index 10!
KERN: radeon_hd: connector_probe: found 136 potential display paths.
KERN: radeon_hd: connector_probe: Path #2: skipping unknown connector.
KERN: radeon_hd: connector_probe: TODO: Found router object?
KERN: Last message repeated 862 times.
KERN: vm_soft_fault: va 0xa277e82000 not covered by area in address space
KERN: vm_page_fault: vm_soft_fault returned error 'Bad address' on fault at 0xa277e82000, ip 0x1ee99bedeca, write 0, user 1, thread 0x41d
KERN: debug_server: Thread 1053 entered the debugger: Segment violation
KERN: stack trace, current PC 0x1ee99bedeca _Z15connector_probev + 0x1ba:
KERN: (0x7f8733398b50) 0x1ee99bea404 radeon_init_accelerant + 0x2b4
KERN: (0x7f8733398bb0) 0x2112577b55c _ZN21AccelerantHWInterface15_OpenAccelerantEi + 0x21c
KERN: (0x7f87333990d0) 0x2112577b61f _ZN21AccelerantHWInterface10InitializeEv + 0x6f
KERN: (0x7f87333990f0) 0x211257341be _ZN13ScreenManager15_AddHWInterfaceEP11HWInterface + 0x5e
KERN: (0x7f8733399140) 0x211257344f9 _ZN13ScreenManagerC1Ev + 0x69
KERN: (0x7f87333991c0) 0x2112571868e _ZN9AppServerC2EPi + 0xfe
KERN: (0x7f8733399200) 0x21125716f90 main + 0x30
KERN: (0x7f8733399230) 0x21125717a2e _start + 0x3e
KERN: (0x7f8733399260) 0x104c86103b5 runtime_loader + 0x105
Ironically, this is *really* valuable. I might be able to fix that :-D
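The log above reports 136 potential display paths and repeats the router-object message 862 times before faulting on an address that no mapped area covers. One defensive pattern is to reject an implausible path count up front instead of walking it. This is a minimal sketch under stated assumptions: `MAX_DISPLAY_PATHS` (32, chosen arbitrarily) and `validate_path_count` are hypothetical names, not radeon_hd code. For simplicity the sketch also assumes fixed-size path records, which real AtomBIOS path tables may not use.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical upper bound on display paths we are willing to walk.
 * 32 is an arbitrary assumption for this sketch, not a driver value. */
#define MAX_DISPLAY_PATHS 32

/* Returns true when the path count read from the BIOS looks sane and all
 * path records (assumed fixed-size here) fit in the table bytes we have. */
static bool
validate_path_count(size_t path_count, size_t record_size, size_t table_bytes)
{
	if (path_count == 0 || path_count > MAX_DISPLAY_PATHS)
		return false;
	/* Divide instead of multiplying so huge values cannot overflow. */
	return record_size != 0 && path_count <= table_bytes / record_size;
}
```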
comment:9 by , 3 years ago
ok. I have a patch here for this one: https://review.haiku-os.org/c/haiku/+/4697
smallstepforman, could you try this test image from one of these locations and let us know your results?
- https://ipfs.io/ipns/us.hpkg.haiku-os.org/testing/trac_17377/haiku-nightly-anyboot.iso.gz
- https://keybase.pub/kallisti5/test_images/
I *really* need the log output from it to tell how it works :-)
comment:10 by , 3 years ago
Hi Alex. See syslog. Unfortunately, that version also goes straight to KDL after the rocket icon.
comment:11 by , 3 years ago
yeesh. ok. Thanks for testing.
Could you grab this file and attach it to the ticket? /boot/system/cache/tmp/radeon_hd_bios_1002_731f_0.bin
I need to investigate your atombios's GPIO tables to see why we're not parsing it correctly.
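The requested file name appears to encode the PCI vendor ID (1002), the device ID (731f) and a card index (0). The Steam Deck log later in this ticket shows the same pattern (`radeon_hd_bios_1002_163f_0.bin`). It also says the dump is written "as ATOM_DEBUG is set", which may be one reason the file is not always present. The sketch below rebuilds the path from those parts. `bios_dump_path` is a hypothetical helper, and its format string is inferred from the two file names in this ticket, not copied from the driver.

```c
#include <stdint.h>
#include <stdio.h>

/* Builds the AtomBIOS dump path for a card. The format is inferred from
 * the file names seen in this ticket, not taken from radeon_hd itself.
 * Returns the snprintf result (length of the full path). */
static int
bios_dump_path(char *buf, size_t size, uint16_t vendor, uint16_t device,
	unsigned card_index)
{
	return snprintf(buf, size,
		"/boot/system/cache/tmp/radeon_hd_bios_%04x_%04x_%u.bin",
		(unsigned)vendor, (unsigned)device, card_index);
}
```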
comment:12 by , 3 years ago
Hi Alex. I can't find the /boot/system/cache/tmp/radeon_hd_bios_1002_731f_0.bin file. When I boot an older, working build, that file doesn't appear. Obviously, I cannot boot the latest nightly or the USB disk image unless I disable the radeon_hd driver.
Kind of stuck here.
BTW, yesterday afternoon my 11-year-old son fractured his finger on a trampoline, so we had to go to the hospital, and today I will be trying to get a hand surgeon to see him as soon as possible. I may be slow to respond or try workarounds over the next couple of days. All should be OK later in the week.
comment:13 by , 3 years ago
Blocking: | 17452 added |
---|
comment:14 by , 3 years ago
Hi Alex. Still KDL with hrev55720. Attaching crash report in a second ...
by , 3 years ago
Attachment: | app_server-1054-debug-12-12-2021-23-07-13.report added |
---|
Crash report hrev55720
comment:15 by , 3 years ago
Summary: | radeon_hd causes app_server crash on Navi → radeon_hd crashes in connector_probe() |
---|
comment:16 by , 3 years ago
Blocking: | 17473 added |
---|
comment:17 by , 3 years ago
I had a crash in connector_probe() because the ROM area wasn't big enough. Maybe check with hrev55760 whether the ACPI table is found. If that doesn't help, the ROM area should be made bigger for PCI BAR method 2.
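Korli's diagnosis is that connector_probe() read past the end of the mapped ROM area. A generic guard for that class of bug is to check every (offset, length) pair taken from the BIOS against the size actually mapped before dereferencing it. This is a minimal sketch: `rom_range_ok` and `rom_read_u16` are hypothetical helpers, not radeon_hd functions. The little-endian read reflects how AtomBIOS tables are stored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if [offset, offset + length) lies entirely inside a ROM of
 * rom_size bytes. Written so that offset + length cannot overflow. */
static bool
rom_range_ok(size_t rom_size, size_t offset, size_t length)
{
	return offset <= rom_size && length <= rom_size - offset;
}

/* Reads a little-endian 16-bit value, or returns false instead of
 * running off the end of the mapped ROM. */
static bool
rom_read_u16(const uint8_t *rom, size_t rom_size, size_t offset,
	uint16_t *out)
{
	if (!rom_range_ok(rom_size, offset, 2))
		return false;
	*out = (uint16_t)(rom[offset] | (rom[offset + 1] << 8));
	return true;
}
```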
comment:18 by , 3 years ago
Same issue with hrev55763
KDL in connector_probe() + 0x1d6
bt: radeon_init_accelerant + 0x279
bt: AccelerantHWInterface::_OpenAccelerant(int) + 0x222
comment:19 by , 3 years ago
KERN: radeon_hd: mapAtomBIOSACPI: seeking AtomBIOS from ACPI
KERN: radeon_hd: radeon_hd_getbios: AtomBIOS not found using active method 0 at 0x0
It doesn't appear to have the AtomBIOS in ACPI.
*however* related to what you said @korli...
KERN: radeon_hd: radeon_hd_init: Error: found 0MB video ram, using PCI bar size...
KERN: radeon_hd: radeon_hd_init: mapping a frame buffer of 256MB out of 0MB video ram
Let me check the code around detecting video RAM. Maybe something card-generation-specific is missing there.
by , 3 years ago
Attachment: | 0001-radeon_hd-Fix-memory-detection-on-newer-cards.patch added |
---|
fix patch v1
comment:20 by , 3 years ago
Give the patch above a try.
The Linux kernel transitioned from the radeon driver to amdgpu around Tahiti. The amdgpu driver does everything differently, which makes following the two code paths a little rough.
*IF* the patch doesn't work, please grab the logs (especially the lines around "mapping a frame buffer of XXX MB").
After grabbing the first set of logs, adjust the following line:
#define SI_CONFIG_MEMSIZE_TAHITI 0x0de3 // or 0x378c?
to
#define SI_CONFIG_MEMSIZE_TAHITI 0x378c
The end goal is for the following syslog messages to show the correct amount of video RAM on your card:
KERN: radeon_hd: radeon_hd_init: Error: found 0MB video ram, using PCI bar size...
KERN: radeon_hd: radeon_hd_init: mapping a frame buffer of 256MB out of 0MB video ram
If the "found 0MB video ram" message goes away, we have probably found the correct solution, which then needs to be tested on more cards.
Sorry for all the manual work. I don't have anything newer than an RX 480 to test on locally.
comment:21 by , 3 years ago
Oh, and the two different values come from Linux defining the MEMSIZE register as RREG32(SI_CONFIG_MEMSIZE_TAHITI), but then I noticed that the amdgpu version of RREG32 multiplies the register by a seemingly random 4 :-|
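The factor of 4 may not be random. In amdgpu, register numbers are 32-bit (dword) indices, and RREG32 turns them into a byte offset into the MMIO aperture by multiplying by 4. If that reading applies here, the two candidate values are the same register in different units, since 0x0de3 * 4 = 0x378c. The sketch below only demonstrates that conversion. `SI_CONFIG_MEMSIZE_TAHITI_DW` and `mmio_byte_offset` are hypothetical names, not identifiers from either driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: the register expressed as an amdgpu-style dword index. */
#define SI_CONFIG_MEMSIZE_TAHITI_DW 0x0de3u

/* Converts a dword register index into the byte offset that a
 * byte-addressed MMIO read needs (index * 4). */
static uint32_t
mmio_byte_offset(uint32_t dword_index)
{
	assert(dword_index <= UINT32_MAX / 4); /* keep the multiply in range */
	return dword_index * 4;
}
```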
comment:22 by , 3 years ago
I pushed this here as well for tracking: https://review.haiku-os.org/c/haiku/+/4848
The silly uint64->uint32 conversions aren't needed except in the newly added case.
comment:23 by , 3 years ago
To compare, here are the relevant lines from my RX 480 without this patch:
KERN: radeon_hd: radeon_hd_init: card(0): Radeon Polaris 10 1002:67DF
KERN: radeon_hd: radeon_hd_init: shrinking frame buffer to PCI bar...
KERN: radeon_hd: radeon_hd_init: mapping a frame buffer of 256MB out of 8192MB video ram
Technically, my Polaris is newer than Tahiti, but maybe the referenced register becomes unreliable after Tahiti or something. Who knows.
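The Navi and Polaris logs in this ticket suggest a sizing rule:
- If no video RAM is detected, map the whole PCI BAR.
- If the card reports more VRAM than the BAR exposes, shrink the mapping to the BAR.

This is a minimal sketch of that rule as inferred from the log messages, not the driver's actual code. `framebuffer_mb` is a hypothetical name, and the case where VRAM is smaller than the BAR is an assumption.

```c
#include <stdint.h>

/* Picks how many MB of VRAM to map, following the behavior suggested by
 * the syslog lines "found 0MB video ram, using PCI bar size..." and
 * "shrinking frame buffer to PCI bar...". The vram < bar case (map all
 * of VRAM) is an assumption of this sketch. */
static uint32_t
framebuffer_mb(uint32_t vram_mb, uint32_t bar_mb)
{
	if (vram_mb == 0)
		return bar_mb;   /* detection failed: fall back to the BAR */
	if (vram_mb > bar_mb)
		return bar_mb;   /* only what the BAR exposes can be mapped */
	return vram_mb;
}
```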
comment:24 by , 3 years ago
Hi Alex. I do not have a working Haiku build environment, is there any chance that you can upload a prebuilt image with the patch? I've also got access to a laptop with Radeon RX6600M which should allow me to test it on Navi2.
comment:25 by , 3 years ago
comment:27 by , 3 years ago
Hi Jerome and Alex. I downloaded the ISO from 6 Jan 2022. Same crash in connector_probe +0x1d6.
However, this time I have a few interesting files. Uploading the following attachments:
- app_server crash report
- app_server core file
- radeon_hd_bios_1002_731f_0.bin
- syslog
Stay tuned for upload.
by , 3 years ago
Attachment: | app_server-1050-debug-11-01-2022-08-53-49.report added |
---|
app_server crash report
by , 3 years ago
Attachment: | app_server-1050-debug-11-01-2022-08-54-17.core.zip added |
---|
app_server core file (zip)
comment:28 by , 3 years ago
I've also tested the 06Jan2022 image with an HP Omen 16 laptop (AMD Advantage edition), which has a mobile Radeon RX 6600M (Navi 2 generation). During boot, after the rocket, I just get a black screen. With the radeon_hd driver disabled, I can boot to the desktop. See the attached syslog_RX6600M and radeon_hd_bios_Rx6600M for output from this laptop (Navi 2).
Just a reminder that the original issue (KDL) is on Radeon Navi 1 (5600XT).
comment:29 by , 3 years ago
Second image to test: https://haiku.movingborders.es/testbuild/Id4803d5c5a8bcab685f687c6af0292c945813ec6/1/hrev55774/x86_64/
The RX6600M syslog looks like #17516. Did this HP Omen 16 laptop ever work with radeon_hd?
comment:30 by , 3 years ago
Hi Jerome. This version (11Jan2022) no longer goes to KDL (an improvement); now I'm stuck on a black screen after the rocket icon. This is on the 5600XT (Navi 1). The monitor attached to the dGPU goes into power save mode (no signal?).
I've also tried this version on the 6600M laptop (Navi 2), which never had the KDL but also shows a black screen after the rocket. With a laptop you cannot see a display indicator, so I'm assuming it has the same issue (no signal).
Attaching syslog_11Jan2022.
comment:31 by , 3 years ago
Regarding the query: The RX6600M syslog looks like #17516. Did this HP Omen 16 laptop ever work with radeon_hd?
I wouldn't know; I just purchased the laptop a couple of weeks ago. It has another problem (a 14-minute boot time while scanning NVMe drives); after that, it boots with fail-safe video.
comment:34 by , 3 years ago
Third image to test: https://haiku.movingborders.es/testbuild/Iac1b5870951d2f982002de65453959cfc38e79cc/2/hrev55780/x86_64/
@kallisti5 it's in the same state as #17516; it crashes a bit further along.
comment:35 by , 3 years ago
Hi Korli. No difference with the version from 12Jan2022. No KDL, but a black screen, and the monitor goes into power save mode. Attaching syslog.
comment:36 by , 3 years ago
@kallisti5 please check the syslog with https://review.haiku-os.org/c/haiku/+/4848
radeon_hd: radeon_hd_init: mapping a frame buffer of 256MB out of 4194303MB video ram
Doesn't look good.
comment:37 by , 3 years ago
Korli, good observation regarding the 4194303 MB. That number is roughly (2^32) / 1024, so it looks like there's a -1 somewhere. The Radeon 5600XT has 6 GB of RAM.
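The arithmetic supports the "-1" theory. A 32-bit read that returns all ones (0xFFFFFFFF, which is -1 when interpreted as signed) divided by 1024 gives exactly the 4194303 in the log. This sketch assumes the driver divides the raw register value by 1024 to get MB. That assumption comes from the arithmetic, not from the code. `vram_mb_from_reg` is a hypothetical name. Treating an all-ones read as "unknown" would send it down the existing 0 MB fallback path instead of reporting a bogus size.

```c
#include <stdint.h>

/* Converts a raw MEMSIZE-style register value to MB, assuming (as the
 * arithmetic in this ticket suggests) a division by 1024. An all-ones
 * read is reported as 0 so the caller falls back to the PCI BAR size. */
static uint32_t
vram_mb_from_reg(uint32_t raw)
{
	if (raw == UINT32_MAX)
		return 0;
	return raw / 1024;
}
```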
comment:38 by , 3 years ago
AMD Radeon user here.
As of hrev55923, radeon_hd crashes in connector_probe when booting with CSM disabled. When booting with CSM enabled it boots successfully to the desktop.
comment:40 by , 3 years ago
Hi Augustin. With hrev55954 on the Radeon 6600M, it still boots to a black screen after the rocket icon. Attaching syslog. I still need to blacklist radeon_hd on this laptop.
comment:41 by , 3 years ago
Also tested hrev55954 with the Radeon 5600XT: after the rocket icon, the screen goes black and the external monitor goes into power save mode. I need to blacklist the radeon_hd driver.
comment:42 by , 3 years ago
None of the included syslogs show output from the patch above.
I added in comments: "#define SI_CONFIG_MEMSIZE_TAHITI 0x0de3 or 0x378c?"
If it doesn't work with 0x0de3 and you get 4194303MB, could you try 0x378c there?
I can't get any Radeon XT cards here since they cost almost $1,000 USD still :-)
by , 3 years ago
Attachment: | syslog.4.old added |
---|
Hopefully correct syslog from hrev55954 (Radeon 5600XT)
comment:43 by , 2 years ago
Blocking: | 17853 added |
---|
comment:44 by , 2 years ago
@smallstepforman could you try again with the latest nightly build on that RX 6600M ?
I noticed this error in your logs:
KERN: radeon_hd: detect_displays: ERROR: 0 attached monitors were found on display connectors. Injecting first connector as a last resort.
The logic here has changed: the driver will now attempt to fall back to VESA/framebuffer if it hits this (or other) situations. At minimum, it should make everything a lot easier to test :-)
comment:45 by , 2 years ago
I'm sorry to disappoint, but with hrev56536, not only will it stop at the rocket icon, but the laptop fans will ramp up indicating that something is pegging the CPU (infinite loop?). After 5 minutes of waiting, I had to reboot. Attaching syslog. I still have to blacklist radeon_hd to boot on this laptop (Navi R6600M). Later today I will also test on Radeon 5600XT.
comment:46 by , 2 years ago
For the Desktop 5600XT, I get KDL with hrev56536.
PANIC: acquire_spinlock(): Failed to acquire spinlock 0x.. for a long time (last caller: 0x000000000 value: 1)
syslog_sender(+0xe6).
Also causes fan to kick in. Spinning, spinning, and warming up my room :) Will attach syslog shortly.
comment:48 by , 2 years ago
Also, please test with "Enable onscreen debug output" and "Disable SMP", you might see some message in the log.
> When booting with CSM enabled it boots successfully to the desktop.
That would potentially indicate a failure to load the AtomBIOS. Cacodemon345, can you test with a recent nightly?
comment:50 by , 2 years ago
Tested hrev56554 (with the new Radeon changes by Augustin) on the RX 6600M (Navi 2).
A normal boot goes to KDL at the rocket icon. With SMP disabled, I see the following warnings repeated:
radeon_hd: dp_aux_speak: dp_aux_channel flags not zero!
radeon_hd: WARNING: CHECK NEW DCE mmDC_GPIO_HPD_A value!
...
radeon_hd: dp_aux_transaction: IO Error. 7 attempts
The stack trace after KDL
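The message "dp_aux_transaction: IO Error. 7 attempts" suggests that the AUX transaction is retried a fixed number of times before giving up. Below is a minimal sketch of such a bounded retry loop. The callback type, `aux_transaction_with_retry` and the `flaky_attempt` stand-in are hypothetical. Only the attempt count of 7 comes from the log.

```c
#include <stdbool.h>

/* Hypothetical: one AUX channel attempt; returns true on success. */
typedef bool (*aux_attempt_fn)(void *context);

/* Tries the transaction up to max_attempts times. Returns the number of
 * attempts used on success, or -1 once every attempt has failed. */
static int
aux_transaction_with_retry(aux_attempt_fn attempt, void *context,
	int max_attempts)
{
	for (int i = 1; i <= max_attempts; i++) {
		if (attempt(context))
			return i;
	}
	return -1;
}

/* Demonstration stand-in: fails while *(int *)context is above zero,
 * decrementing it on each failure, then succeeds. */
static bool
flaky_attempt(void *context)
{
	int *failures_left = context;
	if (*failures_left > 0) {
		(*failures_left)--;
		return false;
	}
	return true;
}
```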
comment:51 by , 2 years ago
Another on screen error:
radeon_hd: dp_aux_get_i2c_byte: aux_ch transaction failed!
radeon_hd: ddc2_dp_read_edid1: error reading EDID data at index 0, result = 0x80000001
And after 5 minutes (with SMP disabled), it actually booted to the desktop. Attaching syslog.
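The EDID result code 0x80000001 appears to match the AUX transaction failures reported above. This assumes Haiku's Errors.h definitions, where B_GENERAL_ERROR_BASE is INT_MIN and B_IO_ERROR is B_GENERAL_ERROR_BASE + 1. Under that assumption, the value is B_IO_ERROR. The constants below are local stand-ins that mirror those definitions, so the sketch compiles outside Haiku. It also assumes a 32-bit `int`.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins mirroring Haiku's Errors.h (assumption:
 * B_GENERAL_ERROR_BASE is INT_MIN and B_IO_ERROR is base + 1). */
#define GENERAL_ERROR_BASE INT_MIN
#define IO_ERROR_CODE (GENERAL_ERROR_BASE + 1)

/* True if a status printed as hex in the log is the I/O error code. */
static bool
is_io_error(uint32_t logged_status)
{
	return logged_status == (uint32_t)IO_ERROR_CODE;
}
```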
comment:52 by , 2 years ago
Blocking: | 18080 added |
---|
comment:53 by , 18 months ago
Blocking: | 18505 added |
---|
comment:54 by , 15 months ago
Ok. I have a Radeon 6700 XT, and some monitors that won't do native resolution on EFI Framebuffer. It runs into a similar crash.
I guess I need to actually try and solve this one :-)
6700 XT - 1002:73df - {0x73df, 13, 5, RADEON_NAVI, CHIP_STD, "Radeon RX 6700 XT"}
comment:55 by , 4 months ago
There is a similar problem on Steam Deck APU (Van Gogh):
KERN: radeon_hd: card(0): radeon_hd_getbios: called
KERN: radeon_hd: mapAtomBIOSACPI: seeking AtomBIOS from ACPI
KERN: radeon_hd: mapAtomBIOSACPI: ACPI VFCT contains a BIOS for: 4:0:0 1002:163f
KERN: radeon_hd: mapAtomBIOSACPI: AtomBIOS verified and locked (45056)
KERN: radeon_hd: radeon_hd_getbios: AtomBIOS found using active method 0 at 0x0
KERN: radeon_hd: card(0): radeon_hd_init found VESA EDID information.
KERN: radeon_hd: card(0): radeon_hd_init completed successfully!
KERN: radeon_hd: card(0): GPU thermal status: 0C
KERN: radeon_hd: device_ioctl: accelerant: radeon_hd.accelerant
KERN: radeon_hd: radeon_init_accelerant enter
KERN: radeon_hd: radeon_dump_bios: Dumping AtomBIOS as ATOM_DEBUG is set...
KERN: radeon_hd: radeon_dump_bios: AtomBIOS dumped to /boot/system/cache/tmp/radeon_hd_bios_1002_163f_0.bin
KERN: radeon_hd: radeon_init_bios: AtomBIOS is already posted
KERN: radeon_hd: radeon_gpu_probe: table 3.4
KERN: radeon_hd: gpio_general_populate: general GPIO @ 0, valid: true, hwPin: 0x0
KERN: radeon_hd: gpio_general_populate: general GPIO @ 1, valid: true, hwPin: 0x0
KERN: radeon_hd: gpio_general_populate: general GPIO @ 2, valid: true, hwPin: 0x0
KERN: radeon_hd: gpio_general_populate: general GPIO @ 3, valid: true, hwPin: 0x0
KERN: radeon_hd: gpio_general_populate: general GPIO @ 4, valid: true, hwPin: 0x0
KERN: radeon_hd: gpio_general_populate: general GPIO @ 5, valid: true, hwPin: 0x0
KERN: radeon_hd: gpio_general_populate: general GPIO @ 6, valid: true, hwPin: 0x0
KERN: radeon_hd: gpio_general_populate: general GPIO @ 7, valid: true, hwPin: 0x0
KERN: radeon_hd: gpio_general_populate: general GPIO @ 8, valid: true, hwPin: 0x0
KERN: radeon_hd: gpio_general_populate: general GPIO @ 9, valid: true, hwPin: 0x0
KERN: radeon_hd: gpio_general_populate: general GPIO @ 10, valid: true, hwPin: 0x0
KERN: radeon_hd: gpio_general_populate: general GPIO @ 11, valid: true, hwPin: 0x0
KERN: radeon_hd: gpio_general_populate: general GPIO @ 12, valid: true, hwPin: 0x0
KERN: radeon_hd: gpio_general_populate: general GPIO @ 13, valid: true, hwPin: 0x0
KERN: radeon_hd: gpio_general_populate: general GPIO @ 14, valid: true, hwPin: 0x0
KERN: radeon_hd: gpio_general_populate: general GPIO @ 15, valid: true, hwPin: 0x0
KERN: radeon_hd: gpio_i2c_populate: could't read GPIO_I2C_Info table from AtomBIOS index 10!
KERN: radeon_hd: connector_probe: found 72 potential display paths.
KERN: radeon_hd: connector_probe: Path #1: Unknown connector object ID!
KERN: radeon_hd: connector_probe: Path #2: Unknown connector object ID!
KERN: vm_soft_fault: va 0x15f7a9dc000 not covered by area in address space
KERN: vm_page_fault: vm_soft_fault returned error 'Bad address' on fault at 0x15f7a9dc8b9, ip 0xb79c244e56, write 0, user 1, exec 0, thread 0xf8
KERN: debug_server: Thread 248 entered the debugger: Segment violation
KERN: stack trace, current PC 0xb79c244e56 </boot/system/add-ons/accelerants/radeon_hd.accelerant> _Z15connector_probev + 0x126:
KERN: (0x7fca5fdfb8c0) 0xb79c2416c6 </boot/system/add-ons/accelerants/radeon_hd.accelerant> radeon_init_accelerant + 0x266
KERN: (0x7fca5fdfb920) 0xe30971fcfc </boot/system/servers/app_server> _ZN21AccelerantHWInterface15_OpenAccelerantEi + 0x1bc
KERN: (0x7fca5fdfbe40) 0xe30971fe0f </boot/system/servers/app_server> _ZN21AccelerantHWInterface10InitializeEv + 0x5f
KERN: (0x7fca5fdfbe60) 0xe3096daa38 </boot/system/servers/app_server> _ZN13ScreenManager15_AddHWInterfaceEP11HWInterface + 0x58
KERN: (0x7fca5fdfbea0) 0xe3096dad99 </boot/system/servers/app_server> _ZN13ScreenManagerC1Ev + 0x69
KERN: (0x7fca5fdfbf20) 0xe3096bff90 </boot/system/servers/app_server> _ZN9AppServerC2EPi + 0x100
KERN: (0x7fca5fdfbf60) 0xe3096bea90 </boot/system/servers/app_server> main + 0x30
KERN: (0x7fca5fdfbf90) 0xe3096bf29e </boot/system/servers/app_server> _start + 0x3e
KERN: (0x7fca5fdfbfc0) 0xf29463ee85 </boot/system/runtime_loader> runtime_loader + 0x115
Already submitted a request to disable radeon_hd on Van Gogh: https://review.haiku-os.org/c/haiku/+/8043
Used hrev57982 with this patch to work around #18561.
The panic message is not actually in either syslog. Please attach a photo.