Opened 8 years ago

Closed 7 years ago

Last modified 4 years ago

#12671 closed bug (not reproducible)

x86_64 trampoline boot failure

Reported by: kallisti5
Owned by: axeld
Priority: high
Milestone:
Component: System/Boot Loader
Version: R1/Development
Keywords:
Cc:
Blocked By:
Blocking:
Platform: x86-64

Description

On a recent hrev50126 build, the system randomly fails to boot from a USB stick. The early boot log shows lots of trampolining.

Attachments (7)

trampoline-boot.txt (292.8 KB) - added by kallisti5 8 years ago.
hrev50126
trampoline-boot2.txt (8.0 KB) - added by kallisti5 8 years ago.
better example, syslog shows this followed by a reboot.
smp1_warm_late (193.0 KB) - added by kallisti5 8 years ago.
smp2_warm_late (193.4 KB) - added by kallisti5 8 years ago.
smp3_cold_late (153.7 KB) - added by kallisti5 8 years ago.
smp4_cold_late (193.5 KB) - added by kallisti5 8 years ago.
smp5_warm_early (11.0 KB) - added by kallisti5 8 years ago.

Change History (21)

by kallisti5, 8 years ago

Attachment: trampoline-boot.txt added

by kallisti5, 8 years ago

Attachment: trampoline-boot2.txt added

better example, syslog shows this followed by a reboot.

comment:1 by kallisti5, 8 years ago

Milestone: Unscheduled → R1/beta1
Priority: normal → blocker

The reboot doesn't happen on hrev50109, so this seems related to the gcc5 transition.

  • hrev50109: 6 boots, 6 successful.
  • hrev50126: 6 boots, 5 early reboots (trampoline-boot2.txt), 1 non-boot lockup (trampoline-boot.txt).

comment:2 by kallisti5, 8 years ago

The non-working build was built with a fresh btrev43115.

I'm working on an x86 test to see if it is limited to x86_64. I tried two known-good USB drives, with the same results on x86_64.

comment:3 by korli, 8 years ago

I successfully booted hrev50120 and hrev50126, both x86 and x86_64, on an Asus Zenbook UX31E-RY029V.

comment:4 by kallisti5, 8 years ago

Strange. Maybe another AMD-only bug? This is an FX-8320 on a Gigabyte 990FXA-UD5.

comment:5 by kallisti5, 8 years ago

Milestone: R1/beta1 → R2
Priority: blocker → critical

Still haven't had time to test x86 on AMD. Since R1 still targets gcc2 and this should only be an issue on gcc5 systems, I'm removing the blocker status.

comment:6 by kallisti5, 8 years ago

Priority: critical → high

The issue still exists on hrev50383 x86_64.

The OS boots successfully roughly 1 in 6 times with SMP enabled, and 10 out of 10 times with SMP disabled.

This mainboard has always been a bit odd... Linux kernel upgrades have issues once in a while as well.

I'm going to purchase a new mainboard and see if the issue still exists with the same CPU. Until this is validated on multiple mainboards with a similar chipset, I'm going to drop the priority.

comment:7 by kallisti5, 8 years ago

So, I logged 6 boots.

  • smp1 - Warm boot, hard lockup when smp in use (opening terminal)
  • smp2 - Warm boot, hard lockup when smp in use (opening terminal)
  • smp3 - Cold boot, hard lockup when setting video mode (not radeon_hd related)
  • smp4 - Cold boot, hard lockup when running multiple threads
  • smp5 - Warm boot, reboot right after kernel load while trampolining cpus

I still think we have some SMP deadlock somewhere, and given the *early* reboots it seems like something isn't 100% right early in the SMP boot process.

by kallisti5, 8 years ago

Attachment: smp1_warm_late added

by kallisti5, 8 years ago

Attachment: smp2_warm_late added

by kallisti5, 8 years ago

Attachment: smp3_cold_late added

by kallisti5, 8 years ago

Attachment: smp4_cold_late added

by kallisti5, 8 years ago

Attachment: smp5_warm_early added

comment:8 by kallisti5, 8 years ago

The risk of a hard lockup seems proportional to the number of CPUs on the system. If I make it through boot to the desktop and disable all cores but one live from ProcessController, I can then load the cores up and don't start running into hard lockups until 5 cores are enabled.

comment:9 by kallisti5, 8 years ago

Ran a complete x86_64 Memtest86+ pass with SMP enabled just to be safe. No issues detected.

comment:10 by kallisti5, 8 years ago

So after several passes, I got a few Memtest86+ errors. Let me replace the memory and re-run the test. Until then don't waste your time on this one :-)

comment:11 by kallisti5, 7 years ago

I'm closing this one. We have some pretty serious USB 3.0 issues currently on real hardware. The machine I saw this on did end up having a bad stick of RAM... so closing. If anyone else sees it, please feel free to reopen.

comment:12 by kallisti5, 7 years ago

Resolution: not reproducible
Status: new → closed

comment:13 by pulkomandy, 5 years ago

Milestone: R2 → Unscheduled

comment:14 by nielx, 4 years ago

Milestone: Unscheduled (removed)

Remove milestone for tickets with status = closed and resolution != fixed
