Opened 9 months ago

Closed 8 months ago

#18541 closed bug (duplicate)

System hangs on rocket icon after upgrade from hrev57193 to hrev57199

Reported by: Starcrasher
Owned by: nobody
Priority: high
Milestone: R1/beta5
Component: Drivers/Network/ipro1000
Version: R1/beta4
Keywords:
Cc:
Blocked By:
Blocking: #18593
Platform: All

Description

After updating my 64-bit nightly VM (QEMU+KVM) from hrev57193 to hrev57199, the system hangs after ipro1000 init. Here are the last lines in the console.

pci_reserve_device(0, 3, 0, ipro1000)
if_initname(0xffffffff8986f800, em, 24)
[ipro1000] ipro1000: /dev/net/ipro1000/0
[ipro1000] (em) attach_pre capping queues at 1
[ipro1000] (em) bus_alloc_resource(3, [16], 0x0, 0xffffffffffffffff, 0x1,0x2)
set MTRRs to:
  mtrr:  0: base: 0x7ffe0000, size:    0x20000, type: 0
  mtrr:  1: base: 0xf8000000, size:  0x8000000, type: 0
  mtrr:  2: base: 0x80000000, size: 0x80000000, type: 1
[ipro1000] (em) bus_alloc_resource(4, [20], 0x0, 0xffffffffffffffff, 0x1,0x2)
[ipro1000] (em) Using 1024 TX descriptors and 1024 RX descriptors
[ipro1000] (em) allocated for 1 tx_queues
[ipro1000] (em) allocated for 1 rx_queues
[ipro1000] (em) bus_alloc_resource(1, [0], 0x0, 0xffffffffffffffff, 0x1,0x6)
if_attach 0xffffffff8945fd20

Attachments (3)

syslog-57193.txt (174.5 KB ) - added by Starcrasher 9 months ago.
Syslog when it's working (hrev57193)
Config-xml-Qemu.txt (3.5 KB ) - added by Starcrasher 9 months ago.
XML config file of the VM in qemu GUI
Config-xml-Qemu-network.txt (492 bytes ) - added by Starcrasher 9 months ago.
XML config file of the network in qemu GUI


Change History (29)

by Starcrasher, 9 months ago

Attachment: syslog-57193.txt added

Syslog when it's working (hrev57193)

comment:1 by Starcrasher, 9 months ago

It seems that ipro1000 doesn't even fully initialize. These are the same lines from a boot that works. You can see two more ipro1000 lines before the package daemon output starts.

pci_reserve_device(0, 3, 0, ipro1000)
if_initname(0xffffffff897be000, em, 24)
[ipro1000] ipro1000: /dev/net/ipro1000/0
[ipro1000] (em) attach_pre capping queues at 1
[ipro1000] (em) bus_alloc_resource(3, [16], 0x0, 0xffffffffffffffff, 0x1,0x2)
set MTRRs to:
  mtrr:  0: base: 0x7ffe0000, size:    0x20000, type: 0
  mtrr:  1: base: 0xf8000000, size:  0x8000000, type: 0
  mtrr:  2: base: 0x80000000, size: 0x80000000, type: 1
[ipro1000] (em) bus_alloc_resource(4, [20], 0x0, 0xffffffffffffffff, 0x1,0x2)
[ipro1000] (em) Using 1024 TX descriptors and 1024 RX descriptors
[ipro1000] (em) allocated for 1 tx_queues
[ipro1000] (em) allocated for 1 rx_queues
[ipro1000] (em) bus_alloc_resource(1, [0], 0x0, 0xffffffffffffffff, 0x1,0x6)
if_attach 0xffffffff898c2920
ipro1000: init_driver(0xffffffff81c39a70)
loaded driver /boot/system/add-ons/kernel/drivers/dev/net/ipro1000
package_daemon: [3323706:   315] 2023-08-06 15:54:25 KERN: latest volume state:
...

Will try on my USB key to see if it also happens on real hardware.
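
The comparison above (the working boot prints two more ipro1000 lines after if_attach) can be automated when the syslogs are long. The following is a minimal sketch, not part of any Haiku tooling: the helper names are hypothetical, and it assumes that kernel pointers are the only thing that differs between otherwise matching lines.

```python
import re

# Kernel pointers such as 0xffffffff8986f800 change from boot to boot, so
# mask long hex values before comparing lines. Short constants (0x1, 0x2)
# are left alone.
POINTER = re.compile(r"0x[0-9a-fA-F]{12,}")


def normalize(line):
    return POINTER.sub("0xADDR", line.strip())


def lines_after_hang(hung_log, working_log):
    """Return the lines of working_log that follow the last line of hung_log."""
    hung = [normalize(l) for l in hung_log.splitlines() if l.strip()]
    working = working_log.splitlines()
    if not hung:
        return [l for l in working if l.strip()]
    last = hung[-1]
    # Search backwards so repeated lines (e.g. MTRR dumps) do not match early.
    for index in range(len(working) - 1, -1, -1):
        if normalize(working[index]) == last:
            return [l for l in working[index + 1:] if l.strip()]
    return []


HUNG = """\
[ipro1000] (em) allocated for 1 rx_queues
[ipro1000] (em) bus_alloc_resource(1, [0], 0x0, 0xffffffffffffffff, 0x1,0x6)
if_attach 0xffffffff8945fd20
"""

WORKING = """\
[ipro1000] (em) allocated for 1 rx_queues
[ipro1000] (em) bus_alloc_resource(1, [0], 0x0, 0xffffffffffffffff, 0x1,0x6)
if_attach 0xffffffff898c2920
ipro1000: init_driver(0xffffffff81c39a70)
loaded driver /boot/system/add-ons/kernel/drivers/dev/net/ipro1000
"""

for line in lines_after_hang(HUNG, WORKING):
    print(line)
```

With the two excerpts from this ticket, the lines reported as missing are the init_driver and "loaded driver" messages, which is consistent with the hang happening right after if_attach.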

comment:2 by The_Ringmaster, 9 months ago

I can confirm this is an issue for me as well.

comment:3 by waddlesplash, 9 months ago

I booted on VMware with ipro1000 on builds up through hrev57197 during testing, so I guess the GCC 13 upgrade is the most likely culprit to have broken something.

comment:4 by Starcrasher, 9 months ago

I updated my USB key to hrev57199 and everything worked fine on real hardware, but ipro1000 is not in use there. Both the 32-bit and 64-bit installs were a bit slow to start, but that's probably due to package installation scripts, as I hadn't updated them for a little while.

Following your comment, I tested the hrev57197 nightly image in a VM and, indeed, it was working at that stage.

comment:5 by waddlesplash, 9 months ago

Component: - General → Drivers/Network/ipro1000

So, I can't reproduce this problem when using ipro1000 in either VMware or QEMU.

What QEMU version are you running? What's the exact device specified in the QEMU command line? Can you try and get a KDL backtrace? Anything different in QEMU without KVM?

comment:6 by waddlesplash, 9 months ago

I tested on bare metal. ipro1000 works fine, downloaded a bunch of stuff.

comment:7 by Starcrasher, 9 months ago

QEMU emulator version 5.2.0 (qemu-5.2.0-4.mga8)

<interface type="network">
  <mac address="52:54:00:5e:79:a4"/>
  <source network="Haiku-Net"/>
  <model type="e1000"/>
  <address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x0"/>
</interface>

It doesn't crash; it just hangs forever without printing anything further.

Well, the advantage is that with KVM you can use the virtual machine manager GUI https://virt-manager.org/ so there's no command line to deal with. The only thing that I had to set up manually was the Shorewall config; otherwise traffic is blocked and you can't access the net.

If it's a problem on my side then I will try to install a new VM with a recent nightly image. I have no important stuff on that VM. It allows me, from time to time, to try an app or to check the exact location of a sentence when I translate things on polyglot.

comment:8 by waddlesplash, 9 months ago

I'm running QEMU 7.0 here.

Can you test via the command line, and without KVM, and see if anything's different?

I may be able to glean more information by walking you through some kernel debugging steps, but that would have to happen over IRC/Matrix or something like that.

comment:9 by The_Ringmaster, 9 months ago

I'm running Haiku on a type 1 Hyper-V machine. I have screenshotted the backtrace below.

https://imgur.com/a/zx54O3C

comment:10 by waddlesplash, 9 months ago

This is clearly a different problem. Please capture a full syslog using serial out on the VM, open a new ticket, and attach it there.

comment:11 by pulkomandy, 9 months ago

> Well, the advantage is that with KVM you can use the virtual machine manager GUI https://virt-manager.org/ so there's no command line to deal with.

That is convenient to you, but to reproduce the issue, we need the exact configuration used, and the simplest way to get that is a command line we can copy and paste.

It is likely that virt-manager is including some command line arguments that end up causing compatibility problems, whereas qemu default settings don't.

So, do you have a way to extract the qemu command line and share that?
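
Until the real command line is available, the NIC part of the libvirt XML from comment:7 can be mapped to an approximate QEMU argument. This is only a rough sketch under stated assumptions: the function name and the netdev id `net0` are hypothetical, the network backend that libvirt sets up is not reproduced at all, and the authoritative answer remains whatever libvirt itself reports (for example through `virsh domxml-to-native qemu-argv`, assuming that subcommand is available in the installed libvirt).

```python
import xml.etree.ElementTree as ET

# Interface definition as posted in comment:7.
INTERFACE_XML = """
<interface type="network">
  <mac address="52:54:00:5e:79:a4"/>
  <source network="Haiku-Net"/>
  <model type="e1000"/>
  <address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x0"/>
</interface>
"""


def nic_device_arg(interface_xml, netdev_id="net0"):
    """Build an approximate '-device' argument for the NIC in the XML.

    netdev_id is a made-up identifier; a matching '-netdev' option would
    still have to be added by hand.
    """
    iface = ET.fromstring(interface_xml)
    model = iface.find("model").get("type")
    mac = iface.find("mac").get("address")
    address = iface.find("address")
    slot = int(address.get("slot"), 16)
    function = int(address.get("function"), 16)
    # Only mention the function when it is not the default of 0.
    addr = f"{slot:#x}" if function == 0 else f"{slot:#x}.{function:#x}"
    return ["-device", f"{model},netdev={netdev_id},mac={mac},addr={addr}"]


print(" ".join(nic_device_arg(INTERFACE_XML)))
```

The point of the sketch is just to make visible which settings (model, MAC, PCI slot) virt-manager pins down explicitly, since any of them could differ from the QEMU defaults.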

comment:12 by pulkomandy, 9 months ago

Milestone: Unscheduled → R1/beta5
Priority: normal → high

comment:13 by johnsonjh, 9 months ago

I can confirm this happens here as well. 57197 is fine, 57199 is not.

When booting under QEMU with KVM enabled:

extra data[0]: 0x0000000000000001
extra data[1]: 0x700f66026e0f660f
extra data[2]: 0x31de4db70f44e0c8
extra data[3]: 0x0000000000000031
extra data[4]: 0x0000000000000000
extra data[5]: 0x0000000000000000
extra data[6]: 0x0000000000000000
extra data[7]: 0x0000000000000000
emulation failure 64888
RAX=0000000000000e00 RBX=ffffffff81c7c000 RCX=ffffffff81c7e598 RDX=ffffffff81ddbe00
RSI=ffffffff81c7b84c RDI=0000000000000032 RBP=ffffffff81c7b930 RSP=ffffffff81c7b900
R8 =0000000000000008 R9 =0000000000000140 R10=0000000000000008 R11=000000000000007d
R12=0000000000000000 R13=ffffffff82179738 R14=ffffffff82179738 R15=0000000000000030
RIP=ffffffff81d82afc RFL=00010286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 ffffffff 00c00000
CS =0008 0000000000000000 ffffffff 00a09900 DPL=0 CS64 [--A]
SS =0010 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0000 0000000000000000 ffffffff 00c00000
FS =0000 00007f73678f7000 ffffffff 00c00000
GS =0000 ffffffff82780400 ffffffff 00c00000
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =00f0 ffffffff801b5a00 00000068 00008b00 DPL=0 TSS64-busy
GDT=     ffffffff80210ce0 0000062f
IDT=     ffffffff8020fce0 00000fff
CR0=80010031 CR2=000000b25795afb0 CR3=0000000010444000 CR4=003406e0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000d01
Code=00 48 8b 0b 83 39 01 48 8b 51 08 0f 84 4b 01 00 00 48 01 c2 <66> 0f 6e 02 66 0f 70 c8 e0 44 0f b7 4d de 31 c0 31 f6 66 0f d6 8b 1c 01 00 00 bf ff 00 00

When booting without KVM, it starts up but doesn't get to the desktop.

The serial log shows that the ipro1000/em driver was the last thing initializing.

The version of QEMU doesn't matter. The behavior is exactly the same on:

  • QEMU emulator version 7.2.1
  • QEMU emulator version 7.2.5
  • QEMU emulator version 8.0.92 (v8.1.0-rc2-80-g0450cf0897)

This happens without any virt-manager involved. The command-line that triggers it every time is:

qemu-system-x86_64 -usbdevice tablet -smp 12 -m 2G \
  -net user,hostfwd=tcp::25723-:22 -net nic        \
  -drive file=haiku64.raw,format=raw -boot menu=on \
  -vga vmware -enable-kvm -cpu host,host-phys-bits

(Remove the last two flags to boot without KVM.)

I usually add -serial mon:stdio to capture the debug output from the serial, but it's not exciting. Everything seems normal.
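
For anyone who wants to narrow this down, the variations discussed in this ticket (with and without KVM, different NIC models) can be generated from the command line above. This is a small sketch, assuming the same image name and flags as in this comment; the function name is hypothetical, and the model names are simply the ones mentioned in this ticket (e1000, virtio-net-pci, rtl8139).

```python
import shlex

# The two flags that are removed to boot without KVM.
KVM_FLAGS = ["-enable-kvm", "-cpu", "host,host-phys-bits"]


def qemu_command(nic_model=None, kvm=True, image="haiku64.raw"):
    """Return the argument list for one test configuration.

    nic_model=None keeps QEMU's default NIC, as in the original command.
    """
    nic = "nic" if nic_model is None else f"nic,model={nic_model}"
    args = [
        "qemu-system-x86_64", "-usbdevice", "tablet", "-smp", "12", "-m", "2G",
        "-net", "user,hostfwd=tcp::25723-:22", "-net", nic,
        "-drive", f"file={image},format=raw", "-boot", "menu=on",
        "-vga", "vmware", "-serial", "mon:stdio",
    ]
    if kvm:
        args += KVM_FLAGS
    return args


# Print one copy-and-paste command per configuration.
for model in ("e1000", "virtio-net-pci", "rtl8139"):
    for kvm in (True, False):
        print(shlex.join(qemu_command(model, kvm)))
```

Printing the commands rather than running them keeps the exact configuration easy to paste into the ticket next to each result.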

Last edited 9 months ago by johnsonjh

comment:14 by johnsonjh, 9 months ago

I just noticed that virtio networking is supported.

I switched from the qemu default network interface (which is e1000 or e1000e) to virtio (-net nic,model=virtio-net-pci) and it works fine. So the problem is that driver, for sure.

Just in case you need it, here are the last parts of the debug log with the e1000 NIC, right when the ipro1000 driver crashes. I'm not sure if it is helpful.

pci_reserve_device(0, 3, 0, ipro1000)
if_initname(0xffffffff889c9000, em, 16)
[ipro1000] ipro1000: /dev/net/ipro1000/0
[ipro1000] (em) attach_pre capping queues at 2
[ipro1000] (em) bus_alloc_resource(3, [16], 0x0, 0xffffffffffffffff, 0x1,0x2)
set MTRRs to:
  mtrr:  0: base: 0x7ffe0000, size:    0x20000, type: 0
  mtrr:  1: base: 0xfe000000, size:  0x2000000, type: 0
  mtrr:  2: base: 0x80000000, size: 0x80000000, type: 1
[ipro1000] (em) EM_NVM_PCIE_CTRL = 0x460b
[ipro1000] (em) EEPROM V2.1-0
[ipro1000] (em) Using 1024 TX descriptors and 1024 RX descriptors
[ipro1000] (em) msix_init qsets capped at 2
[ipro1000] (em) bus_alloc_resource(3, [28], 0x0, 0xffffffffffffffff, 0x1,0x2)
set MTRRs to:
  mtrr:  0: base: 0x7ffe0000, size:    0x20000, type: 0
  mtrr:  1: base: 0xfe000000, size:  0x2000000, type: 0
  mtrr:  2: base: 0x80000000, size: 0x80000000, type: 1
[ipro1000] (em) queue equality override not set, capping rx_queues at 1 and tx_queues at 1
[ipro1000] (em) Using 1 RX queues 1 TX queues
set MTRRs to:
  mtrr:  0: base: 0x7ffe0000, size:    0x20000, type: 0
  mtrr:  1: base: 0xfe000000, size:  0x2000000, type: 0
  mtrr:  2: base: 0x80000000, size: 0x80000000, type: 1
allocate_io_interrupt_vectors: allocated 2 vectors starting from 24
msi_allocate_vectors: allocated 2 vectors starting from 24
msix configured for 2 vectors
[ipro1000] (em) Using MSI-X interrupts with 2 vectors
[ipro1000] (em) allocated for 1 tx_queues
[ipro1000] (em) allocated for 1 rx_queues
[ipro1000] (em) bus_alloc_resource(1, [1], 0x0, 0xffffffffffffffff, 0x1,0x2)
msi-x enabled: 0x8004
[ipro1000] (em) bus_alloc_resource(1, [2], 0x0, 0xffffffffffffffff, 0x1,0x2)
msi-x enabled: 0x8004
if_attach 0xffffffff89954b20
KVM internal error. Suberror: 1

Last edited 9 months ago by johnsonjh

comment:15 by Starcrasher, 9 months ago

I switched from e1000 to rtl8139 and it also works. So it's definitely in ipro1000 init.

comment:16 by korli, 9 months ago

It might be a good idea to update to gcc 13.2 before anything else.

comment:17 by nielx, 9 months ago

There should be a nightly image built with GCC 13.2. Could you retest?

by Starcrasher, 9 months ago

Attachment: Config-xml-Qemu.txt added

XML config file of the VM in qemu GUI

by Starcrasher, 9 months ago

Attachment: Config-xml-Qemu-network.txt added

XML config file of the network in qemu GUI

comment:18 by Starcrasher, 9 months ago

After updating the VM to hrev57214, ipro1000 still hangs right after the if_attach message. I tried installing a new VM from the nightly ISO of the same hrev, with the same result.

comment:19 by waddlesplash, 9 months ago

I can't find anything online for "emulation failure 64888".

Is there any way we can get the faulting instruction from QEMU debugger?

comment:20 by waddlesplash, 9 months ago

Yes, it appears so: using the command x/16i <address> at the compat monitor. Can whoever can reproduce this please do that, using the address of RIP from the registers dump (e.g. as seen in comment:13) and paste the output here?
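
As a complement, the register dump in comment:13 already contains what is needed to build that monitor command, and its Code= line includes the instruction bytes, with the byte at RIP wrapped in angle brackets. The following is a minimal sketch for pulling both out of a pasted dump; the function names are hypothetical, and only the two relevant lines of the dump are reproduced here.

```python
import re

# RIP and Code= lines copied from the dump in comment:13.
REGISTER_DUMP = """\
RIP=ffffffff81d82afc RFL=00010286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
Code=00 48 8b 0b 83 39 01 48 8b 51 08 0f 84 4b 01 00 00 48 01 c2 <66> 0f 6e 02 66 0f 70 c8 e0 44 0f b7 4d de 31 c0 31 f6 66 0f d6 8b 1c 01 00 00 bf ff 00 00
"""


def faulting_address(dump):
    """Return the RIP value from a QEMU register dump as an integer."""
    match = re.search(r"\bRIP=([0-9a-fA-F]+)", dump)
    if match is None:
        raise ValueError("no RIP= field found")
    return int(match.group(1), 16)


def bytes_at_rip(dump):
    """Return the bytes of the Code= line, starting at the <..> marker."""
    match = re.search(r"^Code=(.*)$", dump, re.MULTILINE)
    if match is None:
        raise ValueError("no Code= line found")
    tokens = match.group(1).split()
    for index, token in enumerate(tokens):
        if token.startswith("<"):
            return bytes(int(t.strip("<>"), 16) for t in tokens[index:])
    raise ValueError("no <..> marker in Code= line")


def monitor_command(dump, count=16):
    """Build the x/<count>i command for the QEMU monitor."""
    return f"x/{count}i {faulting_address(dump):#x}"


print(monitor_command(REGISTER_DUMP))
print(bytes_at_rip(REGISTER_DUMP).hex(" "))
```

The byte string can also be fed to any offline x86-64 disassembler. By my reading of the opcode tables, the leading 66 0f 6e looks like an SSE2 movd load, but that should be confirmed with the actual x/16i output rather than taken from this note.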

comment:21 by waddlesplash, 8 months ago

Blocking: 18593 added

comment:22 by waddlesplash, 8 months ago

Please retest after hrev57286.

comment:23 by Starcrasher, 8 months ago

Tested ok with hrev57287. Thanks

comment:24 by waddlesplash, 8 months ago

Resolution: fixed
Status: new → closed

comment:25 by waddlesplash, 8 months ago

Resolution: fixed
Status: closed → reopened

comment:26 by waddlesplash, 8 months ago

Resolution: duplicate
Status: reopened → closed

Closing as "duplicate" instead of "fixed" as the problem has merely been mitigated.
