Opened 11 years ago

Closed 22 months ago

#9166 closed bug (not reproducible)

Packet loss on rtl81xx ethernet

Reported by: kallisti5
Owned by: nobody
Priority: normal
Milestone: R1
Component: Drivers/Network/rtl81xx
Version: R1/alpha4.1
Keywords:
Cc:
Blocked By:
Blocking:
Platform: All

Description

Seeing severe packet loss on rtl81xx ethernet cards.

Ping latency to the gateway is low (~0.7 ms) but unreliable: 49% packet loss (90 received of 178 packets transmitted).
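For reference, the loss figure follows directly from the two ping counters. A minimal Python sketch of the arithmetic (the function name `packet_loss_percent` is illustrative, not part of any tool mentioned here):

```python
def packet_loss_percent(transmitted: int, received: int) -> float:
    """Return the percentage of transmitted packets that got no reply."""
    if transmitted <= 0:
        raise ValueError("transmitted must be positive")
    if not 0 <= received <= transmitted:
        raise ValueError("received must be between 0 and transmitted")
    return 100.0 * (transmitted - received) / transmitted


# Counters quoted in the description: 178 sent, 90 received.
loss = packet_loss_percent(178, 90)
print(f"{loss:.1f}% packet loss")  # prints "49.4% packet loss"
```

ping reports this as a whole number, which is where the 49% comes from.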

This may be related to the other 'network performance issues' found on some cards.

Downloading a 5M file with haikuporter (through wget) takes 6 hours.

Moving the mouse on Haiku has no effect. (I've seen it speed up the network during performance issues in the past.)

Attachments (2)

syslog-noapic (337.7 KB ) - added by kallisti5 11 years ago.
syslog no apic. up for several hours, no real network problems.
packetloss-hrev46796.txt (11.8 KB ) - added by kallisti5 10 years ago.

Change History (11)

comment:1 by kallisti5, 11 years ago

10134778 unhandled apic_timer_interrupt int's as per kdl. I'm guessing that is part of the problem... trying machine with apic disabled.

comment:2 by kallisti5, 11 years ago

I've disabled IO-APIC on boot and the issue has completely disappeared. So it definitely looks like some kind of interrupt routing issue.

in reply to:  1 comment:3 by mmlr, 11 years ago

Replying to kallisti5:

10134778 unhandled apic_timer_interrupt int's as per kdl. I'm guessing that is part of the problem... trying machine with apic disabled.

As explained in IRC, the APIC timer is part of the local APIC, not of the IO-APIC, so this doesn't correlate. An unhandled count roughly the same as the handled count is expected and unproblematic.

Disabling the IO-APIC means the interrupt routing is pretty much fully different. Please attach a syslog of a boot with IO-APIC enabled; from that the interrupt routing can be deduced for both cases.

Generally the IO-APIC allows for more interrupt vectors to be used, so it usually reduces interrupt sharing. It is still up to the system implementor to wire things up though, so in your case there might still be shared interrupts with the IO-APIC enabled.

I think the rtl driver supports MSIs as well, so the routing may be moot. Again a syslog would tell.

by kallisti5, 11 years ago

Attachment: syslog-noapic added

syslog no apic. up for several hours, no real network problems.

comment:4 by kallisti5, 11 years ago

actually.. the last syslog had both situations in it.

  • 1st boot: apic enabled, slow network
  • last boot: apic disabled, no problems after 2 hours of use

comment:5 by mmlr, 11 years ago

Please use accurate names. You are disabling the IO-APIC, not the APIC. Just APIC usually means local APIC, which is a completely different, CPU local, device in your system.

While reviewing your interrupt routing configuration it becomes clear that the routing for your network card changes between not using the IO-APIC and using it. That would make it share its interrupt vector with different devices in each case. However, your device isn't using legacy interrupts in either case. Instead it uses MSIs in both cases, which gives it a perfectly fine, non-shared interrupt. So from the driver and device side nothing (apart from the vector number) changes for your network card when switching between using and not using the IO-APIC. It would have to be some other interaction causing the problem.
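Whether a device has MSI switched on is visible in its PCI configuration space: the capability list holds an entry with ID 0x05, and bit 0 of that entry's Message Control word is set. A minimal Python sketch of that check, run against a synthetic 256-byte dump. The helper names `msi_state` and `make_config` are hypothetical, the capability offset 0x50 is made up for the example, and this is not how the Haiku driver itself decides:

```python
PCI_STATUS = 0x06            # 16-bit status register
PCI_STATUS_CAP_LIST = 0x10   # status bit 4: capability list present
PCI_CAP_POINTER = 0x34       # offset of the first capability
PCI_CAP_ID_MSI = 0x05        # capability ID for MSI
MSI_CONTROL_ENABLE = 0x0001  # bit 0 of the MSI Message Control word


def msi_state(config: bytes) -> str:
    """Classify a 256-byte config space dump.

    Returns 'msi-enabled', 'msi-capable' or 'legacy-only'.
    """
    if len(config) < 256:
        raise ValueError("expected at least 256 bytes of config space")
    status = int.from_bytes(config[PCI_STATUS:PCI_STATUS + 2], "little")
    if not status & PCI_STATUS_CAP_LIST:
        return "legacy-only"
    offset = config[PCI_CAP_POINTER] & 0xFC
    seen = set()  # guards against a looping capability list
    while offset >= 0x40 and offset not in seen:
        seen.add(offset)
        if config[offset] == PCI_CAP_ID_MSI:
            control = int.from_bytes(config[offset + 2:offset + 4], "little")
            return "msi-enabled" if control & MSI_CONTROL_ENABLE else "msi-capable"
        offset = config[offset + 1] & 0xFC  # follow the next pointer
    return "legacy-only"


def make_config(msi_enabled: bool) -> bytes:
    """Build a synthetic dump with a single MSI capability (made-up layout)."""
    cfg = bytearray(256)
    cfg[PCI_STATUS] = PCI_STATUS_CAP_LIST
    cfg[PCI_CAP_POINTER] = 0x50
    cfg[0x50] = PCI_CAP_ID_MSI
    cfg[0x51] = 0x00  # next pointer 0 ends the list
    cfg[0x52] = MSI_CONTROL_ENABLE if msi_enabled else 0
    return bytes(cfg)


print(msi_state(make_config(True)))
```

With MSI enabled the interrupt is delivered as a message owned by that one device, which is why the legacy pin routing (and any sharing on it) stops mattering.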

Is it possible that this is actually random and just happened to be in this pattern? Or maybe it always happens after a warm reboot? Can you try cold booting into the system with the IO-APIC enabled and disabled?

comment:7 by kallisti5, 10 years ago

Still an issue as of hrev46796. Attaching a log of the packet loss and of my attempt to "catch" the issue in KDL at the time the packet loss starts.

Packet loss seems to start after X hours of uptime; after a reboot things work again with 0% packet loss.
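One way to pin down when the loss starts would be to record ping counters at regular intervals and look for the first interval above a threshold. A minimal Python sketch under that assumption; `first_loss_onset`, the 5% threshold and the sample values are all invented for illustration and are not measurements from this ticket:

```python
from typing import Iterable, Optional, Tuple

# (uptime in hours, packets sent, packets received) for one ping run
Sample = Tuple[float, int, int]


def first_loss_onset(samples: Iterable[Sample],
                     threshold_percent: float = 5.0) -> Optional[float]:
    """Return the uptime of the first sample whose loss exceeds the threshold."""
    for uptime_hours, sent, received in samples:
        if sent <= 0:
            continue  # nothing was sent, so no loss figure exists
        loss = 100.0 * (sent - received) / sent
        if loss > threshold_percent:
            return uptime_hours
    return None  # loss never crossed the threshold


# Placeholder samples, made up purely to exercise the function.
samples = [(1.0, 100, 100), (2.0, 100, 100), (3.0, 100, 52)]
print(first_loss_onset(samples))
```

Correlating that onset time with the syslog would show whether anything else happens on the system at the same moment.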

by kallisti5, 10 years ago

Attachment: packetloss-hrev46796.txt added

comment:8 by waddlesplash, 4 years ago

Still a problem?

comment:9 by kallisti5, 22 months ago

Resolution: not reproducible
Status: new → closed

This hardware is long gone. Here are a few better identifiers from the syslogs to make searching easier for future bug reporters.

  • Realtek RTL8101E/RTL8102E
  • PCI ID: 10ec:8136