Opened 16 years ago

Closed 16 years ago

Last modified 16 years ago

#2758 closed bug (fixed)

downloading on VIA Rhine-II KDLs

Reported by: Adek336 Owned by: axeld
Priority: normal Milestone: R1
Component: Network & Internet Version: R1/pre-alpha1
Keywords: Cc: imker@…
Blocked By: Blocking:
Platform: All

Description

Downloading a file with wget, browsing webpages with Links or starting Firefox crashes system. The fastest in crashing is Firefox - just after rendering the home page.

one trace

PANIC: vm_page_fault
...
<..kernel_x86>memcpy_generic
<..>devfs_read__FP9fs_volumeP8fs_vnodePvcT2PUl
<..>file_read__FP15file_descriptorxPvPUl
<..>_kern_read
<..>read
<..add-ons/kernel/network/devices/ethernet>ethernet_receive_data__FP10net_devicePP10net_buffer
<..add-ons/kernel/network/stack>device_reader_thread__FPv
<..kernel_x86>_create_kernel_thread_kentry__Fv
<..>thread_kthreadexit__Fv

another

PANIC: vm_page_fault
...
<..kernel_x86>memmove
</boot/..../drivers/dev/net/via_rhine>m_devget
<..>vr_rxeof
<..>vr_intr
<..>intr_handler

Change History (8)

comment:1 by Adek336, 16 years ago

With the following in src/libs/compat/freebsd_network/fbsd_mbuf.c

+ #define CHECK_RECQ(x) do { CheckReceiveQueue(x, __func__, __LINE__); } while(0)
+ void CheckReceiveQueue(struct ifnet *ifp, char *c, int l)
+ {
+    struct ifqueue *ifq= &ifp->receive_queue;
+    struct mbuf *m;
+ 
+    dprintf("CheckReceiveQueue: %s:%d, m=%p, ifp=%p, ifq=%p\n", c,l, ifq->ifq_head,ifp,ifq);
+ 
+    for (m=ifq->ifq_head; m != NULL; m = m->m_nextpkt)
+    {
+       if (mtod(m,unsigned int) < 0x01000000)
+       {
+          dprintf("CHECK_RECQ: corrupt: %s:%d: m=%p, m->m_data=%p, m->m_nexpkt=%p\n", c,l,m, m->m_data, m->m_nextpkt);
+          panic("CHECK_RECQ: corrupt: %s:%d: m=%p, m->m_data=%p, m->m_nexpkt=%p\n", c,l,m, m->m_data, m->m_nextpkt);
+       }
+    }
+ }

...

struct mbuf *
m_devget(char *buf, int totlen, int off, struct ifnet *ifp,
         void (*copy)(char *from, caddr_t to, u_int len))
{
...
+        CHECK_RECQ(ifp);
+        dprintf("m_devget: off=%d, buf=%p, mtod(m, void*)=%p, m=%p, len=%d, copy=%p\n", off, buf, mtod(m,void*),m,len,copy);
         if (copy)
                copy(buf, mtod(m,caddr_t), (u_int)len);
         else
                bcopy(buf, mtod(m,caddr_t), (u_int)len);
+        CHECK_RECQ(ifp);
...
}

I got in syslog

...
KERN: m_devget: off=0, buf=0x939d3000, mtod(m, void*)=0x939fc8b6, m=0x939fc888, len=223, copy=0x00000000
KERN: CheckReceiveQueue: m_devget:480, m=0x939fcc88, ifp=0x91585000, ifq=0x91585280
KERN: CHECK_RECQ: corrupt: m_devget:480: m=0x939fc988, m->m_data=0x00670c5b, m->m_nextpkt=0x1e10532f
...

It passes the first check, performs bcopy() and then fails on the second check on the basis that m_data points very low in memory. The first line informs us that m_devget copies 223 bytes to 0x939fc8b6 which is an error: it means overwriting the mbuf mentioned in the third line of the syslog excerpt. Interestingly, most of the time the corrupt mbuf has m->m_data=0x00670c5b.

It sometimes KDLs elsewhere but this one happens most of the time. I leave only one CPU active when testing to have debug data from different threads a bit less interleaved.

comment:2 by Adek336, 16 years ago

I found the cause !

fbsd_mbuf.c

struct mbuf*
m_devget(char *buf, int totlen, int off, struct ifnet *ifp,
         void (*copy)(char *from, caddr_t to, u_int len))
{
...
      if (top == NULL) {
-         if (totlen + off >= MINCLSIZE) {
+         if (true) {
               m = m_getcl(M_DONTWAIT, MT_DATA, M_PKTHDR);          
...
}

That is a workaround. I'm actually writing this in Haiku :D however the mouse sometimes freezes for a 1-2 second period.

comment:3 by Adek336, 16 years ago

  1. Haiku with the above workaround still crashes, not after 30 secs but after 30 mins, perhaps because it can still allocate an mbuf with m_gethdr in the top != NULL branch.
  1. The problem is: m_devget writes more bytes than it should, overwriting other mbufs. Using m_getcl instead of m_gethdr is a workaround as it allocates a larger buffer.
  1. sizeof(struct mbuf) should be MSIZE = 0x100, but it is 0x118
  1. In Haiku:

compat/sys/mbuf.h

#define MLEN            ((int)(MSIZE - sizeof(struct m_hdr)))
#define MHLEN           ((int)(MSIZE - sizeof(struct pkthdr)))

In FreeBSD:

#define MLEN            ((int)(MSIZE - sizeof(struct m_hdr)))
#define MHLEN           ((int)(MLEN - sizeof(struct pkthdr)))
  1. Adopting the FreeBSD's definition of MHLEN (instead of the above workaround) makes the KDL go away, at least so far.

comment:4 by siarzhuk, 16 years ago

Cc: imker@… added

comment:5 by siarzhuk, 16 years ago

To Adek336: Just for your information: this patch for MHLEN helped me to resolve another problem with mbuf: http://dev.haiku-os.org/ticket/1641#comment:18 Thank you!

comment:6 by Adek336, 16 years ago

I'm glad to hear that!

comment:7 by axeld, 16 years ago

Resolution: fixed
Status: newclosed

Thanks again for your great investigation! :-) I've applied the change in hrev27771.

comment:8 by stippi, 16 years ago

Awesome work, Adek336! Those drivers, especially RTL8139, are sure important! Thanks a lot!

Note: See TracTickets for help on using tickets.