Opened 11 years ago

Closed 11 years ago

Last modified 11 years ago

#2758 closed bug (fixed)

downloading on VIA Rhine-II KDLs

Reported by: Adek336 Owned by: axeld
Priority: normal Milestone: R1
Component: Network & Internet Version: R1/pre-alpha1
Keywords: Cc: imker@…
Blocked By: Blocking:
Has a Patch: no Platform: All

Description

Downloading a file with wget, browsing webpages with Links or starting Firefox crashes system. The fastest in crashing is Firefox - just after rendering the home page.

one trace

PANIC: vm_page_fault
...
<..kernel_x86>memcpy_generic
<..>devfs_read__FP9fs_volumeP8fs_vnodePvcT2PUl
<..>file_read__FP15file_descriptorxPvPUl
<..>_kern_read
<..>read
<..add-ons/kernel/network/devices/ethernet>ethernet_receive_data__FP10net_devicePP10net_buffer
<..add-ons/kernel/network/stack>device_reader_thread__FPv
<..kernel_x86>_create_kernel_thread_kentry__Fv
<..>thread_kthreadexit__Fv

another

PANIC: vm_page_fault
...
<..kernel_x86>memmove
</boot/..../drivers/dev/net/via_rhine>m_devget
<..>vr_rxeof
<..>vr_intr
<..>intr_handler

Change History (8)

comment:1 Changed 11 years ago by Adek336

With the following in src/libs/compat/freebsd_network/fbsd_mbuf.c

+ #define CHECK_RECQ(x) do { CheckReceiveQueue(x, __func__, __LINE__); } while(0)
+ void CheckReceiveQueue(struct ifnet *ifp, char *c, int l)
+ {
+    struct ifqueue *ifq= &ifp->receive_queue;
+    struct mbuf *m;
+ 
+    dprintf("CheckReceiveQueue: %s:%d, m=%p, ifp=%p, ifq=%p\n", c,l, ifq->ifq_head,ifp,ifq);
+ 
+    for (m=ifq->ifq_head; m != NULL; m = m->m_nextpkt)
+    {
+       if (mtod(m,unsigned int) < 0x01000000)
+       {
+          dprintf("CHECK_RECQ: corrupt: %s:%d: m=%p, m->m_data=%p, m->m_nexpkt=%p\n", c,l,m, m->m_data, m->m_nextpkt);
+          panic("CHECK_RECQ: corrupt: %s:%d: m=%p, m->m_data=%p, m->m_nexpkt=%p\n", c,l,m, m->m_data, m->m_nextpkt);
+       }
+    }
+ }

...

struct mbuf *
m_devget(char *buf, int totlen, int off, struct ifnet *ifp,
         void (*copy)(char *from, caddr_t to, u_int len))
{
...
+        CHECK_RECQ(ifp);
+        dprintf("m_devget: off=%d, buf=%p, mtod(m, void*)=%p, m=%p, len=%d, copy=%p\n", off, buf, mtod(m,void*),m,len,copy);
         if (copy)
                copy(buf, mtod(m,caddr_t), (u_int)len);
         else
                bcopy(buf, mtod(m,caddr_t), (u_int)len);
+        CHECK_RECQ(ifp);
...
}

I got in syslog

...
KERN: m_devget: off=0, buf=0x939d3000, mtod(m, void*)=0x939fc8b6, m=0x939fc888, len=223, copy=0x00000000
KERN: CheckReceiveQueue: m_devget:480, m=0x939fcc88, ifp=0x91585000, ifq=0x91585280
KERN: CHECK_RECQ: corrupt: m_devget:480: m=0x939fc988, m->m_data=0x00670c5b, m->m_nextpkt=0x1e10532f
...

It passes the first check, performs bcopy() and then fails on the second check on the basis that m_data points very low in memory. The first line informs us that m_devget copies 223 bytes to 0x939fc8b6 which is an error: it means overwriting the mbuf mentioned in the third line of the syslog excerpt. Interestingly, most of the time the corrupt mbuf has m->m_data=0x00670c5b.

It sometimes KDLs elsewhere but this one happens most of the time. I leave only one CPU active when testing to have debug data from different threads a bit less interleaved.

comment:2 Changed 11 years ago by Adek336

I found the cause !

fbsd_mbuf.c

struct mbuf*
m_devget(char *buf, int totlen, int off, struct ifnet *ifp,
         void (*copy)(char *from, caddr_t to, u_int len))
{
...
      if (top == NULL) {
-         if (totlen + off >= MINCLSIZE) {
+         if (true) {
               m = m_getcl(M_DONTWAIT, MT_DATA, M_PKTHDR);          
...
}

That is a workaround. I'm actually writing this in Haiku :D however the mouse sometimes freezes for a 1-2 second period.

comment:3 Changed 11 years ago by Adek336

  1. Haiku with the above workaround still crashes, not after 30 secs but after 30 mins, perhaps because it can still allocate an mbuf with m_gethdr in the top != NULL branch.
  1. The problem is: m_devget writes more bytes than it should, overwriting other mbufs. Using m_getcl instead of m_gethdr is a workaround as it allocates a larger buffer.
  1. sizeof(struct mbuf) should be MSIZE = 0x100, but it is 0x118
  1. In Haiku:

compat/sys/mbuf.h

#define MLEN            ((int)(MSIZE - sizeof(struct m_hdr)))
#define MHLEN           ((int)(MSIZE - sizeof(struct pkthdr)))

In FreeBSD:

#define MLEN            ((int)(MSIZE - sizeof(struct m_hdr)))
#define MHLEN           ((int)(MLEN - sizeof(struct pkthdr)))
  1. Adopting the FreeBSD's definition of MHLEN (instead of the above workaround) makes the KDL go away, at least so far.

comment:4 Changed 11 years ago by siarzhuk

Cc: imker@… added

comment:5 Changed 11 years ago by siarzhuk

To Adek336: Just for your information: this patch for MHLEN helped me to resolve another problem with mbuf: http://dev.haiku-os.org/ticket/1641#comment:18 Thank you!

comment:6 Changed 11 years ago by Adek336

I'm glad to hear that!

comment:7 Changed 11 years ago by axeld

Resolution: fixed
Status: newclosed

Thanks again for your great investigation! :-) I've applied the change in hrev27771.

comment:8 Changed 11 years ago by stippi

Awesome work, Adek336! Those drivers, especially RTL8139, are sure important! Thanks a lot!

Note: See TracTickets for help on using tickets.