Opened 17 years ago
Closed 16 years ago
#1641 closed bug (fixed)
KDL: rtl8139
Reported by: | kaoutsis | Owned by: | axeld |
---|---|---|---|
Priority: | normal | Milestone: | R1 |
Component: | Drivers/Network | Version: | R1/pre-alpha1 |
Keywords: | Cc: | scottmc, idefix, zharik@…, HubertNG@… | |
Blocked By: | Blocking: | #1661, #1890, #2452 | |
Platform: | x86 |
Description
Surfing with Opera, after 5 minutes I got this stack trace: rtl8139-new-bug.txt (attached)
Attachments (6)
Change History (31)
by , 17 years ago
Attachment: | rtl8139-new-bug.txt added |
---|
comment:2 by , 17 years ago
Cc: | added; removed |
---|
by , 16 years ago
Attachment: | 8139-kdl.txt added |
---|
This was with hrev25673, on an AMD Geode based PC board.
comment:3 by , 16 years ago
Cc: | added |
---|
comment:4 by , 16 years ago
I got the same KDL with my rtl8139 just last night, with Firefox, sorry, Bon Echo. I'm running hrev25860.
Something is wrong in the BSD network driver compatibility layer, I guess.
comment:5 by , 16 years ago
Cc: | added |
---|
comment:6 by , 16 years ago
Blocking: | 2452 added |
---|
comment:7 by , 16 years ago
Component: | - General → Drivers/Network |
---|
Same problem here; for me it can crash at boot, at Firefox start, or a few moments later.
As the rtl8139 is such a common NIC these days, I would recommend fixing it before the alpha.
comment:8 by , 16 years ago
My system (hrev26909 image) also crashes when I send a lot of data over the network from another computer to Haiku's ftp service: cat /dev/zero | nc 192.168.1.199 21 (192.168.1.199 is the Haiku machine). Ping -f and pings with big packet sizes are no problem.
by , 16 years ago
Attachment: | 8139-flooding-port-21-with-zeros-kdl.txt added |
---|
backtrace when crashing on flooding port 21 with zeros
comment:12 by , 16 years ago
Cc: | added |
---|
comment:13 by , 16 years ago
2 Axel: I observe the same crash ("m_free + 0x0017") frequently on my system with rtl8139. It is easily reproducible by starting Firefox. ;-)
It looks like the very first access to the m_free parameter fails:
    struct mbuf *
    m_free(struct mbuf *m)
    {
        struct mbuf *next = m->m_next;

        if (m->m_flags & M_EXT)
            mb_free_ext(m);
        else
            object_cache_free(sMBufCache, m);

        return next;
    }
I have checked this against the disassembly. The instructions that fail are:
    0x0000df84 push %ebp            ; m_free code start here
    ...
    0x0000df98 mov 0x8(%ebp), %eax
    0x0000df9b mov (%eax), %esi     ; <--- KDL!
    0x0000df9d testb $0x1, 0x10(%eax)
    ...
Maybe you have some suggestions before I dig into debugging this problem? :-) It looks like m->m_next becomes invalid at some point, and m_freem cannot call m_free with a null pointer.
comment:14 by , 16 years ago
Sorry. :-( corrected code blocks:
    struct mbuf *
    m_free(struct mbuf *m)
    {
        struct mbuf *next = m->m_next;

        if (m->m_flags & M_EXT)
            mb_free_ext(m);
        else
            object_cache_free(sMBufCache, m);

        return next;
    }
disasm:
    0x0000df84 push %ebp            ; m_free code start here
    ...
    0x0000df98 mov 0x8(%ebp), %eax
    0x0000df9b mov (%eax), %esi     ; <--- KDL!
    0x0000df9d testb $0x1, 0x10(%eax)
    ...
comment:17 by , 16 years ago
You could add ktrace_printf() output to the m_* functions, as well as to compat_read(), and then see via "traced" in KDL what exactly happened. Don't forget to a) enable tracing in tracing_config.h, and b) enlarge the tracing buffer.
comment:18 by , 16 years ago
During my "traced" games I observed 4 cases of network-related KDLs on my system with rtl8139:
1) page fault in m_free call from compat_read. It looks like one that is traced in attachment 8139-kdl.txt.
2) page fault in m_free call from m_defrag. It is mentioned above in attachment 8139-wget-kdl.txt
3) page fault in memcpy_generic call from devfs_read in "/dev/net/rtl8139 reader" thread.
4) page fault in CompareC24ConnectionHashDefinitionRCt4pair2ZPC8sockaddrZPC8sockaddrP11TCPEndpoint. It is already submitted as ticket #2706.
First of all I investigated "case 1" because it occurred very frequently on my system. The problem occurs as follows: in the interrupt handler, rl_rxeof creates an mbuf for the received data by calling m_devget. Right after this, in the same invocation of the interrupt handler, rl_rxeof creates another mbuf with the next packet of received data by calling m_devget again. After the interrupt handler has finished, compat_read copies the received data and attempts to free the mbuf created by the first call to m_devget. This attempt fails because the m_next of this mbuf is invalid (tracing shows that it almost always has the value 0x00000d36).
"Case 2" was observed rarely and looks related to the same problem as "case 1", but during rl_txeof handling.
As for "case 3" and "case 4", I thought they were not related to the mbuf problem.
While browsing Trac tickets for something related to these problems, I found ticket #2758, which describes a problem in m_devget.
I tried the fix from Adek336 mentioned in that ticket, in compat/sys/mbuf.h:
     #define MLEN ((int)(MSIZE - sizeof(struct m_hdr)))
    -#define MHLEN ((int)(MSIZE - sizeof(struct pkthdr)))
    +#define MHLEN ((int)(MLEN - sizeof(struct pkthdr)))
Now I can no longer reproduce cases 1, 2, and 3 during about 1 hour of stress testing.
Unfortunately the "case 4" (ticket #2706) is still reproducible on my system.
comment:19 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
follow-up: 21 comment:20 by , 16 years ago
Cc: | added |
---|---|
Platform: | All → x86 |
Resolution: | fixed |
Status: | closed → reopened |
I reproduced this bug in hybrid gcc4 28810 on FF 2.0.0.12
follow-up: 22 comment:21 by , 16 years ago
follow-up: 23 comment:22 by , 16 years ago
Replying to siarzhuk: I don't know, but this is bug "0 reader" too, not only "vm_page_fault: unhandled page fault in kernel space at..."
comment:23 by , 16 years ago
Replying to siarzhuk:
Oops, sorry, you are right, this is the "0 consumer" error. My big mistake, sorry once more...
comment:25 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
stack trace for the rtl8139