Opened 16 years ago

Closed 16 years ago

Last modified 16 years ago

#2008 closed bug (fixed)

KDL: when running 'svn up' on haiku-host (r24742)

Reported by: kaoutsis Owned by: axeld
Priority: blocker Milestone: R1
Component: System/Kernel Version: R1/pre-alpha1
Keywords: Cc: anevilyak
Blocked By: Blocking:
Platform: All

Description

vm_soft_fault: va 0xdeadbeef not covered by area in address space
vm_page_fault: vm_soft_fault returned error 'Bad address' on fault at 0xdeadbeef, ip 0xdeadbeef, write 0, user 0, thread 0x9
CPU 1 halted!
PANIC: vm_page_fault: unhandled page fault in kernel space at 0xdeadbeef, ip 0xdeadbeef

Welcome to Kernel Debugging Land...
Running on CPU 0
kdebug> where
stack trace for thread 9 "block writer"
    kernel stack: 0x80183000 to 0x80187000
frame            caller     <image>:function + offset
80186aa8 (+  52) 8008523b   <kernel>:invoke_debugger_command + 0x00cf
80186adc (+  64) 80085fdc   <kernel>:_ParseCommand__16ExpressionParserRi + 0x01f8
80186b1c (+  48) 800859ce   <kernel>:EvaluateCommand__16ExpressionParserPCcRi + 0x01de
80186b4c (+ 228) 800870f0   <kernel>:evaluate_debug_command + 0x0088
80186c30 (+  64) 80083d7e   <kernel>:kernel_debugger_loop__Fv + 0x017a
80186c70 (+  48) 80084a29   <kernel>:kernel_debugger + 0x010d
80186ca0 (+ 192) 80084911   <kernel>:panic + 0x0029
80186d60 (+  64) 8005d8cb   <kernel>:vm_page_fault + 0x00ab
80186da0 (+  64) 80091909   <kernel>:page_fault_exception + 0x00b1
80186de0 (+  12) 8009501d   <kernel>:int_bottom + 0x001d (nearest)
iframe at 0x80186dec (end = 0x80186e44)
 eax 0xdeadbeef     ebx 0x0             ecx 0x1          edx 0x91101980
 esi 0x91101980     edi 0x910d5c20      ebp 0x80186e78   esp 0x80186e20
 eip 0xdeadbeef  eflags 0x10283
 vector: 0xe, error code: 0x0
80186dec (+ 140) deadbeef
80186e78 (+ 352) 800667c8   <kernel>:block_writer__FPv + 0x0138
80186fd8 (+  32) 80038663   <kernel>:_create_kernel_thread_kentry__Fv + 0x001b
80186ff8 (+2145882120) 800385f8   <kernel>:thread_kthread_exit__Fv + 0x0000
kdebug>
  • get the subversion pkg
  • run 'svn up'
  • after a few seconds the kdl will come up

i reproduce the kdl every time, if i skip the 'svn up' and do a 'jam -q haiku-image' on haiku-host after a few minutes i get the same kdl

this happen only after this revision. Two, three days ago, i had no problem with svn up and the 'jam -q haiku-image' never showed kdl

Attachments (4)

block_writer_kdl_r24742.txt (44.8 KB ) - added by kaoutsis 16 years ago.
block writer.jpg (478.8 KB ) - added by ddew 16 years ago.
Photo of the backtrace
kdl.jpg (68.4 KB ) - added by bga 16 years ago.
Clearer KDL image.
DSCN1845.jpg (356.5 KB ) - added by anevilyak 16 years ago.
KDL - hrev24788

Download all attachments as: .zip

Change History (33)

by kaoutsis, 16 years ago

Attachment: block_writer_kdl_r24742.txt added

comment:1 by axeld, 16 years ago

Resolution: fixed
Status: newclosed

Thanks! Fixed in hrev24749.

comment:2 by axeld, 16 years ago

Component: - GeneralSystem/Kernel
Priority: normalblocker

by ddew, 16 years ago

Attachment: block writer.jpg added

Photo of the backtrace

comment:3 by ddew, 16 years ago

Resolution: fixed
Status: closedreopened

I run into the same issue on hrev24750, attached a photo of the backtrace.

comment:4 by bga, 16 years ago

I can confirm that the bug still exists as I just got sent to KDL when compiling Haiku. I will attach a clearer kdl image but it looks exactly like the one posted by ddew.

by bga, 16 years ago

Attachment: kdl.jpg added

Clearer KDL image.

comment:5 by diver, 16 years ago

I have the same issue with hrev24750, for me it kdl'ed when i launched MediaPlayer.

comment:6 by anevilyak, 16 years ago

Cc: anevilyak added

comment:7 by anevilyak, 16 years ago

I'm seeing the same issues on my athlon64, albeit I can't even make it through the boot process without hitting it.

comment:8 by axeld, 16 years ago

Resolution: fixed
Status: reopenedclosed

Fixed in hrev24761, block writer notification mechanism completely rewritten in hrev24768.

comment:9 by kaoutsis, 16 years ago

i tested again after hrev24742 now i get this:

where
stack trace for thread 9 "block writer/notifier"
    kernel stack: 0x80188000 to 0x8018c000
frame            caller     <image>:function + offset
8018ba58 (+  52) 80085d17   <kernel>:invoke_debugger_command + 0x00cf
8018ba8c (+  64) 80086ab8   <kernel>:_ParseCommand__16ExpressionParserRi + 0x01f8
8018bacc (+  48) 800864aa   <kernel>:EvaluateCommand__16ExpressionParserPCcRi + 0x01de
8018bafc (+ 228) 80087bcc   <kernel>:evaluate_debug_command + 0x0088
8018bbe0 (+  64) 8008485a   <kernel>:kernel_debugger_loop__Fv + 0x017a
8018bc20 (+  48) 80085505   <kernel>:kernel_debugger + 0x010d
8018bc50 (+ 192) 800853ed   <kernel>:panic + 0x0029
8018bd10 (+  64) 8005d9b7   <kernel>:vm_page_fault + 0x00ab
8018bd50 (+  64) 800923e5   <kernel>:page_fault_exception + 0x00b1
8018bd90 (+  12) 80095aed   <kernel>:int_bottom + 0x001d (nearest)
iframe at 0x8018bd9c (end = 0x8018bdf4)
 eax 0xdeadbeef     ebx 0x9145e240      ecx 0x1          edx 0x246
 esi 0x9145e260     edi 0x8018be40      ebp 0x8018be48   esp 0x8018bdd0
 eip 0xdeadbeef  eflags 0x10287
 vector: 0xe, error code: 0x0
8018bd9c (+ 172) deadbeef
8018be48 (+  64) 80066f7d   <kernel>:flush_pending_notifications__Fv + 0x0069
8018be88 (+ 336) 80066ff9   <kernel>:block_notifier_and_writer__FPv + 0x0055
8018bfd8 (+  32) 800386e3   <kernel>:_create_kernel_thread_kentry__Fv + 0x001b
8018bff8 (+2145861640) 80038678   <kernel>:thread_kthread_exit__Fv + 0x0000
kdebug>

comment:10 by kaoutsis, 16 years ago

kdebug> thread 9
THREAD: 0x90a69800
id:                 9 (0x9)
name:               "block writer/notifier"
all_next:           0x00000000
team_next:          0x90a5f000
q_next:             0x90a5b800
priority:           5 (next 5)
state:              running
next_state:         ready
cpu:                0x800f6ab4 (1)
sig_pending:        0x0 (blocked: 0x0)
in_kernel:          1
  sem.blocking:     39398
  sem.count:        0
  sem.acquire_status: 0x0
  sem.flags:        0x0
condition variables:
fault_handler:      0x80085d2c
args:               0x00000000 0x00000000
entry:              0x80066fa4
team:               0x90a36780, "kernel_team"
  exit.sem:         183
  exit.status:      0x0 (No error)
  exit.reason:      0x0
  exit.signal:      0x0
  exit.waiters:
kernel_stack_area:  84
kernel_stack_base:  0x80188000
user_stack_area:    -1
user_stack_base:    0x00000000
user_local_storage: 0x00000000
kernel_errno:       0x0 (No error)
kernel_time:        716567
user_time:          0
flags:              0x0
architecture dependant section:
        esp: 0x8018bb38
        ss: 0x80180010
        fpu_state at 0x90a69b60
kdebug>

comment:11 by axeld, 16 years ago

Resolution: fixed
Status: closedreopened

Great, looks like I can't do anything right :-) Can you please add the KDL message? (you can always later retrieve it in KDL with the "message" command) How did you produce this crash? What did you do? SMP?

comment:12 by bga, 16 years ago

Just FYI, I didn't get any errors again and I compiled Haiku inside itself from start to finish. Other than some mimeset bugs I am seeing, everything worked.

in reply to:  11 comment:13 by kaoutsis, 16 years ago

Replying to axeld:

Great, looks like I can't do anything right :-)

Of course, you can, with a small extra help :-)

Can you please add the KDL message? (you can always later retrieve it in KDL with the "message" command)

i just reboot before i read your message, the big problem now is that i 'lost' the partition:

bfs: Replay log, disk was not correctly unmounted...
run count: -1, array max: -2, max runs: 126
bfs: Log entry has broken header!
bfs: replaying log entry from 820 failed: Bad data
bfs: could not initialize journal: General system error!
bfs: bfs_mount:120: General system error

that partition had all the source tree, and was initialized from hrev5 Should i try to mount from hrev5? any ideas? no worries, i love this sport, and i am prepared for these kind of emotions :-)

How did you produce this crash? What did you do?

'svn up' now we will have a test of a full 'svn up', i guess :-)

SMP?

the hyperthreading cpu (P4 3Ghz) HT

May i ask what kinds of information shall be traced (enabled) in tracing_config.h if any?

by anevilyak, 16 years ago

Attachment: DSCN1845.jpg added

KDL - hrev24788

comment:14 by anevilyak, 16 years ago

I just saw this crash on hrev24788 after doing a complete svn checkout. See attached screenshot. This was on a single CPU Athlon64 2GHz w/ 1GB of RAM.

comment:15 by axeld, 16 years ago

Resolution: fixed
Status: reopenedclosed

Should finally be fixed again in hrev24792.

comment:16 by bga, 16 years ago

There seems to be a deadlock now. Unzipping Haiku sources inside Haiku reached a point where it simply stopped. I can still move the mouse around but I can not do any disk operations. I can not also stop the unzip process with CTRL+C. Should I open a new bug for this?

comment:17 by axeld, 16 years ago

Has it been fixed with hrev24793 by any chance? Have you seen where the threads were waiting? If it just sit in wait_for_notifications() it should have been that bug.

comment:18 by bga, 16 years ago

Yep. That did it. bug finally fixed (until further notice ;) ).

comment:19 by stippi, 16 years ago

Unfortunately, I am still getting i/o system lock ups with 24793. I was using BePDF when any app suddenly froze that was trying to read/write from disk.

comment:20 by anevilyak, 16 years ago

Resolution: fixed
Status: closedreopened

I experienced this as well on shutdown..."Tidying things up" hung forever, and the dialog was uncancellable.

comment:21 by anevilyak, 16 years ago

Further note, I just encountered it again while doing an svn checkout, and checked the threads involved...

The svn thread is blocked on the "bfs journal" sem, with a count of -4, while the page writer is blocked on a sem called page writer with a count of -1. Is there any easy way to list all the threads blocking on a particular sem without looking through the entire thread list?

comment:22 by anevilyak, 16 years ago

A little more info I dug up: The low memory handler thread is likewise blocked on the bfs journal sem, and the block notifier is waiting on the "cache transaction sync" cvar, along with 8 other threads if I understand the kernel debugger's output about the var properly.

in reply to:  21 comment:23 by bonefish, 16 years ago

Replying to anevilyak:

Is there any easy way to list all the threads blocking on a particular sem without looking through the entire thread list?

"sem" prints the queue of waiting threads under "queue:".

comment:24 by anevilyak, 16 years ago

Glad I left that debugger open in that case :)

id: 991 (0x3df) name: 'bfs journal' owner: 1 count: -4 queue: 276 6 181 170 last acquired by: 8, count: 1 last released by 276, count: 1

8 = block writer/notifier, waiting on the cache transaction sync cv 276 = svn 6 = low memory handler 181 = TrackerTaskLoop 170 = Terminal window thread

in reply to:  24 comment:25 by bonefish, 16 years ago

Replying to anevilyak:

Glad I left that debugger open in that case :)

I haven't looked at Axel's latest changes to the block cache, but if you still have the debugger open, stack traces for the 5 potentially involved threads (8, 276, 6, 181, 170) should help to understand the deadlock.

comment:26 by anevilyak, 16 years ago

I'm leaving that box in the kernel debugger until I'm asked to try a change :) I can't get anything useful done in Haiku because of that deadlock anyways, so I may as well leave it so I can pull as much useful information about that state as possible. In any case, traces:

thread 8 - <kernel>:context_switchFP6threadT0 <kernel>:scheduler_reschedule <kernel>:Wait29PrivateConditionVariableEntryUl <kernel>:wait_for_notificationsFP11block_cache <kernel>:cache_sync_transaction <bfs>:_WriteTransactionToLog7Journal <bfs>:_FlushLog7JournalbT1 <bfs>:_TransactionListener7JournalllPv <kernel>:flush_pending_notificationsFP11block_cache <kernel>:flush_pending_notificationsFv <kernel>:block_notifier_and_writerFPv <kernel>:_create_kernel_thread_kentry_Fv <kernel>:thread_kthread_exit_Fv

thread 276: <kernel>context_switchFP6ThreadT0 <kernel>:scheduler_reschedule <kernel>:switch_sem_etc <kernel>:acquire_sem_etc <bfs>:Lock7JournalP11Transaction <bfs>:Start11TransactionP6Volumex <bfs>:WriteAt5InodeR11TransactionxPCUcPUl <bfs>:bfs_writeFPvN20xPCvPUl <kernel>:file_writeFP15file_descriptorxPCvPUl <kernel>:common_user_ioFixPvUlb <kernel>:_user_write <kernel>:pre_syscall_debug_done iframe at 0x94b03fa8 (end = 0x94b04000) eax 0x70 ebx 0x460074 ecx 0x7ffe66dc edx 0xffff0104 esi 0x1a48e3b0 edi 0x19f022d8 ebp 0x7ffe6718 esp 0x94b03fdc eip 0xffff0104 eflags 0x203 vector: 0x63, error code: 0x0 <libapr-1.so>:apr_file_write <snip>lots of layers in APR/LibSVN that I assume is irrelevant here.

thread 6: <kernel>:context_switchFP6threadT0 <kernel>:scheduler_reschedule <kernel>:switch_sem_etc <kernel>:acquire_sem_etc <bfs>:Lock7JournalP11Transaction <bfs>:Start11TransactionP6Volumex <bfs>:bfs_put_vnodeFPvT0b <kernel>:free_vnodeFP5vnodeb <kernel>:dec_vnode_ref_countFP5vnodeb <kernel>:vnode_low_memoy_handlerFPvl <kernel>:call_handlersFl <kernel>:low_memoryFPv <kernel>:_create_kernel_thread_kentryFv <kernel>:thread_kthread_exitFv

thread 181: <kernel>:context_switchFP6threadT0 <kernel>:scheduler_reschedule <kernel>:switch_sem_etc <kernel>:acquire_sem_etc <bfs>:Lock7JournalP11Transaction <bfs>:Start11TransactionP6Volumex <bfs>:bfs_put_vnodeFPvT0b <kernel>:free_vnodeFP5vnodeb <kernel>:dec_vnode_ref_countFP5vnodeb <kernel>:fix_direntFP5vnodeP6direntP10io_context <kernel>:dir_readFP10io_contextP5vnodePvP6direntUlPUl <kernel>:dir_readFP10io_contextP15file_descriptorP6direntUlPUl <kernel>:_user_read_dir <kernel>:pre_syscall_debug_done iframe at 0x94960fa8 (end = 0x94961000) eax 0x73 ebx 0x42c818 ecx 0x70206be0 edx 0xffff0104 esi 0x70206c40 edi 0x18117f78 ebp 0x70206c0c esp 0x94960fdc eip 0xffff0104 eflags 0x246 vector: 0x63 error code: 0x0 <libbe.so>:GetNextRef10BDirectoryP9entry_ref + 0x008e <snip>

thread 170: <kernel>:context_switchFP6threadT0 <kernel>:scheduler_reschedule <kernel>:switch_sem_etc <kernel>:acquire_sem_etc <bfs>:Lock7JournalP11Transaction <bfs>:Start11TransactionP6Volumex <bfs>:bfs_put_vnodeFPvT0b <kernel>:free_vnodeFP5vnodeb <kernel>:dec_vnode_ref_countFP5vnodeb <kernel>:dir_free_fdFP15file_descriptor <kernel>:put_fd <kernel>:common_close <kernel>:_user_close <kernel>:pre_syscall_debug_done iframe at 0x947a9fa8 (end = 0x947aa000) eax 0x77 ebx 0x45f818 ecx 0x700c12e0 edx 0xffff0104 esi 0x700c15bc edi 0x700c15bc ebp 0x700c130c esp 0x947a9fdc eip 0xffff0104 eflags 0x207 vector: 0x63, error code: 0x0 <libbe.so>:_._6BEntry <libbe.so>:_TranslateRefC7BRosterB9entry_refP9BMimeTypeT1P5BFilePb <snip>

Let me know if you need me to pull any other info out.

comment:27 by axeld, 16 years ago

I would have preferred opening another ticket for this bug as it has nothing to do with the original bug. Anyway, the stack crawl of thread 8 (the block writer/notifier) looks indeed very helpful; I forgot that it may call cache_sync_transaction() itself, too.

comment:28 by axeld, 16 years ago

Resolution: fixed
Status: reopenedclosed

Should be fixed in hrev24806. If not, let's open a new ticket this time.

comment:29 by anevilyak, 16 years ago

Confirmed fixed.

Note: See TracTickets for help on using tickets.