Opened 10 years ago

Closed 10 years ago

Last modified 4 years ago

#3768 closed bug (fixed)

create_image -i 943718400 (or other large sizes) results in freeze

Reported by: anevilyak Owned by: bonefish
Priority: normal Milestone: R1
Component: System/Kernel Version: R1/Development
Keywords: Cc: black.belt.jimmy@…
Blocked By: Blocking:
Has a Patch: no Platform: All

Description

The above command, run from within Haiku (or correspondingly asking the build system to create a similarly sized image), results in the desktop freezing. Right before this happens, one can observe CPU usage shooting through the roof in ProcessController. Entering KDL consistently shows the page daemon with backtraces like the following:

Welcome to Kernel Debugging Land...
Thread 7 "page daemon" running on CPU 0
kdebug> bt
stack trace for thread 7 "page daemon"
    kernel stack: 0x80188000 to 0x8018c000
frame               caller     <image>:function + offset
 0 8018bab4 (+  48) 80063745   <kernel_x86>:invoke_debugger_command + 0x00f5
 1 8018bae4 (+  64) 80063535   <kernel_x86> invoke_pipe_segment(debugger_command_pipe*: 0x801319c0, int32: 0, 0x0 "<NULL>") + 0x0079
 2 8018bb24 (+  64) 800638bc   <kernel_x86>:invoke_debugger_command_pipe + 0x009c
 3 8018bb64 (+  48) 80064e6c   <kernel_x86> ExpressionParser<0x8018bc18>::_ParseCommandPipe(0x8018bc14) + 0x0234
 4 8018bb94 (+  64) 800642a6   <kernel_x86> ExpressionParser<0x8018bc18>::EvaluateCommand(0x80122d60 "bt", 0x8018bc14) + 0x02ba
 5 8018bbd4 (+ 224) 80066294   <kernel_x86>:evaluate_debug_command + 0x0088
 6 8018bcb4 (+  64) 80061636   <kernel_x86> kernel_debugger_loop() + 0x01ae
 7 8018bcf4 (+  32) 800624c5   <kernel_x86>:kernel_debugger + 0x004d
 8 8018bd14 (+ 192) 8006246d   <kernel_x86>:panic + 0x0029
 9 8018bdd4 (+  48) 916706cd   </boot/system/add-ons/kernel/bus_managers/ps2>:ps2_interrupt + 0x00d1
10 8018be04 (+  48) 8003f3cf   <kernel_x86>:int_io_interrupt_handler + 0x006f
11 8018be34 (+  48) 800d74b4   <kernel_x86>:hardware_interrupt + 0x0070
12 8018be64 (+  12) 800dab56   <kernel_x86>:int_bottom + 0x0036
kernel iframe at 0x8018be70 (end = 0x8018bec0)
 eax 0x0            ebx 0x8018bf84      ecx 0x1          edx 0x246
 esi 0x8256a6b0     edi 0x8018bf84      ebp 0x8018bee8   esp 0x8018bea4
 eip 0x800ca27c  eflags 0x246
 vector: 0x21, error code: 0x0
13 8018be70 (+ 120) 800ca27c   <kernel_x86>:vm_cache_acquire_locked_page_cache + 0x0048
14 8018bee8 (+  48) 800cc619   <kernel_x86> PageCacheLocker<0x8018bf84>::Lock(vm_page*: 0x8256a6b0, true) + 0x0031
15 8018bf18 (+  48) 800cc571   <kernel_x86>:__15PageCacheLockerP7vm_pageb + 0x0025
16 8018bf48 (+  64) 800cc6bd   <kernel_x86> clear_page_activation(int32: 101726) + 0x0029
17 8018bf88 (+  80) 800cc9a8   <kernel_x86> page_daemon(NULL) + 0x00c8
18 8018bfd8 (+  32) 800574e3   <kernel_x86> _create_kernel_thread_kentry() + 0x001b
19 8018bff8 (+2145861640) 80057480   <kernel_x86> thread_kthread_exit() + 0x0000
kdebug> 

This is 100% reproducible on my system with 1GB of RAM and hrev30165.

Attachments (1)

create-image.out (110.5 KB ) - added by bonefish 10 years ago.
scheduler tracing output


Change History (26)

comment:1 by anevilyak, 10 years ago

I should also add, if relevant: 1) 200MB of RAM is devoted to kernel trace buffer (vm cache tracing is also still active), and 2) 2GB swapfile is likewise present.

comment:2 by bbjimmy, 10 years ago

Cc: black.belt.jimmy@… added

comment:4 by bonefish, 10 years ago

One problem is BFS not supporting sparse files. The other is our file cache which isn't particularly clever with respect to large continuous reads/writes. So in this case it iteratively claims that many pages, which causes the page daemon to become very active trying to find pages that aren't desperately needed anymore.

Anyway, the system is supposed to return to a usable state after a short while. Does it?

in reply to:  4 comment:5 by anevilyak, 10 years ago

Replying to bonefish:

Anyway, the system is supposed to return to a usable state after a short while. Does it?

How long is a short while in this case? I waited about 3-4 minutes before giving up and issuing a reboot. Note that smaller sizes like say 250-300MB return instantly and operate fine, so I'm assuming this is in some way triggered by memory pressure?

comment:6 by bonefish, 10 years ago

The problem is indeed easily reproducible via create_image and an image size sufficiently beyond the available physical memory.

High CPU usage is expected at first, since the create_image thread is initially purely memory bound (happily allocating and clearing pages for the file cache). Later the low resource manager and the page daemon join in.

Disk I/O starts after a short time, but only periodically and in short bursts. After a while the GUI freezes (probably blocking on page allocations), while disk I/O continues at the previous rate. A while later disk I/O ceases too, as the system runs into a deadlock.

KDL session:

kdebug> page_stats
page stats:
total: 116736
active: 33481
inactive: 0
busy: 288
unused: 311
wired: 6716
modified: 75891
free: 43
clear: 6
reserved pages: 50
page deficit: 7
mapped pages: 40215

free queue: 0x80136cb0, count = 43
clear queue: 0x80136cbc, count = 6
modified queue: 0x80136cc8, count = 75891 (6803 temporary, 6803 swappable, inactive: 6654)
active queue: 0x80136ce0, count = 40540
inactive queue: 0x80136cd4, count = 0

As expected, no free pages available. And lots of modified pages.

kdebug> thread -s 8
thread         id  state     wait for   object  cpu pri  stack      team  name
0x817eb000      8  waiting   sem            24    -  10  0x80189000    1  page daemon
kdebug> sem 24
SEM: 0x88cf04e0
id:      24 (0x18)
name:    'page daemon'
owner:   1
count:   -1
queue:   8
last acquired by: 8, count: 1
last released by: 143, count: 1

The page daemon is at rest ATM, but it supposedly happily scans pages periodically.

kdebug> threads 1
thread         id  state     wait for   object  cpu pri  stack      team  name
0x817f1800     31  waiting   cvar   0x80fb79f0    -  10  0x8053c000    1  scsi notifier
0x8a1b2000     63  waiting   sem           515    -  10  0x8023c000    1  net timer
0x89ce9000     32  waiting   cvar   0x80fb7a94    -  10  0x807f2000    1  scsi scheduler
0x8012cb00      1  running           -            0   0  0x80201000    1  idle thread 1
0x89ce9800     33  waiting   cvar   0x80fb7abc    -  10  0x807f6000    1  scsi notifier
0x8012d0e0      2  running           -            1   0  0x80780000    1  idle thread 2
0x8a1b6000     65  waiting   sem           567    -  15  0x8024c000    1  loop consumer
0x89cea000     34  waiting   cvar   0x80fb7b60    -  10  0x80d7d000    1  scsi scheduler
0x817d8800      3  waiting   cvar   0x8012caa8    -  15  0x80145000    1  undertaker
0x8a1b7800     66  waiting   sem           571    - 120  0x80575000    1  fast taskq
0x89cea800     35  waiting   cvar   0x80fb7b88    -  10  0x80d81000    1  scsi notifier
0x817d9000      4  zzz                            -   5  0x80179000    1  kernel daemon
0x817d9800      5  zzz                            -   5  0x8017d000    1  resource resizer
0x8a1b9000     68  waiting   sem           586    -  99  0x805ae000    1  em_taskq
0x89ceb000     37  waiting   sem           215    -   5  0x805a1000    1  syslog sender
0x817ea000      6  zzz                            -   1  0x80181000    1  page scrubber
0x89cf0000     38  waiting   sem           219    -  10  0x805aa000    1  invalidate_loop
0x817ea800      7  waiting   rwlock 0x811706a8    -  11  0x80185000    1  page writer
0x8a1bd800     70  waiting   sem           600    -  15  0x805f8000    1  /dev/net/ipro1000/0 consumer
0x8a191800     39  waiting   sem           223    -  10  0x805b7000    1  run_on_exit_loop
0x817eb000      8  waiting   sem            24    -  10  0x80189000    1  page daemon
0x8a20c000    102  waiting   sem           946    -  20  0x82625000    1  ps2 service
0x817eb800      9  waiting   cvar   0x80136c04    - 110  0x8018d000    1  object cache resizer
0x817ec000     10  waiting   sem             0    - 110  0x80191000    1  heap grower
0x8a1cc800     73  waiting   sem           599    -   5  0x80724000    1  ethernet link state checker
0x817ec800     11  waiting   mutex  0x8115116c    -   5  0x80195000    1  low resource manager
0x817ed000     12  waiting   sem            38    -   5  0x8019a000    1  block notifier/writer
0x8a1cd800     75  waiting   sem           590    -  90  0x80743000    1  /dev/net/ipro1000/0 reader
0x817ee000     18  waiting   sem            64    -  20  0x80288000    1  uhci finish thread
0x817ee800     19  waiting   sem            68    -  20  0x8028c000    1  uhci isochronous finish thread
0x817f0000     20  zzz                            -   5  0x80290000    1  usb explore
0x81794000     21  zzz                            -  10  0x80294000    1  media checker
0x81794800     27  waiting   sem           100    -  10  0x80298000    1  locked_pool_enlarger
0x81793800     28  waiting   sem           108    -  20  0x8029e000    1  scsi_bus_service
0x817f0800     29  waiting   sem           144    -  20  0x802ab000    1  scsi_bus_service
0x817f1000     30  waiting   cvar   0x80fb79c8    -  10  0x80538000    1  scsi scheduler
kdebug> rwlock 0x811706a8
rw lock 0x811706a8:
  name:            bfs inode+8.2
  holder:          215
  reader count:    0
  writer count:    1
  owner count:      1
  flags:           0x1
  waiting threads: 7/r
kdebug> thread -s 215
thread         id  state     wait for   object  cpu pri  stack      team  name
0x8a206000    215  waiting   cvar   0x80136c9c    -  10  0x80208000  215  create_image
kdebug> cvar 0x80136c9c
condition variable 0x80136c9c
  object:  0x80136cb0 (free page)
  threads: 156 112 215 52 167 85 143
kdebug> sc 215
stack trace for thread 215 "create_image"
    kernel stack: 0x80208000 to 0x8020c000
      user stack: 0x7efef000 to 0x7ffef000
frame               caller     <image>:function + offset
 0 8020b6b4 (+  48) 8005eae8   <kernel_x86> context_switch(thread*: 0x8a206000, thread*: 0x8a1a2800) + 0x003c
 1 8020b6e4 (+  64) 8005edf7   <kernel_x86> simple_reschedule() + 0x02c7
 2 8020b724 (+  64) 80038679   <kernel_x86> ConditionVariableEntry<0x8020b79c>::Wait(uint32: 0x0 (0), int64: 0) + 0x01bd
 3 8020b764 (+ 128) 800cf5f6   <kernel_x86> steal_pages(vm_page*: NULL, uint32: 0x21 (33), true) + 0x02f2
 4 8020b7e4 (+  48) 800d0350   <kernel_x86>:vm_page_reserve_pages + 0x00f0
 5 8020b814 (+  64) 80034b34   <kernel_x86> reserve_pages(file_cache_ref*: 0x8115de10, uint32: 0x20 (32), true) + 0x0124
 6 8020b854 (+1120) 80035486   <kernel_x86> write_to_cache(file_cache_ref*: 0x8115de10, NULL, int64: 284033024, int32: 0, uint32: 0x10eff000, uint32: 0x0 (0), false, uint32: 0x20 (32), uint32: 0x20 (32)) + 0x0456
 7 8020bcb4 (+ 208) 80035cc7   <kernel_x86> cache_io(0x8115de10, NULL, int64: 284164096, uint32: 0x10f00000, 0x8020be20, true) + 0x059b
 8 8020bd84 (+  96) 800364fd   <kernel_x86>:file_cache_write + 0x00e1
 9 8020bde4 (+  64) 805d41ce   <bfs> Inode<0x811706a4>::FillGapWithZeros(int64: 0, int64: 419430400) + 0x0066
10 8020be24 (+ 112) 805df751   <bfs> bfs_write_stat(fs_volume*: 0x81109b40, fs_vnode*: 0x8116bcc0, stat*: 0x8020befc, uint32: 0x8 (8)) + 0x01ed
11 8020be94 (+  32) 800a5dc0   <kernel_x86> common_write_stat(file_descriptor*: 0x8115dde8, stat*: 0x8020befc, int32: 8) + 0x0034
12 8020beb4 (+ 144) 800ab552   <kernel_x86>:_user_write_stat + 0x0182
13 8020bf44 (+ 100) 800db431   <kernel_x86>:handle_syscall + 0x00be
user iframe at 0x8020bfa8 (end = 0x8020c000)
 eax 0x89           ebx 0x2f3294        ecx 0x0          edx 0x7feeee94
 esi 0x0            edi 0x0             ebp 0x7feeeedc   esp 0x8020bfdc
 eip 0xffff0102  eflags 0x203      user esp 0x7feeee60
 vector: 0x63, error code: 0x0
14 8020bfa8 (+   0) ffff0102   <commpage>:commpage_syscall + 0x0002
7feeeedc -- read fault

Deadlock: bfs_write_stat() has the inode write lock while writing into the file cache and it blocks trying to reserve pages. The page writer blocks trying to read-lock the same inode. Hence no modified pages are written, so that no pages can be freed.

A solution could be to split the resize operation, i.e. resize the file while holding the write lock, but clear the new content while holding only the read lock. That might be a security problem, though, once we're actually interested in avoiding those.

At any rate that would solve only the deadlock. The problem that the system becomes horribly unresponsive (to the point of GUI freeze) -- at least for a time -- will persist.

comment:7 by anevilyak, 10 years ago

Thanks for investigating :) Should this one go to Axel in that case?

in reply to:  7 comment:8 by bonefish, 10 years ago

Replying to anevilyak:

Thanks for investigating :) Should this one go to Axel in that case?

I've already convinced him to fix the deadlock at least. :-) Afterwards I can have a look whether scheduling analysis or profiling turn up why the system behaves that badly under memory pressure.

comment:9 by axeld, 10 years ago

BFS deadlock has been fixed in hrev30221.

comment:10 by bonefish, 10 years ago

Investigating...

by bonefish, 10 years ago

Attachment: create-image.out added

scheduler tracing output

comment:11 by bonefish, 10 years ago

Attached scheduler tracing analysis output for a 1 CPU system (VMware) with 512 MB, 100 MB tracing buffer, no swap file, and an image size of 400 MB. I haven't really had a closer look yet, but a few interesting points are apparent:

  • The page daemon runs most of the time. Which is kind of expected, I suppose, though not very helpful.
  • Pretty much every thread does I/O, which I find weird.
  • Several threads have horrible latencies (i.e. wait times while ready), which is particularly worrisome for high-priority threads (like _input_server_event_loop_).

Will have a closer look tonight. Ideas welcome.

in reply to:  11 ; comment:12 by anevilyak, 10 years ago

Replying to bonefish:

Will have a closer look tonight. Ideas welcome.

I'm guessing this is most likely a flaw in the preemption logic itself, since I can vouch for the fact that the exact same flaw shows up in scheduler_affine, and the decision making process for when to preempt is one of the few things that was left entirely untouched when creating that scheduler.

in reply to:  12 ; comment:13 by bonefish, 10 years ago

Replying to anevilyak:

I'm guessing this is most likely a flaw in the preemption logic itself, since I can vouch for the fact that the exact same flaw shows up in scheduler_affine, and the decision making process for when to preempt is one of the few things that was left entirely untouched when creating that scheduler.

Mmh, that doesn't quite convince me. We have a fixed quantum of 3 ms after which we reschedule and we reschedule in release_sem_etc(), if a thread with a higher priority has been woken up (and B_DO_NOT_RESCHEDULE was not set), as well as when sending signals (needs some fixing, but is relatively rare anyway).

For the realtime thread with the highest priority in a busy system one would thus expect an average latency of half a quantum, i.e. 1.5 ms. Unless I've overlooked one, that would be the "Extended PS/2 Mouse 1 watcher" with a priority of 104. According to the data its average latency was almost 6.7 ms, though. That's more than two quanta, which shouldn't really happen, since at the end of the quantum of the thread that woke up the "Extended PS/2 Mouse 1 watcher" it would have been scheduled for sure. The (priority 103) "_input_server_event_loop_" thread's average latency is 2.9 ms, which is at least less than a quantum, but also greater than I would expect.

Regarding the mystery that almost every thread seems to do I/O, that can actually be easily explained: Apparently the memory pressure is high enough that read-only memory mapped executable pages are discarded, even ones that are not really "inactive". Hence we get a lot of page faults that need to re-read the pages into memory. This would also explain why the system is so unusable.

comment:14 by stippi, 10 years ago

Interesting. If the read-only executable pages had been discarded for any high-priority threads, wouldn't that affect their latency also? Maybe it happens seldom, but still often enough to worsen the latencies? On the other hand, audio skipping can be observed even when the system is under much less stress, so maybe the problem is still somewhere else.

in reply to:  13 ; comment:15 by anevilyak, 10 years ago

Replying to bonefish:

Mmh, that doesn't quite convince me. We have a fixed quantum of 3 ms after which we reschedule and we reschedule in release_sem_etc(), if a thread with a higher priority has been woken up (and B_DO_NOT_RESCHEDULE was not set), as well as when sending signals (needs some fixing, but is relatively rare anyway).

Is it possible we're setting B_DO_NOT_RESCHEDULE in some cases where we shouldn't be?

Regarding the mystery that almost every thread seems to do I/O, that can actually be easily explained: Apparently the memory pressure is high enough that read-only memory mapped executable pages are discarded, even ones that are not really "inactive". Hence we get a lot of page faults that need to re-read the pages into memory. This would also explain why the system is so unusable.

I can confirm this behavior here as well. With your and Axel's changes the system does indeed now recover after 3-5 seconds of being unresponsive (nice work!), though it's quite obvious all apps have been swapped out, judging by the behavior afterwards (the first click on Deskbar takes several seconds to respond while it's paged back in, for instance). What puzzles me is why memory pressure is being exerted in this manner at all. I would have assumed the VM would discard file cache pages/buffers before going after executables, so I would expect the inode being created/filled in this instance to have its pages flushed and reused once the pressure gets to that point. Is this not the case?

in reply to:  15 comment:16 by bonefish, 10 years ago

Replying to stippi:

Interesting. If the read-only executable pages had been discarded for any high-priority threads, wouldn't that affect their latency also?

No, latency is just the time from the thread waking up (after sleep, waiting on a semaphore,...) to it starting to run. The additional page faults re-reading the executable page just make it run slower and additionally wait for I/O.

Replying to anevilyak:

I can confirm this behavior here as well. With your and Axel's changes the system does indeed now recover after 3-5 seconds of being unresponsive (nice work!),

It took quite a bit longer to recover in my test, but I suspect that also depends on the size of the image file, the total amount of memory etc.

though it's quite obvious all apps have been swapped out, judging by the behavior afterwards (the first click on Deskbar takes several seconds to respond while it's paged back in, for instance). What puzzles me is why memory pressure is being exerted in this manner at all. I would have assumed the VM would discard file cache pages/buffers before going after executables, so I would expect the inode being created/filled in this instance to have its pages flushed and reused once the pressure gets to that point. Is this not the case?

ATM the VM treats mapped and merely cached files pretty much the same. Mapped pages get an initially higher usage count and therefore won't be evicted as quickly, but after two iterations of the page daemon without being touched they are back to zero, too, and become available for recycling.

comment:17 by bga, 10 years ago

I may be missing something, but shouldn't disk caches be handled differently? As I understand it, they should not exert memory pressure except on themselves. What I mean is that if the file cache needs more memory, it would only write back other cached pages and reuse the freed space. Obviously this would have to be controlled with some kind of limit to keep the system usable: something like at least x% of the memory can be taken over by the cache (effectively exerting memory pressure), but anything above that would only make it try to recycle its own pages (unless, of course, there is plenty of free memory not mapped to programs). I think this is kinda like what Rene expected to be the case when he mentioned it above too.

Point is, the disk cache should not exert so much memory pressure that applications get swapped out (thus increasing I/O even more in a system that is basically I/O bound due to all the file operations going on).

How does, say, the Linux VM handle this? Does it also penalize interactive performance like ours does?

comment:18 by axeld, 10 years ago

Other systems usually page out stuff as well, just their scoring mechanism seems to work a lot better than ours. But in general I agree with you that executables should get a much higher ranking, and should only be swapped out if they haven't been used for a long time.

I would also think that it might be beneficial to take pages from the cache causing the trouble first - at least if it is accessed mostly sequentially (which should already be determined).

comment:19 by bonefish, 10 years ago

Some ideas for improvements:

  • Start with a significantly higher usage count for mapped pages.
  • Let the page daemon penalize non-mapped active pages that haven't been accessed more strongly than mapped active pages (e.g. -2 vs. -1).
  • Keep the inactive (maybe also the active) page queue sorted by usage_count (introduce per-level markers for easy queuing), so we can always pick the most inactive pages for recycling first.
  • Maybe maintain per usage_count level numbers of pages so we can better judge how much pressure the page daemon needs to apply.
  • Join page daemon and page writer and let the page daemon look only at interesting (active/inactive) pages (as in FreeBSD). Besides less work for the page writer this also avoids the competition between the two -- as the scheduling analysis data show, there's a lot of contention on commonly used locks (e.g. the global cache list mutex). Moreover the page daemon could starve the page writer due to its higher priority.

I'd like to play with these ideas, though I'd first want to make the DebugAnalyzer useful, so it becomes easier to see what change has what effects.

comment:20 by axeld, 10 years ago

IIRC the reasoning for separating the two was to let the expensive page daemon run as rarely as possible. However, that was still with the old "iterate over all user address spaces" approach, which made the page daemon much more expensive than it probably is today. It should still be much more expensive than the page writer alone, though.

OTOH letting the page daemon run continuously would improve the scoring of the pages to remove when the pressure comes. It should not really keep the CPUs busy all the time, though. And since contention is an actual problem, as you say, it might be worth investigating. Maybe we should just have the page daemon also write back pages when it's running, taking over the duty of the page writer.

Anyway, the other ideas sound nice as well. I'm really looking forward to this (and the DebugAnalyzer, btw) :-)

comment:21 by anevilyak, 10 years ago

Version: R1/pre-alpha1 → R1/Development

I'm curious: would some of the recent VM changes make the improvements listed in this ticket a bit easier to implement? The system still behaves quite horrendously under memory pressure; I just tried loading a 500MB data file into DebugAnalyzer and pretty much had the entire system go completely unresponsive. Dropping the responsible threads into the userland debugger via KDL had no measurable effect even after giving the system ~20 minutes to try and recover, which makes me wonder if it wasn't hitting another deadlock situation.

in reply to:  21 comment:22 by bonefish, 10 years ago

Replying to anevilyak:

I'm curious, would some of the recent VM changes make the improvements listed in this ticket a bit easier to implement?

Nope, not really. It's not that complicated anyway, just quite a bit of work.

The system still behaves quite horrendously under memory pressure; I just tried loading a 500MB data file into DebugAnalyzer and pretty much had the entire system go completely unresponsive.

DebugAnalyzer is really a memory hog. It not only reads the complete file into memory, but also uses much more memory for various analysis data it computes. I haven't checked, but the total memory usage is probably 2 to 3 times the size of the file, maybe more. Even worse, DebugAnalyzer actually iterates more than once through the data, so if not everything fits fully into RAM, at least a part of it will always be paged out, and there'll probably be a lot of disk thrashing going on.

Dropping the responsible threads into the userland debugger via KDL had no measurable effect even after giving the system ~20 minutes to try and recover, which makes me wonder if it wasn't hitting another deadlock situation.

As soon as memory of periodically running threads has been paged out, things really don't look good anymore. While a thread is waiting for one page being paged in again, the previous page will already have aged enough to be available for being paged out. So until the memory pressure is relieved there will be serious disk thrashing going on. It might be that the system actually still made progress, but it was slowed down to a crawl. Possibly after a few more days it might even have recovered. :-)

Anyway, as written above we need to significantly improve the page aging algorithm to do better in those situations.

Regarding the original issue of the ticket, I believe it was fixed in hrev35299. I'll leave the ticket open for the time being, as it seems I had some good ideas in comment 19 to improve the overall situation.

comment:23 by bonefish, 10 years ago

Resolution: fixed
Status: new → closed

Though I didn't really use most of the above ideas, hrev35393 implements a solution very similar to how things work in FreeBSD, which should solve the issues.

comment:24 by jessicah, 4 years ago

I'm actually running into this issue consistently on hrev49881 whilst running a buildbot for Haiku.

Some of the symptoms include:

  • Clicks on ProcessController to show the menu don't get acted upon until create_image finishes
  • Processes sometimes don't exit until create_image finishes (git is often a culprit here)
  • Closing Pe will end up with a window that doesn't redraw and won't close until create_image finishes

Output from ps -a:

Thread                       Id    State Prio    UTime    KTime
create_image              59454      run   10        0  1135167

comment:25 by jessicah, 4 years ago

Hmm, seems I had virtual memory disabled, which may have been causing this problem to show up again. I have re-enabled it and will see if the issue goes away over the next couple of days.
