Opened 14 years ago

Last modified 7 years ago

#5777 assigned bug

Kernel starts paging when writing to slow media

Reported by: axeld Owned by: nobody
Priority: normal Milestone: R1
Component: System/Kernel Version: R1/Development
Keywords: Cc:
Blocked By: Blocking:
Platform: All

Description

When writing large files (or many larger files), Haiku starts paging out user applications if the target media cannot keep up (e.g. a USB hard drive).

This makes Haiku pretty much unusable while doing so.

Change History (15)

comment:1 by bonefish, 14 years ago

Owner: changed from bonefish to axeld
Status: new → assigned

I thought we already agreed that this must be a file cache issue (i.e. your domain). If memory gets low, the file cache should avoid generating more page pressure. IIRC there is even code that tries to do that, but obviously it has issues.

comment:2 by axeld, 14 years ago

I just probably won't have time to look into this for alpha/2, and it would be a shame to leave it like this.

comment:3 by axeld, 14 years ago

BTW the problem is the write case, not the read case, and I'm not sure how that one should prevent the cache pressure - it would need to wait until pages have been written to disk to actually do so, and I don't think such a mechanism exists already.

Currently, the file cache only triggers the VM to write back the modified pages immediately, but if the device is too slow, this can't be enough.
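To illustrate the kind of mechanism that seems to be missing, here is a minimal toy simulation of a writer that waits for the device once too many modified pages are outstanding. It is not Haiku code: simulate_copy(), SimResult, and all parameters are hypothetical names made up for this sketch, and it assumes a simple model where the writer dirties pages at a fixed rate and the device retires them at a slower fixed rate.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Toy model only; none of these names exist in the Haiku kernel.
struct SimResult {
	std::size_t peakDirtyPages;	// highest number of modified pages held in memory
	std::size_t ticks;			// simulated time steps until all data is on disk
};

// Simulates copying totalPages pages. Per tick, the writer tries to dirty
// producePerTick pages and the device writes back drainPerTick pages.
// A dirtyLimit of 0 means "no throttling": the writer never waits.
// Otherwise the writer only dirties as many pages as fit under the limit,
// which models waiting until earlier pages have reached the disk.
SimResult
simulate_copy(std::size_t totalPages, std::size_t producePerTick,
	std::size_t drainPerTick, std::size_t dirtyLimit)
{
	// A device that never completes a write would make the loop endless.
	assert(drainPerTick > 0);

	std::size_t remaining = totalPages;
	std::size_t dirty = 0;
	SimResult result{0, 0};

	while (remaining > 0 || dirty > 0) {
		// Writer step: produce new modified pages, possibly throttled.
		std::size_t produced = std::min(remaining, producePerTick);
		if (dirtyLimit > 0) {
			std::size_t room = dirtyLimit > dirty ? dirtyLimit - dirty : 0;
			produced = std::min(produced, room);
		}
		dirty += produced;
		remaining -= produced;
		result.peakDirtyPages = std::max(result.peakDirtyPages, dirty);

		// Device step: the slow medium retires a few pages.
		dirty -= std::min(dirty, drainPerTick);
		result.ticks++;
	}

	return result;
}
```

In this model, simulate_copy(1000, 100, 10, 0) peaks at 910 modified pages, while simulate_copy(1000, 100, 10, 50) never holds more than 50. Both runs take 100 ticks, since the device is the bottleneck either way. Under these assumptions, making the writer wait bounds memory use without slowing the copy itself.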

comment:4 by axeld, 14 years ago

There is, of course, vm_page_write_modified_page_range(), which one can use instead of vm_page_schedule_write_page_range() and which pretty much does what we want.

I just tested it, and while it seems to improve the situation slightly, it doesn't solve the issue. Should I check it in anyway?

comment:5 by axeld, 14 years ago

Checked it in as hrev36519.

comment:6 by bonefish, 14 years ago

BTW, have you verified that the system is paging (i.e. checked the used swap)? I'm asking because with the method described in http://dev.haiku-os.org/ticket/5816#comment:3 the system starts thrashing the disk at some point and all applications accessing the disk (e.g. Deskbar or Tracker) become very slow. There's no paging going on, though.

Generally, attaching more info to the ticket wouldn't harm.

comment:7 by axeld, 14 years ago, in reply to comment:6

Replying to bonefish:

BTW, have you verified that the system is paging (i.e. checked the used swap)? I'm asking because with the method described in http://dev.haiku-os.org/ticket/5816#comment:3 the system starts thrashing the disk at some point and all applications accessing the disk (e.g. Deskbar or Tracker) become very slow. There's no paging going on, though.

I have not verified it - at that moment, I was actually backing up data, and didn't want to lose anything. Since I copied very large files (1 GB each), the block cache didn't have that much work, so I would assume it wasn't at fault.

Furthermore, the boot volume was neither on the source nor on the target device. And finally, a jumping mouse cursor without CPU load could be a sign.

Generally, attaching more info to the ticket wouldn't harm.

Sure, but since it's that easy to reproduce, I was lazy :-)

I'll check if your work on #5816 fixes the issue.

comment:8 by scottmc, 13 years ago

Milestone: R1/alpha3 → R1/beta1

Is this still an issue?

comment:9 by pulkomandy, 10 years ago

Well, I still get problems when copying a large number of files to a USB drive. Apparently, the file cache currently eats all memory, and the low resource manager struggles to free some of it, using one CPU core at 100% to do so.

Trying to launch applications will fail with an "out of memory" error. Activity Monitor shows small values for "memory use" (485MB), "cache memory" (263MB), and "block cache" (12MB). I guess the file cache is using the remaining 7.5GB?

There are noticeable lockups when writing to disk (I'm doing this using rsync -avhPX). Apparently it stays locked while waiting for the low resource manager to free some cache space...

comment:10 by bonefish, 10 years ago

@pulkomandy: The symptoms you are observing probably aren't even related to the original bug, but since no one has really attached any information so far it is hard to tell. The ticket description refers to large files while you are copying many files. I believe your memory usage statistics actually mean that not a lot of memory is used, save for a bit of file cache, which should be subsumed under "cache memory". The low resource in your case is probably kernel address space (the syslog would say). The rather small "block cache" value supports that assumption (the low resource manager is pushing the block cache to free up address space).

FWIW, I saw a similar/the same issue at the last BeGeistert on Olivier's machine. Since I already worked on another bug I didn't dig too deep, but I saw that a lot of address space was used for the physical page mapper. My theory is that page mapper slots are leaked. The code Michael introduced to improve I/O scheduling support for USB (hrev33523, hrev33524, hrev33525, hrev33526) has some issues (cf. my TODOs here and here) which might actually be responsible. But this needs closer examination.

Anyway, this should go in a new ticket and it should be checked whether the original issue of this ticket still exists.

comment:11 by pulkomandy, 10 years ago

I'll try to have a closer look when I'm done with actually backing up my files. I'm not sure there is an actual leak, as this has been running for about 24 hours. This means the low resource manager actually frees memory, but new data fills it up almost immediately.

Now even trying to run cat to look at the syslog leads to an "out of memory" error, so I can't look at what the problem is this way. I'll let this run until my files are safe, and try to reproduce later.

comment:12 by pulkomandy, 10 years ago

Well, my issues are being tracked in #10637.

comment:13 by richienyhus, 9 years ago

So is this still with us, or is it just #10637 that is left?

comment:14 by pulkomandy, 9 years ago

Milestone: R1/beta1 → R1

comment:15 by axeld, 7 years ago

Owner: changed from axeld to nobody