Opened 4 months ago

Last modified 2 weeks ago

#14755 assigned bug

Haiku Grinds to a Halt when Dribbling

Reported by: AGMS
Owned by: nobody
Priority: normal
Milestone: Unscheduled
Component: Servers/media_server
Version: R1/Development
Keywords:
Cc: agmsmith@…, ttcoder
Blocked By:
Blocking: #4954
Has a Patch: no
Platform: All

Description

Haiku fails after a few days in radio station use. I was able to recreate a similar problem by dribble reading files (filling the cache) while playing sounds and appending to other files. The cache ate up all memory and then all sorts of things started failing. Would be nice if the cache wasn't so aggressive in using all memory.

To recreate it, I used a VirtualBox VM with 640MB of RAM running Haiku R1B1+100, with virtual memory turned off to save time and keep things simpler. I was playing a sound every 10 seconds (using the "flite" voice synthesis to read a fortune cookie), appending to several files every 10 seconds (see the attached TestFileAppend.cpp), and reading a file every second from a 50GB collection (using find | xargs md5sum ; sleep 1s).
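For readers without the attachment at hand, the append-then-verify part of the recipe can be sketched roughly as follows. This is not the attached TestFileAppend.cpp; the function names, the record format and the default delay are assumptions made only for illustration.

```python
import os
import tempfile
import time


def append_records(path, count, delay_seconds=10.0):
    """Append `count` numbered lines to `path`, pausing between writes."""
    for i in range(count):
        # Reopen the file for every record so each append is a separate write.
        with open(path, "a", encoding="ascii") as f:
            f.write(f"record {i:08d}\n")
            f.flush()
            os.fsync(f.fileno())
        if delay_seconds:
            time.sleep(delay_seconds)


def verify_records(path, count):
    """Return the 0-based indexes of lines that differ from what was written."""
    with open(path, "r", encoding="ascii", errors="replace") as f:
        lines = f.read().splitlines()
    bad = [i for i in range(count)
           if i >= len(lines) or lines[i] != f"record {i:08d}"]
    # Unexpected extra lines also count as corruption.
    bad.extend(range(count, len(lines)))
    return bad


def run_once(count=3, delay_seconds=0.0):
    """Write and verify a scratch file (hypothetical helper for quick checks)."""
    with tempfile.TemporaryDirectory() as scratch:
        path = os.path.join(scratch, "append_test.txt")
        append_records(path, count, delay_seconds)
        return verify_records(path, count)
```

An empty list from verify_records means every record read back intact. For a long-duration run, several of these would be left running against different files while the sound playback and the md5sum loop run alongside.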

It seems that the sound playback leaks memory areas and the cache uses up other memory. The combination leads to fragmented memory, which the cache doesn't recognise as being a shortage condition, so it keeps on allocating. Then all sorts of things fail when they are unable to get a block large enough, from forks to disk writes to GUI. Eventually the OS gets stuck (windows frozen).

Ideally, the cache would be using memory areas in an address independent way and you could unmap areas and then remap them in a contiguous address range. Same for the sound buffers. If that's too difficult, the cache low memory detection could also look for fragmentation. Or the media system could stop leaking areas.
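To make the fragmentation point concrete, here is a toy model (not Haiku's VM code) of an address range holding some allocated areas. It compares a low-memory check based on total free space with one that also looks at the largest contiguous gap. All names and the 4096-byte page size are assumptions for illustration.

```python
PAGE = 4096  # assumed page size for the example


def free_gaps(areas, space_base, space_size):
    """Return the sizes of the free gaps between (base, size) areas."""
    gaps = []
    cursor = space_base
    end = space_base + space_size
    for base, size in sorted(areas):
        if base > cursor:
            gaps.append(base - cursor)
        cursor = max(cursor, base + size)
    if cursor < end:
        gaps.append(end - cursor)
    return gaps


def fragmentation_report(areas, space_base, space_size):
    """Return (total free bytes, largest contiguous free bytes)."""
    gaps = free_gaps(areas, space_base, space_size)
    return sum(gaps), max(gaps, default=0)


def short_by_total(areas, space_base, space_size, needed):
    """Naive check: only complains when total free space is too small."""
    total_free, _ = fragmentation_report(areas, space_base, space_size)
    return total_free < needed


def short_by_largest_gap(areas, space_base, space_size, needed):
    """Fragmentation-aware check: complains when no single gap is big enough."""
    _, largest = fragmentation_report(areas, space_base, space_size)
    return largest < needed


# Sixteen pages of address space with every other page allocated.
checkerboard = [(i * PAGE, PAGE) for i in range(0, 16, 2)]
```

With the checkerboard layout, half of the range is free, so the total-based check sees no shortage for a two-page request, while the gap-based check reports one because no free gap is larger than a single page.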

The quick fix is to restart the media system, so that it frees its areas. But memory will still be kind of fragmented. Would be nice to similarly restart the cache, or get it to drop all its memory areas somehow.

Attachments (4)

AreaListing.txt (239.1 KB) - added by AGMS 4 months ago.
Memory areas in use listing.
ForkedErrors.txt (346 bytes) - added by AGMS 4 months ago.
Forking and other errors when memory is fragmented.
syslog.txt (294.6 KB) - added by AGMS 4 months ago.
Syslog at the time.
TestFileAppend.cpp (8.2 KB) - added by AGMS 4 months ago.
TestFileAppend program for appending to a file slowly, then reading it back to see if it got corrupted.


Change History (15)

Changed 4 months ago by AGMS

Attachment: AreaListing.txt added

Memory areas in use listing.

Changed 4 months ago by AGMS

Attachment: ForkedErrors.txt added

Forking and other errors when memory is fragmented.

Changed 4 months ago by AGMS

Attachment: syslog.txt added

Syslog at the time.

Changed 4 months ago by AGMS

Attachment: TestFileAppend.cpp added

TestFileAppend program for appending to a file slowly, then reading it back to see if it got corrupted.

comment:1 Changed 4 months ago by waddlesplash

> Ideally, the cache would be using memory areas in an address independent way and you could unmap areas and then remap them in a contiguous address range.

This is only a problem on 32-bit systems where the address space is easily exhausted. On 64-bit systems there is so much spare address space that this is never an issue. So, spending however much time to implement an "address space defragmenter" is probably not very high on any priority list; just use a 64-bit build if you need to.

> If that's too difficult, the cache low memory detection could also look for fragmentation.

Failure to insert things into the kernel address space should issue a low-resource notification, but it didn't because the code was #if 0'd out. hrev52637 (and hrev52639, which amends it) should improve things here significantly.

> Or the media system could stop leaking areas.

Obviously this is a problem. If killing the media_server fixes it, then that narrows down the possibilities. Though I'm a bit confused here: is it the kernel address space or the userland address space that gets fragmented? No matter how fragmented physical memory gets, it should be possible to allocate (non-contiguous) virtual memory as long as there is some still free.

The kernel has its own 2GB address space, and each application has its own 2GB; and so even if the user application's address space gets especially fragmented, the kernel should be fine. (Or am I missing something here?)

Or, are you just plain out of memory altogether?

comment:2 in reply to:  1 Changed 4 months ago by cb88

> Or, are you just plain out of memory altogether?

That seems to be the case from what he said: note that he turned off the swap file to make it occur quicker.

Sounds similar to why booting on low-memory systems doesn't work: there, it just reads a bunch of hpkg files from USB/CD etc., they end up cached, and then you run out of RAM and it grinds to a halt. My Tyan PII box can't boot Haiku anymore with 512MB, last I checked.

comment:3 Changed 4 months ago by ttcoder

Cc: ttcoder added

comment:4 Changed 4 months ago by AGMS

Good point, a lot of those buffers are in the media server process. So it's actually running out of memory system-wide, rather than kernel address space.

I'll see how it fares with more memory in some new tests.

Like you say, the longer term workaround is to use 64 bits, and hope that the page tables can grow.

Still, it would be nice to have an OS that can play sound and read files over a longer period of time, like BeOS used to do.

comment:5 Changed 4 months ago by waddlesplash

If it's running out of memory then changing to 64-bit won't help.

Obviously we should be able to play audio for any length of time and not run out of memory. So this is a media server bug then.

comment:6 in reply to:  5 Changed 4 months ago by cb88

Replying to waddlesplash:

> So this is a media server bug then.

Not necessarily, could be a filesystem / cache bug etc...

comment:7 Changed 4 months ago by waddlesplash

Please reread the ticket history. Cache memory increasing is a problem but this should have been solved at least partially by my recent changes. The ticket itself notes that restarting media_server fixes the other issues.

comment:8 Changed 4 months ago by waddlesplash

Component: changed from System/Kernel to Servers/media_server
Keywords: memory fragmentation long duration removed
Owner: changed from nobody to Barrett
Status: changed from new to assigned

comment:9 Changed 4 months ago by AGMS

With more memory, it runs better (not running out of virtual address space either). Though after an overnight run, there are 32000+ areas used by the media system, sound doesn't work, and opening ProcessController takes about 4 seconds and sometimes doesn't draw the whole graph display (guess it's adding up those areas). Restarting the media server got sound working.

Overnight the logs just show kernel memory being slowly used up, and then some funny stuff when I started using the GUI at 11:00 (oddly exactly on the hour):

2018-12-12 22:27:08 KERN: slab memory manager: created area 0x91001000 (10389598)
2018-12-13 00:08:54 KERN: slab memory manager: created area 0x91801000 (13347565)
2018-12-13 02:01:12 KERN: slab memory manager: created area 0x92001000 (16589412)
2018-12-13 03:49:52 KERN: slab memory manager: created area 0x92801000 (19776951)
2018-12-13 05:42:01 KERN: slab memory manager: created area 0x93001000 (23021653)
2018-12-13 07:31:59 KERN: slab memory manager: created area 0x93801000 (26240792)
2018-12-13 11:00:00 KERN: add_memory_type_range(29929999, 0x90000, 0x70000, 0)
2018-12-13 11:00:00 KERN: remove_memory_type_range(29929999, 0x90000, 0x70000, 0)
2018-12-13 11:00:00 DAEMON 'app_server': Application for user 0 does not support the current server protocol (0).
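As a rough way to quantify "slowly used up", the slab lines above can be parsed and turned into a growth rate. This sketch rests on two assumptions that the log itself does not confirm: that the spacing between consecutive area addresses reflects how much kernel address space each new slab area takes, and that the number in parentheses is an area ID that increases as areas are created system-wide. The function names are made up for the example.

```python
import re
from datetime import datetime

LOG = """\
2018-12-12 22:27:08 KERN: slab memory manager: created area 0x91001000 (10389598)
2018-12-13 00:08:54 KERN: slab memory manager: created area 0x91801000 (13347565)
2018-12-13 02:01:12 KERN: slab memory manager: created area 0x92001000 (16589412)
2018-12-13 03:49:52 KERN: slab memory manager: created area 0x92801000 (19776951)
2018-12-13 05:42:01 KERN: slab memory manager: created area 0x93001000 (23021653)
2018-12-13 07:31:59 KERN: slab memory manager: created area 0x93801000 (26240792)
"""

SLAB_LINE = re.compile(
    r"^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d) KERN: slab memory manager: "
    r"created area (0x[0-9a-fA-F]+) \((\d+)\)$"
)


def parse_slab_events(text):
    """Return (timestamp, address, number) tuples for each slab line."""
    events = []
    for line in text.splitlines():
        match = SLAB_LINE.match(line.strip())
        if match:
            when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            events.append((when, int(match.group(2), 16), int(match.group(3))))
    return events


def growth_summary(events):
    """Summarise growth between the first and last slab event."""
    first, last = events[0], events[-1]
    seconds = (last[0] - first[0]).total_seconds()
    span_mib = (last[1] - first[1]) / (1024 * 1024)
    return {
        "hours": seconds / 3600,
        "address_span_mib": span_mib,
        "mib_per_hour": span_mib / (seconds / 3600),
        # Assumed to be area IDs; see the note above.
        "ids_per_second": (last[2] - first[2]) / seconds,
    }


if __name__ == "__main__":
    for key, value in growth_summary(parse_slab_events(LOG)).items():
        print(f"{key}: {value:.2f}")
```

On these six lines the addresses advance by 40 MiB over roughly nine hours, or about 4.4 MiB per hour. If the parenthesised numbers really are area IDs, they advance by several hundred per second, which would fit with something creating areas at a high rate overnight.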

comment:10 Changed 4 months ago by Barrett

Blocking: 4954 added

This is a longstanding bug. If you want to prove me wrong, bisect ;)

The buffer management is completely flawed unfortunately.

comment:11 Changed 2 weeks ago by korli

Owner: changed from Barrett to nobody