#15995 closed bug (fixed)
create_area have O(n^2) on number of allocated areas
Reported by: | X512 | Owned by: | nobody |
---|---|---|---|
Priority: | normal | Milestone: | R1/beta3 |
Component: | System/Kernel | Version: | R1/Development |
Keywords: | Cc: | ||
Blocked By: | Blocking: | ||
Platform: | All |
Description
This is hrev54155.
When creating a lot areas, creating new area become slower. Also task managers (ProcessController and Slayer) become extremely slow.
Part of profile:
hits unknown image ------------------------------------------------------------------------------ 31370 0 1 kernel_x86_64 hits in us in % image function ------------------------------------------------------------------------------ 15024 15024000 46.90 1 VMUserAddressSpace::_InsertAreaSlot(unsigned long, unsigned long, unsigned long, unsigned int, unsigned long, VMUserArea*, unsigned int) 13643 13643000 42.59 1 VMUserAddressSpace::LookupArea(unsigned long) const 682 682000 2.13 1 x86_page_fault_exception
Problem function: https://git.haiku-os.org/haiku/tree/src/system/kernel/vm/VMUserAddressSpace.cpp#n526.
Attachments (2)
Change History (7)
by , 5 years ago
Attachment: | MemTest.cpp added |
---|
comment:1 by , 5 years ago
Indeed, appears LookupArea() just iterates over the area list. I'm not sure why it would be O(N2) though, wouldn't that just be O(N)?
comment:2 by , 5 years ago
O(n^2)
is time complexity of whole test (creating or deleting n
areas). create_area
and area_for
have O(n)
time complexity.
comment:3 by , 4 years ago
This would be fixed with these two parts:
https://review.haiku-os.org/c/haiku/+/2839
https://review.haiku-os.org/c/haiku/+/2842
I do note that this is a rather extreme test case, albeit a nice one to work with.
This generic allocator via areas could also be optimized without the above changes. For example the insertion point hinting, implemented internally in the second change above, could be done on the application side by using B_BASE_ADDRESS and supplying the address just after the previously created area. Then, the area id could simply be stored inside or alongside the item, so the deallocation path could avoid the need for a separate area lookup via area_for(). The latter is true also with the above changes, as it removes the overhead of a syscall.
Also note, that due to ASLR the allocated areas will, purposely, not be packed densely but have randomly sized gaps. On 64 bit this is unproblematic as the address space is huge, but with 32 bit this becomes quite limiting. Then there is the heap allocator that reserves a large chunk of address space, further reducing the space where areas can be inserted into efficiently. Under 32 bit you will therefore only be able to reach around 1 instead of the expected 2 GiBs of allocations. Purely for test purposes you can get around both issues by disabling ASLR and switching the system allocator:
DISABLE_ASLR=1 LD_PRELOAD=libroot_debug.so MALLOC_DEBUG=g MemTest
With that setup you get packed areas and no hindering address space reservations and can reach the expected maximum allocation the 32 bit address space allows for.
In my tests, allocating via areas is only slightly slower than using malloc. This is somewhat surprising considering the syscall overhead and the creation and insertion of the involved structures. It either speaks for the performance of the area allocation system, or against the performance of our malloc...
Test program