Opened 3 months ago
Last modified 3 months ago
#19073 new enhancement
[packagefs] Investigate alternative data structures for the by-name lookup of Nodes in Directory
Reported by: | waddlesplash | Owned by: | bonefish |
---|---|---|---|
Priority: | low | Milestone: | Unscheduled |
Component: | File Systems/packagefs | Version: | R1/beta5 |
Keywords: | Cc: | ||
Blocked By: | Blocking: | ||
Platform: | All |
Description
At present, Nodes (including all UnpackingNodes) are stored in Directories in both a hash table (for fast by-name lookup) and a linked list (for sequential iteration.) However, a hash table might not be the most optimal way to perform lookup, and other data structures could combine the functions of the hash table and the linked list.
In https://review.haiku-os.org/c/haiku/+/8242 I experimented with using an AVL tree, but this wound up being slower, so clearly that's not the answer here.
Change History (4)
comment:1 by , 3 months ago
comment:2 by , 3 months ago
My replies:
where are the other 9 bytes coming from?
The hash table has to allocate an array of pointers for the table itself, of course. I think it allocates rounding up to the nearest power of 2, so it's a little larger than one element per entry in the general case.
With a perfect hash function, there are no collisions, and so this pointer could be removed.
Unless I'm mistaken, the hash table doesn't resize immediately when new items are inserted, but only after some threshold is crossed (or at least that's what would seem to make the most sense for an open hash table.) Before the threshold is crossed, we will have some items with different hashes but which are stored in the same bucket. So we still need the pointer regardless.
Another possible data structure to use here is an IterableSplayTree. That might wind up being faster in the end due to the splaying status. However it has the major disadvantage that we would need a write lock to do lookups into it, so that may be worse in the end.
comment:3 by , 3 months ago
For future reference, here's the timing instrumentation code:
@@ -105,11 +102,21 @@ Directory::RemoveChild(Node* node) Node* Directory::FindChild(const StringKey& name) { - return fChildTable.Lookup(name); + static bigtime_t total = 0; + static int32 lookups = 0; + + bigtime_t time = system_time(); + Node* node = fChildren.Find(name); + time = system_time() - time; + atomic_add64(&total, time); + if ((atomic_add(&lookups, 1) % 1000) == 0) + dprintf("FindChild lookups time: %" B_PRIdBIGTIME "\n", atomic_get64(&total)); + + return node; } void Directory::AddDirectoryIterator(DirectoryIterator* iterator)
comment:4 by , 3 months ago
There is an implementation of "compact string tries" that looks interesting here: https://github.com/kampersanda/poplar-trie
It may be useful for the string pool also.
Copying PulkoMandy's comment from there in full: