Opened 16 years ago

Closed 5 years ago

Last modified 5 years ago

#3011 closed enhancement (fixed)

Filling Tracker windows with many files takes too long

Reported by: stippi Owned by: leavengood
Priority: normal Milestone: R1/beta2
Component: Applications/Tracker Version: R1/pre-alpha1
Keywords: Cc:
Blocked By: Blocking: #3974, #10952
Platform: All

Description (last modified by aldeck)

If you don't have a folder with many files (like 30000+), you can reproduce it by running a query for all files (Alt-F + Enter). I have at least over 58000 files on both partitions and I am still watching the files dripple into the list as of now. The first couple thousands will go in quickly enough, although it could be argued it should be faster. But after 20000 files, it becomes apparent that something somewhere doesn't scale very well, and I am strongly guessing that it's the list insertion in Tracker. The sorting to find the right index is probably binary search based and fast enough, but then I suppose (without having looked at the code), that it updates the vertical position of all the following items. If that's indeed what's happening, one could also mark the position as invalid until encountering the first item with an invalid position (which would result in items with invalid positions starting at some index until the last item). The correct position could then be retrieved once it's needed (when drawing). Maybe there is even more room for optimization in the code. (Now my list contains 66500 entries and the harddrive LED is barely flashing with lots of CPU usage inbetween.)

Change History (40)

comment:1 by stippi, 16 years ago

Description: modified (diff)

comment:2 by bonefish, 16 years ago

I recall having looked into the poor scaling some years ago. Graphical updates (respectively updates of GUI related data) might be a problem as well, but one is definitely that the underlying model is a BList (or BObjectList) which is O(n) for random inserts, which is exactly what happens due to the sorting.

Rumors have it that there's a profiling tool for Haiku to examine such stuff. ;-)

in reply to:  2 comment:3 by aldeck, 16 years ago

Replying to bonefish:

[...] one is definitely that the underlying model is a BList (or BObjectList) which is O(n) for random inserts, which is exactly what happens due to the sorting.

Definitely, i've seen lots of O(n) (or worse) stuff in PoseView.cpp.

Rumors have it that there's a profiling tool for Haiku to examine such stuff. ;-)

I'd really like to use that :) Do we have some docs/example/notes on how to use it yet or should i dig the sources?

comment:4 by stippi, 16 years ago

Just try profile --help. :-)

in reply to:  4 ; comment:5 by aldeck, 16 years ago

Replying to stippi:

Just try profile --help. :-)

Hmm :) nice and simple. However i get lots of "no functions were hit" on the threads i'm interested in. Tried different options without luck...

in reply to:  5 ; comment:6 by bonefish, 16 years ago

Replying to aldeck:

Replying to stippi:

Just try profile --help. :-)

Hmm :) nice and simple. However i get lots of "no functions were hit" on the threads i'm interested in. Tried different options without luck...

That doesn't sound right. Will have a look...

in reply to:  6 comment:7 by bonefish, 16 years ago

Replying to bonefish:

Replying to aldeck:

Replying to stippi:

Just try profile --help. :-)

Hmm :) nice and simple. However i get lots of "no functions were hit" on the threads i'm interested in. Tried different options without luck...

That doesn't sound right. Will have a look...

Fixed in hrev28441.

If you have Linux/*BSD and can run KCachegrind, I can recommend the valgrind (callgrind) output. It allows to see both inclusive and exclusive function costs as well as call relationships. It's a bit annoying to find the right threads (hint: switch to "Time" (is in ms) and absolute costs).

The results are definitely interesting. :-)

comment:8 by aldeck, 16 years ago

Thanks for the quick fix! Now i'm not too sure if i want to see those results :D

comment:9 by aldeck, 16 years ago

Ok, found several perpetrators, and some info on a bug! (see #3054)

On pose creation three ops are to blame:

  • Model::CloseNode
  • PoseView::AddMimeTypes
  • FSClipboardFindNode (in BPose constructor)
  • and PoseView::FindPose to a lesser extent.

I'll investigate further in the next days, not accepting the ticket though, lacking time... Please continue the work on this if you'd like, i'm temporarily off.

comment:10 by anevilyak, 16 years ago

Is that profiling data for AddPosesTask? Or does it take into account the window thread as well? Most of the work for populating is actually done by ContainerWindow, the addposestask thread just grabs entry refs, does some minor ops on them and packages them up into a message that's sent to the window thread to process them and add to the visible list, etc.

comment:11 by aldeck, 16 years ago

Sorry forgot to precise. This is only the window thread, AddPosesTask works well (performance wise).

The problematic calls are coming from the CreatePoses code path in the window thread. This was the profiling of opening a directory for browsing with ~20000 items in it.

The most expensive call is FSClipoardFindNodeMode done on each pose creation. We briefly discussed about this with Rene on irc, it is only used to determine if a pose should be drawn transparent (ex: cut and paste). The costy op here is the locking of the clipboard. So a solution has to be found to get this info without locking the clipboard for each pose. Ideas welcome :)

As for the other calls (CloseNode, AddMimeTypes), they started to show their slowness when profiling with the FSClipboardFindNodeMode disabled, but that might be due to another side effect (#3054 will tell us maybe).

comment:12 by aldeck, 16 years ago

Owner: changed from axeld to aldeck
Status: newassigned

Ok, i made some good progress on this one, i've reimplemented AddMimeTypes wich had exponential complexity, still FSClipoardFindNodeMode to do (and FindPose maybe). While investigating, i found that the potential speedup will be almost x10 for a 20000 files folder, going from 120s too 17s. Still some work to do, accepting the ticket.

comment:13 by aldeck, 16 years ago

For reference, the AddMimeType issue has been fixed in hrev28891 .

Now i'd like to address the 'FindPose' issue, see this check before each Pose creation:

line 1610 in PoseView.cpp :

if (FindPose(model) || FindZombie(model->NodeRef())) {
			// we already have this pose, don't add it
			watch_node(model->NodeRef(), B_STOP_WATCHING, this);
			delete model;
			if (resultingPoses)
				resultingPoses[modelIndex] = NULL;
			continue;
		}

The FindPose search being o(n) this hurts performance a lot. Searching the ZombieList should be ok as it should be quite small if not empty most of the time.

Now i've reviewed a lot the code and i can't imagine a case where a Model could be added two times... I added debug output here and it never happens with my tests. Was this a mistake or a paranoid check? Ideas welcomed :)

comment:14 by aldeck, 16 years ago

Note that the search use a comparison of the entry_ref of the models so we're looking for a case where a Pose with an equal entry_ref already exists.

in reply to:  13 comment:15 by stippi, 16 years ago

Description: modified (diff)

Replying to aldeck:

For reference, the AddMimeType issue has been fixed in hrev28891 .

Now i'd like to address the 'FindPose' issue, see this check before each Pose creation:

line 1610 in PoseView.cpp :

if (FindPose(model) || FindZombie(model->NodeRef())) {
			// we already have this pose, don't add it
			watch_node(model->NodeRef(), B_STOP_WATCHING, this);
			delete model;
			if (resultingPoses)
				resultingPoses[modelIndex] = NULL;
			continue;
		}

The FindPose search being o(n) this hurts performance a lot. Searching the ZombieList should be ok as it should be quite small if not empty most of the time.

Now i've reviewed a lot the code and i can't imagine a case where a Model could be added two times... I added debug output here and it never happens with my tests. Was this a mistake or a paranoid check? Ideas welcomed :)

comment:16 by aldeck, 16 years ago

Description: modified (diff)

It looks like you edited the decription by mistake :) Let me quote your answer here:

stippi:

It looks to me like a paranoid check, but maybe it prevents a corner case because of some stuff

happening asynchronously. I can't really tell. Maybe turn it into an assert which is not compiled

in in release mode?

Yes, i could do that. Otherwise if we really need that, it's always possible to do a check for dupes on the whole poseList, only one time, after adding all the poses. Using a temporary hash_set of entry_refs and checking if insertions succeeds. This should be much faster.

comment:17 by anevilyak, 16 years ago

Some more improvements made in hrev29198. Please check again.

comment:18 by anevilyak, 16 years ago

How is it with hrev29395 or later?

comment:19 by stippi, 16 years ago

It's much faster, but it's still slowing down noticeably after a while. I would leave the ticket open for the time being.

comment:20 by stippi, 16 years ago

Actually, once the files are in the cache, it's not too bad at all. It's night and day compared to ZETA.

in reply to:  19 comment:21 by anevilyak, 16 years ago

Replying to stippi:

It's much faster, but it's still slowing down noticeably after a while. I would leave the ticket open for the time being.

No worries, there are plenty more opportunities for optimization, will need to run the latest code in profile and see what the culprits are now, though I know for sure that Tracker's drawing code could use a cleanup/rewrite. You'd probably be the better candidate than me for that part though :)

In any case, could you try something for me?

http://dev.haiku-os.org/browser/haiku/trunk/src/kits/tracker/PoseView.cpp#L102 - change this to 50 and see if that makes any perceptible difference for you. We'll have to be careful changing that parameter too heavily though since it's somewhat of a balance between performance and responsiveness while populating (this controls how many entries the disk reader thread sends to the window thread per batch to populate, though it's also enforced by a time constraint (see http://dev.haiku-os.org/browser/haiku/trunk/src/kits/tracker/PoseView.cpp#L1365 ). Let me know please, that parameter could probably be tweaked a bit for more modern systems with faster disk/mem subsystems.

comment:22 by aldeck, 16 years ago

Milestone: UnscheduledR1
Owner: changed from aldeck to anevilyak
Status: assignednew

Reassigning to you Rene as you seem more motivated than me on this one atm :) Don't forget to bench your progresses, it's always interesting.

comment:23 by anevilyak, 15 years ago

Owner: changed from anevilyak to aldeck

Back to you since your spatial cache rewrites will probably impact this issue :)

comment:24 by waddlesplash, 10 years ago

Milestone: R1Unscheduled

Moving Tracker enhancement tickets out of R1 milestone -- Tracker's source code comes from BeOS R5, so it already has all the features it did on R5.

comment:25 by diver, 10 years ago

Blocking: 3974 added

(In #3974) Most likely a dupe of #3011.

comment:26 by diver, 6 years ago

Blocking: 10952 added

comment:27 by leavengood, 5 years ago

Owner: changed from aldeck to leavengood
Status: newassigned

When I finish what I want to fix in media related parts of Haiku, I will look at this.

comment:28 by leavengood, 5 years ago

I don't know, this feels pretty fast now, especially once files are in the file cache. I am testing this opening a directory with 8612 empty files that I made with touch. It takes about 8-10 seconds. I am testing Haiku in VirtualBox with 1 CPU on a host machine with an i7 2600K and Intel SSD from 2011. I don't know how much using a VM would taint the profile results though.

I ran the profile command against Tracker and on the first run the "add poses" thread (which actually is what reads from the disk) hits the EntryCache a lot, and then after that it is really just hitting the kernel to get stat info from the cache as well as some memory allocation.

When I close the resulting window and get the profile information there is really not that much in Tracker itself high on the profile list. See attached file. It is doing a whole lot of _kern_port_buffer_size_etc calls, which maybe is just communication with app_server? Or that is the information sent from the "add poses" thread? Either way I am not sure what else could be tweaked in Tracker.

by leavengood, 5 years ago

Profile info after closing a window with 8612 empty files

comment:29 by leavengood, 5 years ago

waddlesplash informed me of the -f option to profile and that gives much better info, see attached "better profile". This shows AddPoseToList being where most time is spent. This also has the profile for the "add poses" thread.

by leavengood, 5 years ago

Attachment: better_profile.txt added

comment:30 by leavengood, 5 years ago

I have figured out the main problem here. In AddPoseToList, some weird BRect calculation is done and there is an "if" that is always true:

// we only care about the positions if this is a visible list
poseBounds = CalcPoseRectList(pose, poseIndex);
havePoseBounds = true;

BRect srcRect(Extent());
srcRect.top = poseBounds.top;
srcRect = srcRect & viewBounds;
BRect destRect(srcRect);
destRect.OffsetBy(0, fListElemHeight);

// special case the addition of a pose that scrolls
// the extent into the view for the first time:
if (destRect.bottom > viewBounds.top
	&& destRect.top > destRect.bottom) {
	// make destRect valid
	destRect.top = viewBounds.top;
}

if (srcRect.Intersects(viewBounds)
	|| destRect.Intersects(viewBounds)) {

At the moment I have no idea what srcRect and destRect are supposed to be, but they always seem to intersect with the view bounds, and this causes a synchronous draw for every pose added.

When I changed this if to "if (poseBounds.Intersects(viewBounds)) {" only the actual visible poses are drawn and my big directory of empty files loads in a second.

But now some bad news: this code must serve some purpose because with the above change I get some drawing artifacts if opening a directory scrolled down at all (Tracker remembers your scroll position.)

As has been noted many times PoseView is a big mess. I do not know if I want to do a major refactoring at the moment, but I can make things much better and faster if I can understand the purpose of this code and only trigger drawing when it is truly needed.

I'll keep studying it and try to understand how scrolling changes the behavior here.

I think I can write out what it should be doing and the corner cases and maybe use that as a template to make a new version of AddPoseToList.

I think a big part of the issue here is the contents of the directory are returned in an order that is much different to how Tracker wants to order things. So many times the position of the poses has to be shuffled. I suspect that is part of the issue here. That is also an issue during queries of course, as results are returned in whatever order BFS finds them in.

If anyone has other insights, let me know.

comment:31 by stippi, 5 years ago

It is a long time ago that I studied this code... but from what I remember, it will use CopyBits() to shift stuff down (and up?) to make room for new items. That is why it draws synchronously, so it knows it will not copy regions that haven't been drawn yet. This code may be from a time where drawing a file was much slower.

To be honest, your test highlights why this approach is probably misguided. As a user, I am not very interested in seeing files as soon as possible and then watch as the list is filled with more files, which are inserted possibly inbetween already found files due to the filter order. I would rather see a spinning graphics for the time it loads if that speeds up the whole process by a factor of 8.

in reply to:  31 comment:32 by jackburton, 5 years ago

Replying to stippi:

It is a long time ago that I studied this code... but from what I remember, it will use CopyBits() to shift stuff down (and up?) to make room for new items. That is why it draws synchronously, so it knows it will not copy regions that haven't been drawn yet. This code may be from a time where drawing a file was much slower.

To be honest, your test highlights why this approach is probably misguided. As a user, I am not very interested in seeing files as soon as possible and then watch as the list is filled with more files, which are inserted possibly inbetween already found files due to the filter order. I would rather see a spinning graphics for the time it loads if that speeds up the whole process by a factor of 8.

By the way, the CopyBits part also complicates the code, and makes harder to implement things like the alternate color list.

comment:33 by stippi, 5 years ago

Oh yes. It is probably better to just Invalidate() from the top of each new visible item to the bottom of the list. And let app_server bunch up the redraws.

comment:34 by leavengood, 5 years ago

Using CopyBits does seem like a needless optimization. Plus this particular bug is destroying any benefits that might be giving us, times 100 probably.

I do like how the files pop in rather than showing a loading animation, but obviously if I cannot find a way to fix this while keeping that I would also rather have things 8 times faster with an animation.

Removing the CopyBits might take me a while to figure out though. I want to avoid getting into a situation of having massive changes out in a branch which never gets merged like has happened repeatedly with Tracker. I will try my best to make small, incremental changes, and maybe I can remove the CopyBits for this case but not for others. Or maybe I don't have to change that at all for the moment to fix this bug. We shall see.

@jackburton: What is the alternate color list?

comment:35 by leavengood, 5 years ago

Rene explained the alternate color listing and yeah that would probably be nice too. For those who don't know Finder on macOS colors each odd and even row a different gray to make them easier to distinguish. As long as Tracker uses CopyBits to move drawn poses around and change order we can't really do that too.

comment:36 by leavengood, 5 years ago

Resolution: fixed
Status: assignedclosed

I am pretty satisfied with the speed I get in hrev53392. Let's go ahead and close this issue and if we have some other changes or optimizations, they don't need this ticket open.

If there is not a ticket about alternate colors I'll log it as an enhancement, but as noted that will require changing how PoseView draws.

Last edited 5 years ago by leavengood (previous) (diff)

comment:37 by stippi, 5 years ago

The improvements are very noticeable. Nice job!

comment:38 by nielx, 5 years ago

Milestone: UnscheduledR1/beta2

Assign tickets with status=closed and resolution=fixed within the R1/beta2 development window to the R1/beta2 Milestone

Note: See TracTickets for help on using tickets.