Opened 6 months ago

Last modified 6 months ago

#18650 new bug

[query-cli] it takes a lot of time with the initial search

Reported by: tzu_mi
Owned by: axeld
Priority: normal
Milestone: Unscheduled
Component: File Systems/BFS
Version: R1/Development
Keywords:
Cc:
Blocked By:
Blocking:
Platform: All

Description

Hi, some other users and I noticed that the query CLI takes a long time when a command is run for the first time, but becomes much quicker if the same search is repeated later, as if it builds up a cache. I'm experiencing this on Walter hrev 57315 x86_64 and previous revisions (I cannot remember how long I've been seeing this, but it's been quite a long time; I never reported it before, my fault).

~/Desktop> time query -af '((name=="**")&&(META:group=="**"))'
[omitted]
real    0m20.704s
user    0m0.054s
sys     0m2.914s
~/Desktop> time query -af '((name=="**")&&(META:group=="**"))'
[omitted]
real    0m0.761s
user    0m0.051s
sys     0m0.708s


time query -af '((name=="**")&&(BEOS:TYPE=="text/memo"))'
[omitted]
real    0m19.852s
user    0m0.051s
sys     0m10.166s
time query -af '((name=="**")&&(BEOS:TYPE=="text/memo"))'
[omitted]
real    0m1.251s
user    0m0.048s
sys     0m1.202s
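
The cold/warm gap shown above can be reproduced with a small helper. This is only a sketch: `time_twice` is a made-up name, not a Haiku tool, and it relies on `date +%s`, so it has one-second resolution, which should still be enough to tell ~20 s from under 1 s.

```shell
#!/bin/sh
# time_twice (hypothetical helper): run the same command twice and print
# the elapsed wall-clock seconds of each run.
time_twice() {
    for run in first second; do
        start=$(date +%s)
        "$@" > /dev/null 2>&1
        end=$(date +%s)
        echo "$run run: $((end - start)) s"
    done
}

# Only attempt the real query where the `query` command exists (i.e. on Haiku).
if command -v query > /dev/null 2>&1; then
    time_twice query -af '((name=="**")&&(META:group=="**"))'
fi
```

The first line is presumably only a true "cold" measurement if that query has not been run since the volume was mounted.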

Change History (5)

comment:1 by waddlesplash, 6 months ago

Component: - General → File Systems/BFS
Owner: changed from nobody to axeld
Platform: x86-64 → All

If you are running queries on unindexed attributes, or on an especially large filesystem, I think this is expected and there's not much to be done about it.

comment:2 by bipolar, 6 months ago

What's considered a large filesystem? The first query -a name=*whatever* is always pretty slow on a 5 GB partition for me. (I should probably script a dummy query on boot, just so it gets ready for when I actually want to use it :-P).
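
The "dummy query on boot" idea could look roughly like the sketch below. `warm_up_query` is a hypothetical helper name, and the sketch assumes it is called from the user boot script (usually `~/config/settings/boot/UserBootscript`). Whether one throwaway query also speeds up different later queries is not established in this ticket.

```shell
#!/bin/sh
# warm_up_query (hypothetical helper): run one throwaway query and discard
# its output. $1 is the volume to search, $2 the query predicate.
warm_up_query() {
    if command -v query > /dev/null 2>&1; then
        query -v "$1" -f "$2" > /dev/null 2>&1
    fi
}

# Run in the background so it does not hold up the rest of the boot script.
warm_up_query /boot '((name=="**")&&(META:group=="**"))' &
```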

Also... doesn't this have more to do with PackageFS than BFS? I mean... on BFS the indexes are "there" on the disk, but for activated packages they need to be computed/gathered on each boot, or am I just mistaken about how that works?

Edit: did some tests, I see that /boot/system only indexes: last_modified, size, name, and BEOS:APP_SIG.

So my slow queries come down to either querying non-indexed attributes, or the difference between exact-match and "glob" queries (e.g. name=test vs name=*test*), plus my slow hardware and number of files.

Edit 2: Some timing results on my system (they seem on par with what I get on my 32-bit beta4, 5 GB partition, with a similar number of small files).

Last edited 6 months ago by bipolar

comment:3 by tzu_mi, 6 months ago (in reply to: comment:1)

Replying to waddlesplash:

If you are running queries on unindexed attributes, or on an especially large filesystem, I think this is expected and there's not much to be done about it.

It's not that big: 57.25 GiB, 4096 bytes/block, on a USB3 pendrive. It's obviously indexed:

~/Desktop> lsindex /boot
Audio:Album
Audio:Artist
Audio:Track
BEOS:APP_SIG
BEOS:LOCALE_LANGUAGE
BEOS:LOCALE_SIGNATURE
Calendar:ID
Category:Name
Event:Category
Event:Description
Event:End
Event:Name
Event:Place
Event:Start
Event:Status
Event:Updated
MAIL:account
MAIL:account_id
MAIL:beam/identity
MAIL:beam/imap-uid
MAIL:cc
MAIL:chain
MAIL:draft
MAIL:flags
MAIL:from
MAIL:name
MAIL:pending_chain
MAIL:priority
MAIL:read
MAIL:reply
MAIL:status
MAIL:subject
MAIL:thread
MAIL:to
MAIL:when
MEMO:keyw
MEMO:title
META:address
META:city
META:company
META:country
META:county
META:email
META:fax
META:group
META:hphone
META:keyw
META:mphone
META:name
META:nickname
META:state
META:status
META:url
META:wphone
META:zip
Media:Genre
Media:Rating
Media:Title
Media:Year
_signature
_status
_trk/qrylastchange
_trk/recentQuery
be:deskbar_item_status
last_modified
name
size
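
Rather than scanning the listing by eye, a single attribute can be checked against the `lsindex` output. A minimal sketch, where `is_indexed` is a hypothetical helper name. Note that BEOS:TYPE, used in the second query of the description, does not appear in the listing above.

```shell
#!/bin/sh
# is_indexed (hypothetical helper): succeed if attribute $2 appears as a
# whole line in the `lsindex` output for volume $1.
is_indexed() {
    lsindex "$1" | grep -Fxq -- "$2"
}

# Only run the check where `lsindex` exists (i.e. on Haiku).
if command -v lsindex > /dev/null 2>&1; then
    for attr in name META:group BEOS:TYPE; do
        if is_indexed /boot "$attr"; then
            echo "$attr: indexed"
        else
            echo "$attr: not indexed"
        fi
    done
fi
```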

~/Desktop> df
 Type      Total     Free      Flags   Device                   Mounted on
--------- --------- --------- ------- ------------------------ -----------------
bfs        57.3 GiB  40.2 GiB QAM-P-W      /dev/disk/usb/0/0/0 /boot  

comment:4 by tzu_mi, 6 months ago

Even when specifying only the /boot volume, and with no results (I've deleted all the person files), the first search hangs for ~20 seconds, while the second search with the same command takes about half a second.

~/Desktop> time query -v /boot -f '((name=="**")&&(META:group=="**"))'

real    0m21.497s
user    0m0.051s
sys     0m2.513s
~/Desktop> time query -v /boot -f '((name=="**")&&(META:group=="**"))'

real    0m0.428s
user    0m0.049s
sys     0m0.378s

comment:5 by madmax, 6 months ago

I guess there's no query optimization à la RDBMS? Every file has a name; I don't know how big and spread out that index may get, or whether those 20 seconds can be improved. If you query for ((META:group=="**")&&(name=="**")) or leave name out, the time taken is quite different.
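
To compare the variants mentioned here, something like the following sketch could be used. `predicate_variants` is a hypothetical helper, and the `-v /boot -f` flags are simply the ones used earlier in this ticket. Since repeated searches are much faster, the numbers are probably only comparable when each variant is the first query after a fresh boot.

```shell
#!/bin/sh
# predicate_variants (hypothetical helper): print the three forms discussed
# above, one per line: original order, swapped order, and without name.
predicate_variants() {
    printf '%s\n' \
        '((name=="**")&&(META:group=="**"))' \
        '((META:group=="**")&&(name=="**"))' \
        '(META:group=="**")'
}

# Only run the timings where the `query` command exists (i.e. on Haiku).
if command -v query > /dev/null 2>&1; then
    predicate_variants | while IFS= read -r predicate; do
        start=$(date +%s)
        query -v /boot -f "$predicate" > /dev/null 2>&1 < /dev/null
        end=$(date +%s)
        echo "$((end - start)) s  $predicate"
    done
fi
```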
