Opened 3 months ago

Closed 4 weeks ago

Last modified 4 weeks ago

#19080 closed bug (fixed)

Query term order shouldn't matter, but does.

Reported by: humdinger Owned by: axeld
Priority: normal Milestone: R1/beta6
Component: File Systems/BFS Version: R1/beta5
Keywords: Cc:
Blocked By: Blocking:
Platform: All

Description

This is hrev57849, 64bit.

The order in which you enter the terms for a query should not matter, but apparently it does. Consider these two queries that should have the same result:

  1. query -a "(((MAIL:when>%-3 days%)&&(MAIL:subject=="*[cC][oO][mM][mM][iI][tT]*"))&&(BEOS:TYPE=="text/x-email"))"

B_OK

  2. query -a "(((MAIL:subject=="*[cC][oO][mM][mM][iI][tT]*")&&(MAIL:when>%-3 days%))&&(BEOS:TYPE=="text/x-email"))"

B_NOT_OK

While query 1 returns the correct few dozen mails from the last 3 days, query 2 returns over 3,000 mails going all the way back to 2019 (I don't have older mails on this computer).

Not good...
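
To spell out the expectation: a logical AND is commutative, so swapping the two terms must not change the result set. Here is a minimal Python sketch of that property. The mail records and the NOW timestamp are made up for illustration, and fnmatchcase only stands in for the query's pattern matching; this is not how BFS evaluates queries.

```python
from fnmatch import fnmatchcase

# Hypothetical values for illustration only, not data from this ticket.
NOW = 1_732_165_200
DAY = 86_400
mails = [
    {"subject": "Re: commit 1234", "when": NOW - 1 * DAY},
    {"subject": "COMMIT hook failed", "when": NOW - 400 * DAY},
    {"subject": "Lunch on Friday?", "when": NOW - 2 * DAY},
]


def subject_matches(mail):
    # Same case-insensitive pattern as in the queries above.
    return fnmatchcase(mail["subject"], "*[cC][oO][mM][mM][iI][tT]*")


def is_recent(mail):
    # Equivalent of MAIL:when>%-3 days%
    return mail["when"] > NOW - 3 * DAY


# Evaluate the conjunction in both term orders.
when_first = [m for m in mails if is_recent(m) and subject_matches(m)]
subject_first = [m for m in mails if subject_matches(m) and is_recent(m)]
```

Both lists contain only the one recent "commit" mail, whichever term comes first.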

Attachments (2)

works.png (11.0 KB ) - added by humdinger 3 months ago.
B_OK
doesntwork.png (11.0 KB ) - added by humdinger 3 months ago.
B_NOT_OK


Change History (16)

by humdinger, 3 months ago

Attachment: works.png added

B_OK

by humdinger, 3 months ago

Attachment: doesntwork.png added

B_NOT_OK

comment:1 by waddlesplash, 3 months ago

Component: Kits/libtracker.so → File Systems/BFS
Owner: changed from nobody to axeld

comment:2 by waddlesplash, 3 months ago

Does this reproduce on something besides BFS with emails; perhaps on packagefs?

comment:3 by humdinger, 3 months ago

With "on packagefs" you mean like the "system" volume?

There isn't much with attributes around there...
I just searched for applications with a "sk" AND "b" in their name, then tried the other way around. That did work. As did searching for audio files on a BFS volume, querying for Artist && Album combinations.

Maybe it has something to do with the MAIL:when attribute not being of type "string" like MAIL:subject?

Also curious, why does query -a "(last_modified>%-1 days%)" spit out every file, not just the ones modified since yesterday? Maybe worth another ticket...

comment:4 by waddlesplash, 3 months ago

It's possible that relative date-based queries are somehow broken and that's the problem here.

comment:5 by waddlesplash, 5 weeks ago

So, the handling of relative dates in queries happens in userland, not the kernel:

$ strace -e "open_query" query -a "(((MAIL:subject=="*[cC][oO][mM][mM][iI][tT]*")&&(MAIL:when>%-3 days%))&&(BEOS:TYPE=="text/x-email"))"
[   412] open_query(0x3, "(((MAIL:subject==*[cC][oO][mM][mM][iI][tT]*)&&(MAIL:when>1731906000))&&(BEOS:TYPE==text/x-email))", 0x61, 0x0, 0xffffffff, 0x0) = 0x3 (28 us)

That leaves the question of why the comparison isn't working here.
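
For illustration, the substitution visible in the strace output can be sketched roughly like this in Python. This is not the actual Haiku code; expand_relative_dates, the regular expression and the unit table are assumptions made up for this sketch, and the "now" value in the usage note is simply chosen so that the output matches the trace.

```python
import re
import time

# Illustrative subset of units; the real parser accepts far more forms.
_UNITS = {"minutes": 60, "hours": 3600, "days": 86400}
_RELATIVE = re.compile(r"%(-?\d+) (\w+)%")


def expand_relative_dates(query, now=None):
    """Replace %<n> <unit>% placeholders with absolute epoch seconds."""
    now = int(time.time()) if now is None else now

    def _replace(match):
        amount, unit = int(match.group(1)), match.group(2)
        return str(now + amount * _UNITS[unit])

    return _RELATIVE.sub(_replace, query)
```

With a hypothetical now=1732165200, expand_relative_dates("(MAIL:when>%-3 days%)", now=1732165200) gives "(MAIL:when>1731906000)", so the kernel only ever sees a plain integer comparison.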

comment:6 by waddlesplash, 5 weeks ago

The last_modified problems were indeed a regression; I fixed those in hrev58355.

I don't really have emails here to test with; any chance you could come up with a few that trigger this problem and zip them up? (If you can get the problem to reproduce on RAMFS, so much the better. Just copy a few emails over to a RAMFS volume and then run the query on that volume only.)

I guess somewhere around here there was that "data demo package" which had a number of emails in it; perhaps I can see if there's a few in there which are suitable to test this with...

comment:7 by waddlesplash, 5 weeks ago

(Well, you'll have to create the relevant indexes on the RAMFS volume, too.)

comment:8 by waddlesplash, 5 weeks ago

Actually, I found another issue and fixed that in hrev58356. So please see if that outright fixes this bug.

I would also be interested to see the size of these indexes (lsindex can report that). I think the reason the queries behave differently when ordered differently is because the indexes are large, and so it winds up thinking that querying either index is basically the same, but that's not the case at all. We should alter the scoring method to be more precise, probably; but let's see if we can just get the bug fixed first.
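
To make the idea of a "more precise" score concrete, here is a rough Python sketch. It is not the BFS scoring code and not a description of any patch; Term, needs_full_scan, estimated_cost and the range_fraction guess are all hypothetical. The only point is that a pattern with a leading wildcard has to walk its whole index, while a range comparison can seek into a sorted index and read just a part of it.

```python
from dataclasses import dataclass


@dataclass
class Term:
    attribute: str
    op: str
    value: str
    index_size: int  # bytes, e.g. as reported by lsindex -l


def needs_full_scan(term):
    # A pattern starting with a wildcard cannot seek in a sorted index.
    return term.op == "==" and term.value.startswith("*")


def estimated_cost(term, range_fraction=0.1):
    # range_fraction is an invented guess at how much of the index a
    # range comparison touches; a real implementation would estimate it.
    if needs_full_scan(term):
        return term.index_size
    return int(term.index_size * range_fraction)


def pick_driving_term(terms):
    # Drive the query from the cheapest index, whatever the input order.
    return min(terms, key=estimated_cost)
```

With a score like this, a range term on MAIL:when would be preferred over a wildcard term on MAIL:subject regardless of the order in which they were typed.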

comment:9 by humdinger, 5 weeks ago

Well done! Testing on hrev58356, the queries from above have the same results! \o/

This is the lsindex -l of the partition where I keep my mails:

            Text  11/18/2023 11:18 AM      6144 Audio:Album
            Text  11/18/2023 11:18 AM      5120 Audio:Artist
            Text  11/18/2023 08:53 AM      6144 BEOS:APP_SIG
            Text  11/18/2023 11:18 AM      4096 BEOS:LOCALE_LANGUAGE
            Text  11/18/2023 11:18 AM      4096 BEOS:LOCALE_SIGNATURE
            Text  05/27/2024 07:41 PM      2048 Calendar:ID
            Text  05/27/2024 07:41 PM      2048 Category:Name
            Text  05/27/2024 07:41 PM      2048 Event:Category
            Text  05/27/2024 07:41 PM      2048 Event:Description
          Int-32  05/27/2024 07:41 PM      2048 Event:End
            Text  05/27/2024 07:41 PM      2048 Event:Name
            Text  05/27/2024 07:41 PM      2048 Event:Place
          Int-32  05/27/2024 07:41 PM      2048 Event:Reminder
          Int-32  05/27/2024 07:41 PM      2048 Event:Start
            Text  05/27/2024 07:41 PM      2048 Event:Status
          Int-32  05/27/2024 07:41 PM      2048 Event:Updated
            Text  10/26/2024 05:25 PM      2048 Feed:name
            Text  10/26/2024 05:25 PM      2048 Feed:source
            Text  10/26/2024 05:25 PM      2048 Feed:status
            Text  11/18/2023 11:18 AM   1105920 MAIL:account
          Int-32  11/18/2023 11:18 AM   1103872 MAIL:account_id
            Text  11/18/2023 11:18 AM   1080320 MAIL:cc
            Text  11/18/2023 11:18 AM      2048 MAIL:chain
          Int-32  11/18/2023 11:18 AM      2048 MAIL:draft
            Text  11/18/2023 11:18 AM     18432 MAIL:flags
            Text  11/18/2023 11:18 AM   1866752 MAIL:from
            Text  11/18/2023 11:18 AM   1627136 MAIL:name
            Text  11/18/2023 11:18 AM      2048 MAIL:pending_chain
            Text  11/18/2023 11:18 AM     20480 MAIL:priority
          Int-32  11/18/2023 11:18 AM    453632 MAIL:read
            Text  11/18/2023 11:18 AM   5228544 MAIL:reply
            Text  11/18/2023 11:18 AM   1105920 MAIL:status
            Text  11/18/2023 11:18 AM  13512704 MAIL:subject
            Text  11/18/2023 11:18 AM  11706368 MAIL:thread
            Text  11/18/2023 11:18 AM   1644544 MAIL:to
          Int-32  11/18/2023 11:18 AM   6884352 MAIL:when
            Text  11/18/2023 11:18 AM      4096 META:address
            Text  11/18/2023 11:18 AM      3072 META:address2
            Text  11/18/2023 11:18 AM      3072 META:aim
            Text  11/18/2023 11:18 AM      3072 META:anniversary
            Text  11/18/2023 11:18 AM      3072 META:birthday
            Text  11/18/2023 11:18 AM      3072 META:cell
            Text  11/18/2023 11:18 AM      3072 META:children
            Text  11/18/2023 11:18 AM      4096 META:city
            Text  11/18/2023 11:18 AM      3072 META:company
            Text  11/18/2023 11:18 AM      3072 META:country
            Text  11/18/2023 11:18 AM      2048 META:county
            Text  11/18/2023 11:18 AM      4096 META:email
            Text  11/18/2023 11:18 AM      3072 META:email2
            Text  11/18/2023 11:18 AM      3072 META:email3
            Text  11/18/2023 11:18 AM      3072 META:email4
            Text  11/18/2023 11:18 AM      3072 META:fax
            Text  11/18/2023 11:18 AM      2048 META:firstname
            Text  11/18/2023 11:18 AM      4096 META:group
            Text  11/18/2023 11:18 AM      3072 META:hphone
            Text  11/18/2023 11:18 AM      3072 META:icq
            Text  11/18/2023 11:18 AM      3072 META:jabber
            Text  11/18/2023 11:18 AM      2048 META:lastname
            Text  11/18/2023 11:18 AM      3072 META:mphone
            Text  11/18/2023 11:18 AM      2048 META:name
            Text  11/18/2023 11:18 AM      3072 META:nickname
            Text  11/18/2023 11:18 AM      3072 META:pager
            Text  11/18/2023 11:18 AM      3072 META:spouse
            Text  11/18/2023 11:18 AM      3072 META:state
            Text  11/18/2023 11:18 AM     46080 META:title
            Text  11/18/2023 11:18 AM    121856 META:url
            Text  11/18/2023 11:18 AM      3072 META:url2
            Text  11/18/2023 11:18 AM      3072 META:url3
            Text  11/18/2023 11:18 AM      3072 META:waddress
            Text  11/18/2023 11:18 AM      3072 META:waddress2
            Text  11/18/2023 11:18 AM      3072 META:wcity
            Text  11/18/2023 11:18 AM      3072 META:wcountry
            Text  11/18/2023 11:18 AM      3072 META:wcphone
            Text  11/18/2023 11:18 AM      3072 META:wfax
            Text  11/18/2023 11:18 AM      3072 META:wphone
            Text  11/18/2023 11:18 AM      3072 META:wstate
            Text  11/18/2023 11:18 AM      3072 META:wzip
            Text  11/18/2023 11:18 AM      3072 META:yahoo
            Text  11/18/2023 11:18 AM      4096 META:zip
            Text  11/18/2023 11:18 AM      6144 Media:Genre
          Int-32  11/18/2023 11:18 AM      3072 Media:Rating
            Text  11/18/2023 11:18 AM      6144 Media:Title
          Int-32  11/18/2023 11:18 AM      7168 Media:Year
            Text  11/18/2023 11:18 AM      2048 _signature
            Text  11/18/2023 11:18 AM      2048 _status
          Int-32  11/18/2023 09:10 AM      2048 _trk/qrylastchange
          Int-32  11/18/2023 09:10 AM      3072 _trk/recentQuery
            Text  11/18/2023 11:18 AM      2048 be:deskbar_item_status
          Int-64  11/18/2023 08:53 AM   6213632 last_modified
            Text  11/18/2023 08:53 AM  33856512 name
          Int-64  11/18/2023 08:53 AM   8140800 size

comment:10 by waddlesplash, 4 weeks ago

I've posted https://review.haiku-os.org/c/haiku/+/8593 to address this.

Please test both with and without that patch, checking to see which query runs faster: the one with MAIL:subject first, or the one with MAIL:when first. Before the patch, the one with MAIL:when first should be faster (maybe even considerably faster); after the patch, they should be equivalent. (Note that the second run of either query will probably be faster than the first.)

comment:11 by humdinger, 4 weeks ago

Thanks for working on this!

Here are my findings. I tried with these 2 queries:

  1. query -a "(((MAIL:when>%-100 days%)&&(MAIL:subject=="*[cC][oO][mM][mM][iI][tT]*"))&&(BEOS:TYPE=="text/x-email"))"
  2. query -a "(((MAIL:subject=="*[cC][oO][mM][mM][iI][tT]*")&&(MAIL:when>%-100 days%))&&(BEOS:TYPE=="text/x-email"))"

Both return 863 mails. I first ran them on a current nightly: query 1 three times, then query 2 three times, with a reboot between each run to avoid caching.

The results:

hrev58356, 64bit, not patched:

1:			1:			1:
real    0m0,728s	real    0m0,769s	real    0m0,720s
user    0m0,052s	user    0m0,059s	user    0m0,056s
sys     0m0,221s	sys     0m0,248s	sys     0m0,217s

2:			2:			2:
real    0m2,095s	real    0m2,031s	real    0m1,980s
user    0m0,059s	user    0m0,056s	user    0m0,050s
sys     0m0,778s	sys     0m0,752s	sys     0m0,713s

hrev58363+1, 64bit, patched:

1:			1:			1:
real    0m0,689s	real    0m0,687s	real    0m0,733s
user    0m0,049s	user    0m0,055s	user    0m0,062s
sys     0m0,188s	sys     0m0,197s	sys     0m0,216s

2:			2:			2:
real    0m0,719s	real    0m0,668s	real    0m0,681s
user    0m0,054s	user    0m0,045s	user    0m0,053s
sys     0m0,217s	sys     0m0,184s	sys     0m0,193s
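
To put numbers on the difference, the "real" times above can be averaged with a few lines of Python. The values are copied from the runs above (decimal commas written as points); the variable names are just for this sketch.

```python
from statistics import mean

# "real" wall-clock seconds from the three runs of each query.
real_seconds = {
    ("unpatched", 1): [0.728, 0.769, 0.720],
    ("unpatched", 2): [2.095, 2.031, 1.980],
    ("patched", 1): [0.689, 0.687, 0.733],
    ("patched", 2): [0.719, 0.668, 0.681],
}

averages = {key: mean(runs) for key, runs in real_seconds.items()}

# How much slower query 2 is than query 1, before and after the patch.
ratio_before = averages[("unpatched", 2)] / averages[("unpatched", 1)]
ratio_after = averages[("patched", 2)] / averages[("patched", 1)]
```

Before the patch, query 2 takes roughly 2.75 times as long as query 1 on average; after the patch the ratio is about 0.98, so the two are effectively equal.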

comment:12 by waddlesplash, 4 weeks ago

An even more significant result than I was expecting. Thanks for testing!

comment:13 by waddlesplash, 4 weeks ago

Milestone: Unscheduled → R1/beta6
Resolution: fixed
Status: new → closed

Fix merged in hrev58365.

comment:14 by humdinger, 4 weeks ago

A very effective optimization, thank you!
