Opened 8 years ago

Closed 8 years ago

Last modified 6 years ago

#13254 closed bug (fixed)

Query on E-Mail Attributes Misses Files

Reported by: AGMS
Owned by: axeld
Priority: normal
Milestone: Unscheduled
Component: File Systems/BFS
Version: R1/Development
Keywords: query bfs index attribute
Cc: agmsmith@…
Blocked By:
Blocking:
Platform: All

Description

I tested with hrev50905 to see if my e-mail torture test was working yet, and found that queries or the index system still don't work properly, missing most files.

I unzipped my collection of 17107 test files & dirs with MAIL:to attributes on most of them to a freshly formatted disk volume (with mail indices already created on it). Then running query 'MAIL:to=*' | wc turned up only 202 of them (yes, current directory was on the new disk volume). On the other hand, 'name=* && MAIL:to=*' turns up 17109 (it added trash and another directory to the 17107). I guess I have some empty ones or maybe those are directories since 'name=* && MAIL:to=?*' yields 14230.

Anyway, the 'MAIL:to=*' by itself should be finding all the e-mails.

Let me know if you need the 10MB copy of the e-mail collection .zip file, which I want to keep private.

Change History (7)

comment:1 by axeld, 8 years ago

Having the mails would be helpful in order to reproduce the issue. You could try to truncate all mail files; this should not affect the index creation/usage, and should be sufficient to reproduce it. You could still send that to me privately, of course.

comment:2 by AGMS, 8 years ago

The obfuscated e-mail archive has been sent to Axel via e-mail.

comment:3 by axeld, 8 years ago

Thanks! This test shows that MAIL:to=* will obviously also be true if there aren't any items:

$ find . -exec listattr {} \; | grep MAIL:to | wc -l
14230

I've started to look into it, and just in case I'm hit by a rock tomorrow: the problem is the maximum key length. We allow keys of up to 256 bytes, and the query's key value storage also holds only 256 bytes.

However, TreeIterator::Traverse() always wants to return null terminated strings. For a 256 byte key, we'd need 257 bytes for the key plus the null character. Since there is not enough storage, it will return B_BUFFER_OVERFLOW, and the query will stop.
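The off-by-one described above can be illustrated with a minimal sketch. This is not the actual BFS code: the names `kMaxKeyLength`, `kKeyBufferSize`, `Status`, and `CopyKeyTerminated` are hypothetical stand-ins, and only the two 256-byte sizes and the overflow result come from the comment itself.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical constants mirroring the sizes mentioned in the ticket.
constexpr std::size_t kMaxKeyLength = 256;   // largest key the index accepts
constexpr std::size_t kKeyBufferSize = 256;  // size of the query's key storage

// Stand-in for B_OK / B_BUFFER_OVERFLOW; not the real Haiku status codes.
enum Status { kOk, kBufferOverflow };

// Copies a key into `buffer` and null-terminates it, refusing to copy
// when the key plus its terminator does not fit.
Status CopyKeyTerminated(const char* key, std::size_t keyLength,
    char* buffer, std::size_t bufferSize)
{
    // The terminator needs one byte beyond the key itself, so a
    // 256-byte key requires 257 bytes of storage.
    if (keyLength + 1 > bufferSize)
        return kBufferOverflow;

    std::memcpy(buffer, key, keyLength);
    buffer[keyLength] = '\0';
    return kOk;
}
```

With these assumed sizes, a 255-byte key fits, while a key of exactly `kMaxKeyLength` bytes fails with the overflow result. That corresponds to the point at which the query stops.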

Not sure yet if we limit the key size to 255 bytes instead (that seems to be what BeOS did), or just increase the query's key storage by one byte. While the former makes us more compatible with BeOS, the latter will not render currently existing indexes invalid. As always, opinions welcome.

comment:4 by AGMS, 8 years ago

I had wondered if overly long attributes would mess things up, but I thought it would be a string comparison with garbage after 256 bytes giving inconsistent results, not an error code.

I'd go with the BeOS way, 255 byte max string contents and always have a NUL at the end, but only for string type attributes (others like arrays of integer attributes would cut off at 256). Should reduce the chance for bugs and make other code simpler. Also insignificantly faster in terms of cache performance :-).

Enforce the limit when writing keys of type string. Could keep the old index data by always forcing byte [255] to zero upon reading a string attribute key. Though a full reindex would be best in case losing the last byte changes the sort order (not likely to happen).

comment:5 by axeld, 8 years ago

Resolution: fixed
Status: new → closed

Fixed in hrev51082, thanks again for your mails. I'll now take offers for Alex's mail metadata, in case any agency is interested ;-)

I'm now cutting strings to the buffer length in TreeIterator::Traverse(), so the existing entries won't matter. Other than that, I went with the BeOS way of 255 characters + null byte; it simply makes the most sense.
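As a rough illustration of cutting a string to the buffer length, here is a minimal sketch. It does not reproduce the change in hrev51082; the function name `CopyKeyTruncated` and its signature are assumptions made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies a string key into `buffer`, truncating it
// if necessary so that a terminating null byte always fits.
// Returns the number of key bytes copied (not counting the terminator).
std::size_t CopyKeyTruncated(const char* key, std::size_t keyLength,
    char* buffer, std::size_t bufferSize)
{
    if (bufferSize == 0)
        return 0;

    // Reserve the last byte of the buffer for the terminator.
    std::size_t length = std::min(keyLength, bufferSize - 1);
    std::memcpy(buffer, key, length);
    buffer[length] = '\0';
    return length;
}
```

With a 256-byte buffer, an old 256-byte index entry would be read back as 255 characters plus the null byte rather than producing an error, which is why existing entries would no longer matter under this approach.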

comment:6 by AGMS, 8 years ago

Great! Maybe now I can move my e-mail to Haiku from BeOS. Will have to test it...

comment:7 by AGMS, 6 years ago

Did some retesting, unzipping the archive to an empty drive. hrev52282 queries get all the files while BeOS queries miss a few hundred of the 14230 files that have MAIL:to attributes (BeOS listattr output confirms the 14230 count). So Haiku is better in yet another way!
