Opened 6 years ago

Closed 22 months ago

#9529 closed enhancement (fixed)

Use native text searching instead of grep

Reported by: X512
Owned by: phoudoin
Priority: normal
Milestone: Unscheduled
Component: Applications/TextSearch
Version: R1/Development
Keywords:
Cc: phoudoin
Blocked By:
Blocking:
Has a Patch: no
Platform: All

Description

Currently TextSearch works very slowly because starting and quitting the grep team is a big overhead. It would be good to have a native search engine. For example, Pe's multi-file search is much faster than Haiku's built-in TextSearch app. Pe can find text in the Haiku sources in a reasonable time.

Change History (17)

comment:1 Changed 6 years ago by leavengood

While this is probably a pretty reasonable enhancement request, I would like to recommend the tool "ack" for searching through source code:

http://betterthangrep.com/

The standalone version is very easy to install into Haiku by just downloading and copying it to ~/config/bin, then making it executable:

http://betterthangrep.com/ack-standalone

In fact another way to improve TextSearch would just be to use ack, or have it as an option.

comment:2 Changed 6 years ago by axeld

Given the reasoning for a native solution, it doesn't make sense to use ack at all.

comment:3 Changed 6 years ago by leavengood

It depends on why grep is constantly stopped and started.

Taking a very quick look at the code for TextSearch it calls grep individually for every file, in which case I can see why it is so slow. Since barely any of grep's features are used it does seem stupid to use it this way, when a native solution with PCRE for regular expressions would likely be much faster.

But if the usage of grep or ack could be made more intelligent (such as taking advantage of the recursion options) and if popen was used instead of redirecting to a file to get the results(!!!), I bet it could be made much faster without having to recreate grep natively.
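To illustrate the idea above, here is a minimal sketch (in Python, not TextSearch's actual C++ code) of running grep once over a whole tree and reading the matches straight from a pipe instead of a temporary file. The function name `grep_tree` is made up for this example, and the `-r` and `-I` options are assumed to be available, as they are in GNU and BSD grep.

```python
import subprocess
import tempfile
from pathlib import Path


def grep_tree(pattern: str, root: str) -> list[tuple[str, int, str]]:
    """Run grep once over a directory tree and read matches from a pipe."""
    # -r: recurse, -n: print line numbers, -H: always print the file name,
    # -I: skip binary files, -e: the next argument is the pattern.
    # "--" ends option parsing so the root path is never taken as a flag.
    cmd = ["grep", "-rnHI", "-e", pattern, "--", root]
    matches = []
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True,
                          errors="replace") as proc:
        for line in proc.stdout:
            # Each output line looks like "path:lineno:text".
            # Assumption: paths contain no ':' (this simple split breaks otherwise).
            path, lineno, text = line.rstrip("\n").split(":", 2)
            matches.append((path, int(lineno), text))
    return matches


if __name__ == "__main__":
    # Small demonstration on a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "example.txt").write_text("first\nsearch me\n")
        for path, lineno, text in grep_tree("search", tmp):
            print(f"{path}:{lineno}: {text}")
```

A single process handles every file, so the per-file start-up cost disappears, and the results are consumed as grep produces them.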

Of course if Pe already includes a multi-file, regular expression supporting search, then maybe that could be extracted into a small system library which could be used by various things, including TextSearch. But you know how I hate recreating code and repeating things.

comment:4 Changed 6 years ago by phoudoin

Cc: phoudoin added

comment:5 Changed 4 years ago by waddlesplash

Milestone: R1 → Unscheduled
Owner: changed from stippi to waddlesplash
Status: new → assigned

comment:6 Changed 4 years ago by waddlesplash

Resolution: fixed
Status: assigned → closed

Implemented in hrev48969.

comment:7 Changed 4 years ago by waddlesplash

Resolution: fixed
Status: closed → reopened

Commit reverted in hrev48971.

comment:8 Changed 4 years ago by bbjimmy

This was reverted "as per discussion on the ML."

What mailing list would that be? I haven't seen any discussion on any mailing list.

comment:9 in reply to:  8 Changed 4 years ago by humdinger

Replying to bbjimmy:

> This was reverted "as per discussion on the ML."
>
> What mailing list would that be? I haven't seen any discussion on any mailing list.

The commits mailing list: the thread that discussed the commit.

comment:10 Changed 4 years ago by pulkomandy

For the record, the problems with the reverted code:

  • It read the whole file into memory, which could fail for big files
  • It did not support regular expressions, only plain text search

The better way to fix this is to run grep once for all the files, instead of once for each file. Or, if we really want the tool to run without grep, it needs to work in a similar way: support regular expressions, and read each file one line at a time rather than loading it whole into memory. This is, I think, more work than needed, unless an existing library can do the work?
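The two requirements above (regular expressions, no whole-file reads) can be sketched as follows. This is only an illustration in Python using its standard `re` module, not a proposal for the actual TextSearch code. The name `search_file` is hypothetical.

```python
import os
import re
import tempfile
from typing import Iterator


def search_file(path: str, pattern: str,
                ignore_case: bool = False) -> Iterator[tuple[int, str]]:
    """Yield (line number, line text) for every line matching the regex."""
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    # Iterating over the file object reads one line at a time, so memory
    # use is bounded by the longest line rather than by the file size.
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for lineno, line in enumerate(handle, start=1):
            if regex.search(line):
                yield lineno, line.rstrip("\n")


if __name__ == "__main__":
    # Small demonstration on a throwaway file.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("first line\nHaiku TextSearch\nlast line\n")
    for lineno, text in search_file(tmp.name, r"text\s*search", ignore_case=True):
        print(f"{tmp.name}:{lineno}:{text}")
    os.unlink(tmp.name)
```

One caveat: a file consisting of a single enormous line is still read in one piece, so a robust implementation would also need a cap on line length.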

comment:11 Changed 4 years ago by X512

What solution is used in Pe? Can it be moved into a separate library and reused by both Pe and TextSearch?

comment:12 Changed 4 years ago by phoudoin

Well, according to:

https://github.com/olta/pe/blob/4bfabe000ec381f00072a858b6012cc36cd27678/Sources/CFindDialog.cpp#L1591

  • Pe's FindInFiles reads the whole file into memory, too :-\
  • but it does support regular expressions.

Back to square one.

comment:13 Changed 3 years ago by waddlesplash

Owner: changed from waddlesplash to nobody
Status: reopened → assigned

deassigning various things from me

comment:14 Changed 2 years ago by phoudoin

Maybe an intermediate solution could be used here: TextSearch could pipe the list of null-terminated file names to search into an xargs --null grep SEARCH_PATTERN command?

That would leverage grep's power without having to rewrite it in TextSearch... We could even parallelize the search that way:

xargs --null --max-procs=NB_CPU grep -n SEARCH_PATTERN

Last edited 2 years ago by phoudoin
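The xargs idea from this comment can be sketched as below. This is a hedged illustration in Python, not the code that later landed in TextSearch. The function name `xargs_grep` and its `jobs` parameter are invented for the example. It uses the short options `-0` and `-P`, which correspond to `--null` and `--max-procs` in GNU xargs.

```python
import os
import subprocess
import tempfile
from pathlib import Path


def xargs_grep(pattern: str, paths: list[str], jobs: int = 1) -> list[str]:
    """Feed NUL-terminated file names to `xargs grep` and return its output lines."""
    if not paths:
        return []
    # NUL-terminate each name so spaces or newlines in paths survive intact.
    payload = b"".join(os.fsencode(p) + b"\0" for p in paths)
    # xargs appends the file names after "--", batching them into as few
    # grep invocations as the command-line length limit allows.
    cmd = ["xargs", "-0", "-P", str(jobs),
           "grep", "-nH", "-e", pattern, "--"]
    # The exit status is deliberately ignored: grep returns 1 when a batch
    # has no match, which xargs reports as a failure.
    result = subprocess.run(cmd, input=payload, stdout=subprocess.PIPE)
    return result.stdout.decode("utf-8", errors="replace").splitlines()


if __name__ == "__main__":
    # Small demonstration on a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp, "name with space.txt")
        sample.write_text("nothing\nSEARCH_PATTERN here\n")
        for line in xargs_grep("SEARCH_PATTERN", [str(sample)],
                               jobs=os.cpu_count() or 1):
            print(line)
```

With `jobs` greater than 1, several grep processes write to the same pipe, so the order of the results (and possibly of interleaved lines) is no longer deterministic. A caller that needs stable ordering would have to sort or group the output afterwards.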

comment:15 Changed 22 months ago by phoudoin

Owner: changed from nobody to phoudoin
Status: assigned → in-progress

comment:16 Changed 22 months ago by phoudoin

Implemented piping all filenames at once to xargs + grep in hrev51525.

Searching for "houdoin", case-insensitive, plain text, in Haiku's src root folder:

Implementation   Duration
Previous         708 s (11 min 48 s)
Newer            14 s

It should work for everything, not only my name :-)

Last edited 22 months ago by phoudoin

comment:17 Changed 22 months ago by phoudoin

Resolution: fixed
Status: in-progress → closed