Opened 11 years ago

Closed 6 years ago

Last modified 18 months ago

#9529 closed enhancement (fixed)

Use native text searching instead of grep

Reported by: X512
Owned by: phoudoin
Priority: normal
Milestone: Unscheduled
Component: Applications/TextSearch
Version: R1/Development
Keywords: (none)
Cc: phoudoin
Blocked By: (none)
Blocking: (none)
Platform: All

Description

Currently TextSearch is very slow, because starting and quitting a grep team is a big overhead. It would be good to have a native search engine. For example, Pe's multi-file search is much faster than Haiku's built-in TextSearch app: Pe can search the Haiku sources in a reasonable time.

Change History (17)

comment:1 by leavengood, 11 years ago

While this is probably a pretty reasonable enhancement request, I would like to recommend the tool "ack" for searching through source code:

http://betterthangrep.com/

The standalone version is very easy to install into Haiku by just downloading and copying it to ~/config/bin, then making it executable:

http://betterthangrep.com/ack-standalone

In fact, another way to improve TextSearch would be to simply use ack, or to offer it as an option.

comment:2 by axeld, 11 years ago

Given the reasoning for a native solution, it doesn't make sense to use ack at all.

comment:3 by leavengood, 11 years ago

It depends on why grep is constantly stopped and started.

Taking a very quick look at the TextSearch code, I see it calls grep separately for every file, which explains why it is so slow. Since barely any of grep's features are used, it does seem stupid to use it this way when a native solution with PCRE for regular expressions would likely be much faster.

But if grep or ack were used more intelligently (for example, by taking advantage of their recursion options), and if popen() were used to read the results instead of redirecting them to a file(!!!), I bet it could be made much faster without having to recreate grep natively.
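A minimal shell sketch of the difference, assuming GNU grep and a POSIX find; the function names search_per_file and search_recursive are hypothetical and do not come from the TextSearch sources. The first starts one grep team per file, the pattern described above; the second lets a single grep walk the tree itself with -r, and its matches can be read straight from a pipe rather than from a temporary file.

```shell
# Hypothetical helpers for illustration only; not TextSearch code.
# Both print matches as "path:line:text" for the fixed string in $1,
# searching every regular file below the directory in $2.

# Slow: find starts a separate grep process for every file it visits.
search_per_file() {
    find "$2" -type f -exec grep -n -H -F -e "$1" {} \;
}

# Faster: a single grep process recurses through the directory itself.
search_recursive() {
    grep -r -n -F -e "$1" "$2"
}
```

Usage: search_recursive houdoin src | while IFS= read -r hit; do printf '%s\n' "$hit"; done consumes results as they arrive. Replacing \; with + in the find command would also make find pass many files to each grep, removing most of the per-file startup cost.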

Of course if Pe already includes a multi-file, regular expression supporting search, then maybe that could be extracted into a small system library which could be used by various things, including TextSearch. But you know how I hate recreating code and repeating things.

comment:4 by phoudoin, 11 years ago

Cc: phoudoin added

comment:5 by waddlesplash, 9 years ago

Milestone: changed from R1 to Unscheduled
Owner: changed from stippi to waddlesplash
Status: changed from new to assigned

comment:6 by waddlesplash, 9 years ago

Resolution: fixed
Status: changed from assigned to closed

Implemented in hrev48969.

comment:7 by waddlesplash, 9 years ago

Resolution: fixed (removed)
Status: changed from closed to reopened

Commit reverted in hrev48971.

comment:8 by bbjimmy, 9 years ago

This was reverted "As per discussion on the ML."

What mailing list would that be? I haven't seen any discussion on any mailing list.

in reply to: 8 comment:9 by humdinger, 9 years ago

Replying to bbjimmy:

> This was reverted "As per discussion on the ML."
>
> What mailing list would that be? I haven't seen any discussion on any mailing list.

The commits mailing list, in the thread that discussed the commit.

comment:10 by pulkomandy, 9 years ago

For the record, the problems with the reverted code:

  • It was reading the whole file into memory, which could fail for big files
  • It did not support using regular expressions for searching, only plain text search

The better way to fix this is to run grep once for all the files instead of once per file. Or, if we really want the tool to work without grep, it needs to work the way grep does: support regular expressions, and read files one line at a time instead of loading the whole file into memory. I think this is more work than needed, unless an existing library can do the job?
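The grep-like behaviour described here can be sketched with a standard tool. In the sketch below, awk stands in for the native search code (a swapped-in tool, not what TextSearch does), and search_stream is a hypothetical name: a single process handles every file given to it, each line is tested against an extended regular expression, and awk keeps only the current line, so no file is loaded into memory as a whole.

```shell
# Hypothetical helper for illustration; awk replaces a native engine.
# Usage: search_stream REGEX FILE...
# Prints "file:line:text" for every line matching the extended regex.
search_stream() {
    re=$1
    shift
    # The pattern is passed through the environment because awk -v
    # would interpret backslash escapes in it. FNR restarts at 1 for
    # each file, so line numbers are per file.
    RE="$re" awk '$0 ~ ENVIRON["RE"] { printf "%s:%d:%s\n", FILENAME, FNR, $0 }' "$@"
}
```

For example, search_stream 'B(Looper|Handler)' *.cpp would list matching lines from all the files with one process start instead of one per file.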

comment:11 by X512, 9 years ago

What solution is used in Pe? Could it be moved into a separate library and reused by both Pe and TextSearch?

comment:12 by phoudoin, 8 years ago

Well, according to:

https://github.com/olta/pe/blob/4bfabe000ec381f00072a858b6012cc36cd27678/Sources/CFindDialog.cpp#L1591

  • Pe's FindInFiles reads the whole file into memory, too :-\
  • but it does support regular expressions.

Back to square one.

comment:13 by waddlesplash, 8 years ago

Owner: changed from waddlesplash to nobody
Status: changed from reopened to assigned

deassigning various things from me

comment:14 by phoudoin, 7 years ago

Maybe an intermediate solution could be used here: TextSearch could pipe the list of files to search, as NUL-terminated file name strings, into an xargs --null grep SEARCH_PATTERN command?

That would leverage grep's power without having to rewrite it in TextSearch... We could even parallelize the search that way:

xargs --null --max-procs=NB_CPU grep -n SEARCH_PATTERN
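A sketch of that pipeline, assuming GNU findutils and GNU grep. Here find stands in for TextSearch writing its own NUL-terminated list of file names into the pipe, and search_parallel and its job-count argument (a stand-in for NB_CPU) are hypothetical. -0 and -P are the short forms of --null and --max-procs.

```shell
# Hypothetical helper; find plays the role of TextSearch's file list.
# Usage: search_parallel PATTERN DIR [JOBS]
search_parallel() {
    pattern=$1
    dir=$2
    jobs=${3:-4}    # placeholder for NB_CPU
    # NUL separators keep file names with spaces or newlines intact.
    # -H forces the file name prefix even when xargs hands a grep
    # process a single file. --line-buffered makes each grep flush
    # after every line, so output from parallel processes interleaves
    # at line boundaries instead of mid-line.
    find "$dir" -type f -print0 |
        xargs -0 -P "$jobs" grep -n -H --line-buffered -e "$pattern"
}
```

With several grep processes running at once, results no longer arrive in file-list order, so the caller may want to sort or group them. Also, xargs exits with status 123 when any grep invocation finds no match, so a non-zero status here does not necessarily mean an error.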

Last edited 18 months ago by phoudoin

comment:15 by phoudoin, 6 years ago

Owner: changed from nobody to phoudoin
Status: changed from assigned to in-progress

comment:16 by phoudoin, 6 years ago

Implemented piping file names to xargs + grep in hrev51525.

Searching for "houdoin" (case-insensitive, plain text) in Haiku's src root folder:

Previous implementation: 11 min 48 s (708 seconds)
New implementation, same machine: 14 s

Version 2, edited 6 years ago by phoudoin

comment:17 by phoudoin, 6 years ago

Resolution: fixed
Status: changed from in-progress to closed