text indexing on Linux?

Thu Jul 5 16:45:48 UTC 2012

On Thu, Jul 5, 2012 at 12:31 PM, William Park <opengeometry-FFYn/CNdgSA at public.gmane.org> wrote:
> Hi all,
>
> Suppose all your files are text files and contain 10 words max.  What
> program would you use to index them based on contents?  That is, given a
> set of words, it has to return the name of files that contain those
> words.
>
> I know of "updatedb" and "locate", but they index only filenames, not
> the content.  For my need, "grep" is still faster than any SQL solution,
> but I'm curious as to what is the correct approach.

There are a number of "text database" systems that might be suitable
for this sort of thing.

Desktop environments have "gone here" as well, with desktop search tools.

See Beagle <http://en.wikipedia.org/wiki/Beagle_(software)>, which
unfortunately seems to have 'died.'

The KDE-ish flavour of this is Strigi:
   http://sourceforge.net/projects/strigi/

Strigi can use a number of backends for storing the indexes, including

 - Apache Lucene <http://lucene.apache.org/>
 - Xapian <http://xapian.org/>

See also Estraier and Hyper Estraier:
  http://fallabs.com/estraier/
  http://fallabs.com/hyperestraier/

It is common now for relational databases to include full text search
capabilities.
   http://www.postgresql.org/docs/9.1/static/textsearch.html
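
For a sense of what these tools do underneath, the usual approach is an
inverted index: a map from each word to the set of files containing it.
Below is a minimal sketch in Python, not how any of the packages above
is actually implemented.  The names build_index() and search() are made
up for illustration, and it assumes whitespace-separated words,
case-insensitive matching, and an index small enough to sit in memory.

```python
import os
from collections import defaultdict


def build_index(root):
    """Map each lowercased word to the set of file paths containing it."""
    index = defaultdict(set)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Assumption: files are small text files; undecodable bytes
            # are replaced rather than raising an error.
            with open(path, encoding="utf-8", errors="replace") as fh:
                for word in fh.read().lower().split():
                    index[word].add(path)
    return index


def search(index, words):
    """Return the sorted paths of files that contain every given word."""
    # .get() avoids creating empty entries in the defaultdict on a miss.
    sets = [index.get(w.lower(), set()) for w in words]
    if not sets:
        return []
    return sorted(set.intersection(*sets))
```

Build the index once with build_index("/some/dir"), then each
search(index, ["foo", "bar"]) is a set intersection rather than a scan
of every file, which is where an index would start to beat "grep" as
the number of files grows.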
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"
--
The Toronto Linux Users Group.      Meetings: http://gtalug.org/
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://gtalug.org/wiki/Mailing_lists