text indexing on Linux?
Christopher Browne
cbbrowne-Re5JQEeQqe8AvxtiuMwx3w at public.gmane.org
Thu Jul 5 16:45:48 UTC 2012
On Thu, Jul 5, 2012 at 12:31 PM, William Park <opengeometry-FFYn/CNdgSA at public.gmane.org> wrote:
> Hi all,
>
> Suppose all your files are text files and contain 10 words max. What
> program would you use to index them based on contents? That is, given a
> set of words, it has to return the name of files that contain those
> words.
>
> I know of "updatedb" and "locate", but they index only filenames, not
> the content. For my need, "grep" is still faster than any SQL solution,
> but I'm curious as to what is the correct approach.
There are a number of "text database" systems that might be suitable
for this sort of thing.
The desktop environments have gone down this road with their desktop
search tools.
Beagle <http://en.wikipedia.org/wiki/Beagle_(software)> was one such
indexer, but unfortunately it seems to have died.
The KDE-ish flavour of this is Strigi:
http://sourceforge.net/projects/strigi/
Strigi can use a number of backends for storing the indexes, including
- Apache Lucene <http://lucene.apache.org/>
- Xapian <http://xapian.org/>
See also...
http://fallabs.com/estraier/
http://fallabs.com/hyperestraier/
It is common now for relational databases to include full text search
capabilities.
http://www.postgresql.org/docs/9.1/static/textsearch.html
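
For what it's worth, the usual idea behind full text indexers of this
kind is an "inverted index": a map from each word to the set of files
that contain it. With files of 10 words at most, a toy version fits in a
few lines of Python. This is only a sketch to show the shape of the
approach, not how any of the tools above are implemented; the names
build_index and search are made up for illustration, and it assumes the
files are UTF-8 text and that the whole index fits in memory.

```python
import os
import re
from collections import defaultdict

# Assumption: a "word" is any run of letters, digits, or underscores.
WORD_RE = re.compile(r"\w+")


def build_index(root):
    """Map each lowercased word to the set of file paths containing it."""
    index = defaultdict(set)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as fh:
                    text = fh.read()
            except OSError:
                continue  # unreadable file: skip it
            for word in WORD_RE.findall(text.lower()):
                index[word].add(path)
    return index


def search(index, words):
    """Return the sorted paths of files that contain every given word."""
    sets = [index.get(word.lower(), set()) for word in words]
    if not sets:
        return []
    return sorted(set.intersection(*sets))
```

Build the index once with build_index("/some/dir"), then call
search(index, ["foo", "bar"]) as often as needed. The one-time indexing
cost is what buys faster lookups than re-running grep, which has to
read every file on every query.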
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"
--
The Toronto Linux Users Group. Meetings: http://gtalug.org/
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://gtalug.org/wiki/Mailing_lists