text indexing on Linux?

William Park opengeometry-FFYn/CNdgSA at public.gmane.org
Thu Jul 5 22:42:43 UTC 2012


Number of "files" can be millions, and "words" would come from everyday
English usage.

Even though my case is not file-related, I posed the problem in those
terms because the two problems are essentially the same.  In my case,
records contain
    - item description, SKU, price, etc.
    - customer name, address, etc.
    - vendor name, address, etc.
So, given a subset of the above data, I want to get back the keys of
the relevant records.
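
To pin down what I mean by "give a subset, get back keys", here is a
toy sketch in C.  The records, words, and keys are made up, and the
linear scan is only there to show the query semantics, not to be an
efficient index:

    #include <stdio.h>
    #include <string.h>

    /* one (word, record key) posting; real data would have millions */
    struct posting { const char *word; const char *key; };

    static const struct posting idx[] = {
        { "widget", "rec-1" }, { "acme",   "rec-1" }, { "toronto", "rec-1" },
        { "gadget", "rec-2" }, { "acme",   "rec-2" }, { "ottawa",  "rec-2" },
        { "widget", "rec-3" }, { "globex", "rec-3" }, { "toronto", "rec-3" },
    };
    static const size_t nidx = sizeof idx / sizeof idx[0];

    /* does the record identified by `key` contain `word`? */
    static int has(const char *key, const char *word)
    {
        for (size_t i = 0; i < nidx; i++)
            if (!strcmp(idx[i].key, key) && !strcmp(idx[i].word, word))
                return 1;
        return 0;
    }

    int main(void)
    {
        const char *query[] = { "widget", "toronto" };   /* query words */
        const char *keys[]  = { "rec-1", "rec-2", "rec-3" };

        /* print the keys of the records that contain every query word */
        for (size_t k = 0; k < sizeof keys / sizeof keys[0]; k++) {
            int ok = 1;
            for (size_t q = 0; q < sizeof query / sizeof query[0]; q++)
                ok = ok && has(keys[k], query[q]);
            if (ok)
                printf("%s\n", keys[k]);
        }
        return 0;
    }

With millions of records the scan would of course be replaced by a real
word-to-keys index; that is what the PostgreSQL/glibc question below is
about.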

If PostgreSQL were already in use, then that would be the answer.  But
if glibc has something similar, I'd prefer that.
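
The closest building block I can see in glibc itself is the <search.h>
hash table (hcreate()/hsearch()), which could at least hold the
word-to-record-key map in memory.  A rough sketch, again with made-up
data and a fixed-size postings list just to keep it short:

    #include <search.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define MAXKEYS 8                  /* toy limit, for the sketch only */

    struct postings {                  /* keys of records containing a word */
        const char *keys[MAXKEYS];
        size_t n;
    };

    /* note that the record `record_key` contains `word` */
    static void add(const char *word, const char *record_key)
    {
        ENTRY item = { .key = (char *)word, .data = NULL };
        ENTRY *e = hsearch(item, FIND);

        if (e == NULL) {               /* first occurrence of this word */
            item.data = calloc(1, sizeof(struct postings));
            e = hsearch(item, ENTER);
        }
        struct postings *p = e->data;
        if (p->n < MAXKEYS)
            p->keys[p->n++] = record_key;
    }

    int main(void)
    {
        hcreate(1024);                 /* size for the expected word count */

        /* made-up postings from two records */
        add("widget", "sku-1001");  add("acme", "sku-1001");
        add("gadget", "sku-2002");  add("acme", "sku-2002");

        /* query: which record keys contain "acme"? */
        ENTRY q = { .key = (char *)"acme" };
        ENTRY *e = hsearch(q, FIND);
        if (e != NULL) {
            struct postings *p = e->data;
            for (size_t i = 0; i < p->n; i++)
                printf("%s\n", p->keys[i]);
        }

        hdestroy();
        return 0;
    }

Being purely in-memory, an hcreate() table has to be rebuilt on every
run; persistence and multi-word queries over millions of rows are
exactly what PostgreSQL's full-text search would give for free.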
-- 
William

On Thu, Jul 05, 2012 at 12:38:55PM -0400, Ted wrote:
> Are the contents basically random dictionary words, i.e. a set of
> "words" that can come from 600k+ possible words?  Or are the contents
> a small subset of "words"?
> Also, how many files are you talking about?
> 
> -tl
> 
> On 07/05/2012 12:31 PM, William Park wrote:
> >Hi all,
> >
> >Suppose all your files are text files and contain 10 words max.  What
> >program would you use to index them based on contents?  That is,
> >given a set of words, it has to return the names of the files that
> >contain those words.
> >
> >I know of "updatedb" and "locate", but they index only filenames, not
> >the content.  For my needs, "grep" is still faster than any SQL
> >solution, but I'm curious what the correct approach is.