text indexing on Linux?
William Park
opengeometry-FFYn/CNdgSA at public.gmane.org
Thu Jul 5 22:42:43 UTC 2012
Number of "files" can be millions, and "words" would come from everyday
English usage.
Even though my case is not file-related, I posed the problem as such
because the two problems are essentially the same. In my case, records
contain:
- item description, sku, price, etc.
- customer name, address, etc.
- vendor name, address, etc.
So, given a subset of the above data, I want to get back the relevant
record keys.
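
Conceptually that's an inverted index: for every word, keep the list
of record keys it appears in, and a multi-word query is just the
intersection of those lists. A toy sketch of the query side, with
made-up posting lists of record IDs:

    /* Toy sketch of the query side of an inverted index.  The
       posting lists below are made up: each holds the (sorted) IDs
       of records containing one word.  Records matching BOTH words
       are the intersection, found with a single linear merge. */
    #include <stdio.h>

    int main(void)
    {
        int blue[] = { 3, 17, 42, 99 };       /* records with "blue" */
        int acme[] = { 5, 17, 42, 100, 104 }; /* records with "acme" */
        size_t nb = sizeof blue / sizeof blue[0];
        size_t na = sizeof acme / sizeof acme[0];

        for (size_t i = 0, j = 0; i < nb && j < na; ) {
            if (blue[i] == acme[j]) {         /* present in both */
                printf("record %d matches both words\n", blue[i]);
                i++; j++;
            } else if (blue[i] < acme[j]) {
                i++;                          /* advance smaller side */
            } else {
                j++;
            }
        }
        return 0;
    }

Because the lists are sorted, each query is a walk over two short
lists rather than a scan of millions of records; building and
maintaining those lists is the indexer's job.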
If PostgreSQL is already in use, then that would be the answer. But
if glibc has something similar, I'd prefer that.
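
As far as I know, glibc only gives you in-memory building blocks; the
closest fit is the POSIX hash table in <search.h>. A minimal sketch,
with made-up records, of loading word -> record-key pairs into
hcreate()/hsearch() and probing the table (caveat: hsearch(ENTER)
keeps only the first entry per key, so a word shared by several
records really needs a per-word list like the one above):

    #include <search.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        const char *keys[] = { "rec001", "rec002" };      /* made up */
        const char *text[] = { "widget blue acme 9.99",
                               "gadget red acme 4.25" };

        if (hcreate(1024) == 0) {   /* one global table per process */
            perror("hcreate");
            return 1;
        }

        for (size_t i = 0; i < 2; i++) {
            char *buf = strdup(text[i]);     /* strtok needs a copy */
            for (char *w = strtok(buf, " "); w; w = strtok(NULL, " ")) {
                ENTRY e = { .key = strdup(w), .data = (void *)keys[i] };
                hsearch(e, ENTER);           /* word -> record key */
            }
        }

        ENTRY q = { .key = (char *)"gadget" };
        ENTRY *hit = hsearch(q, FIND);  /* which record has "gadget"? */
        printf("gadget -> %s\n", hit ? (char *)hit->data : "(none)");

        hdestroy();
        return 0;
    }

For millions of records, though, something with an on-disk index
(PostgreSQL's full-text search, or a dedicated indexer) will scale
far better than anything in libc.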
--
William
On Thu, Jul 05, 2012 at 12:38:55PM -0400, Ted wrote:
> Are the contents basically random dictionary words, i.e. a set of
> "words" drawn from 600k+ possible words? Or are the contents a small
> subset of those "words"?
> Also, how many files are you talking about?
>
> -tl
>
> On 07/05/2012 12:31 PM, William Park wrote:
> >Hi all,
> >
> >Suppose all your files are text files and contain 10 words max. What
> >program would you use to index them based on their contents? That is,
> >given a set of words, it has to return the names of the files that
> >contain those words.
> >
> >I know of "updatedb" and "locate", but they index only filenames, not
> >contents. For my needs, "grep" is still faster than any SQL solution,
> >but I'm curious what the correct approach would be.