Weekend project: Dovecot+Solr for searchable email

Jamon Camisso jamon.camisso-H217xnMUJC0sA/PxXw9srA at public.gmane.org
Sat Feb 8 22:12:15 UTC 2014


So I've decided to give Dovecot and FTS (full text search) a try and am
very pleased with the results thus far. I'm using their Solr plugin for
search.

While I hate Java as a rule, Solr+Lucene is the right tool since it
gives granular control over how to handle synonyms, stop words, short
search terms, etc., and works on all but the very largest datasets.

From Thunderbird or Mutt, if I search my TLUG maildir for 'jmagick',
Solr logs the following request:

38289 [qtp1400743924-19] INFO  org.apache.solr.core.SolrCore  –
[collection1] webapp=/solr path=/select params={sort=uid+asc&
fl=uid,score&q=from:"jmagick"+OR+to:"jmagick"+OR
+cc:"jmagick"+OR+subject:"jmagick"+OR+body:"jmagick"
&fq=%2Bbox:71536e013f96f652af4c000085c4674b+%2Buser:"me"&rows=2371}
hits=4 status=0 QTime=45

Note the 'hits=4' and QTime=45. I have 2000+ messages in Solr so far and
it took 45ms to find the four relevant ones, with no JVM meddling.
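
For anyone who wants to watch those numbers over time, here is a minimal
sketch in Python 3 of pulling hits, status and QTime out of a Solr request
log line like the one above. The function name and the shortened sample
line are made up for illustration, and the regex only assumes the
key=value layout visible in that log entry.

```python
import re

# Shortened, illustrative copy of the request log line shown above.
SAMPLE_LINE = (
    '[collection1] webapp=/solr path=/select '
    'params={sort=uid+asc&fl=uid,score&q=body:"jmagick"&rows=2371} '
    'hits=4 status=0 QTime=45'
)


def parse_solr_request(line):
    """Return the hits, status and QTime (ms) fields found in a log line."""
    stats = {}
    for key in ("hits", "status", "QTime"):
        # Each field appears as key=<digits>; skip any that are missing.
        match = re.search(rf"\b{key}=(\d+)", line)
        if match:
            stats[key] = int(match.group(1))
    return stats


if __name__ == "__main__":
    print(parse_solr_request(SAMPLE_LINE))
```

Feeding it every line of the Solr log would give a quick picture of how
QTime changes as more mail gets indexed.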

For anyone who wants to give it a whirl, visit
http://wiki2.dovecot.org/Plugins/FTS/Solr and follow the instructions.

You'll need Dovecot >= 2.1 I believe. I'm using Solr 4.3.1 since that's
what I have handy, but 4.6.x works as well.
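
To poke at the index without a mail client, the query from the log above
can be rebuilt by hand. This is only a sketch: the base URL, the function
name and the default row count are assumptions, and the field names (from,
to, cc, subject, body, box, user) are simply the ones visible in that log
entry, not something checked against the schema.

```python
from urllib.parse import urlencode

# Header and body fields that appear in the logged query.
FIELDS = ("from", "to", "cc", "subject", "body")


def build_select_url(base_url, term, box, user, rows=10):
    """Build a Solr /select URL shaped like the request in the log above."""
    # Match the term in any of the fields, as in the logged q= parameter.
    query = " OR ".join(f'{field}:"{term}"' for field in FIELDS)
    params = {
        "q": query,
        # Restrict results to one mailbox and one user.
        "fq": f'+box:{box} +user:"{user}"',
        "fl": "uid,score",
        "sort": "uid asc",
        "rows": rows,
    }
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical local Solr instance; adjust host, port and path to taste.
    print(build_select_url(
        "http://localhost:8983/solr",
        "jmagick",
        "71536e013f96f652af4c000085c4674b",
        "me",
    ))
```

Pasting the printed URL into a browser or curl should return the matching
uids, provided Solr is listening at that address.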

Note: if you use Thunderbird, turn off 'Keep messages on this computer'
under Account Settings -> Synchronization and Storage. In Mutt I use =b
to search message bodies, which runs the search on the server, whereas
I think the default is ~b, which searches locally.

Cheers, Jamon
--
The Toronto Linux Users Group.      Meetings: http://gtalug.org/
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://gtalug.org/wiki/Mailing_lists