MySQL question - perhaps interesting?

Zbigniew Koziol zkoziol-Zd07PnzKK1IAvxtiuMwx3w at public.gmane.org
Sun Mar 13 01:02:49 UTC 2005


Stewart,

I should have said: tens of thousands of records (words). I mean most of 
the words that are used most frequently in English.

Both Lucene and TextSTAT are interesting and new to me (thanks for the 
info), but they are not closely related to my project. I will use Project 
Gutenberg to get the words (somewhat dated words, but that's good enough). 
Of course, I could scan the web instead. That's not a problem either.

My question is: in which order should I populate the MySQL database to get 
the fastest word lookups, starting from the most frequently used words or 
from the least frequent ones? This can actually be tested easily, and I 
will report the result. But perhaps there is a different approach to the 
problem?
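
One such approach, as a rough sketch only: if the word column is a primary 
key (or otherwise indexed), each lookup goes through the index rather than 
a scan of the rows, so the order in which rows were inserted should matter 
little. The snippet below uses Python's sqlite3 module as a stand-in for 
MySQL. The table name `words`, its columns, the helper names, and the 
placeholder transcriptions are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical sample rows, most frequent word first. The "<word>"
# strings are placeholders, not real transcriptions.
SAMPLE = [(w, "<" + w + ">") for w in ("the", "of", "and", "zebra")]


def build(rows):
    """Create an in-memory table keyed on the word and load the rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE words ("
        "word TEXT PRIMARY KEY, "
        "transcription TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO words VALUES (?, ?)", rows)
    conn.commit()
    return conn


def lookup(conn, word):
    """Return the stored transcription, or None if the word is unknown."""
    row = conn.execute(
        "SELECT transcription FROM words WHERE word = ?", (word,)
    ).fetchone()
    return row[0] if row else None


def plan(conn):
    """Return the query plan text for a single-word lookup."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT transcription FROM words WHERE word = ?",
        ("the",),
    ).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " ".join(str(r[-1]) for r in rows)


if __name__ == "__main__":
    frequent_first = build(SAMPLE)
    rare_first = build(list(reversed(SAMPLE)))
    # Same answer regardless of insertion order.
    print(lookup(frequent_first, "zebra"), lookup(rare_first, "zebra"))
    # The plan should mention an index search, not a full table scan.
    print(plan(frequent_first))
```

In MySQL the equivalent check would be running EXPLAIN on the SELECT and 
confirming that the key on the word column is used. Actual timings would 
still need to be measured on the real data.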

Users of the web page will enter some text into a form. The computer will 
look up the transcription of every word entered. The PHP program I am 
using (translated by me from Visual Basic) is fine for a small number of 
words, but with large amounts of text the web server times out. So instead 
of calculating the transcription every time, for every word, I would 
rather create a database of words and their transcriptions.
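
The lookup-table idea above could be sketched roughly as follows: look each 
word up in the table first, and fall back to the slow routine only for 
words not yet stored, saving the result for next time. This is an 
illustration in Python with sqlite3 standing in for PHP and MySQL. The 
function `transcribe_word` is a hypothetical placeholder for the real 
transcription code, and the table layout and tokenising rule are 
assumptions.

```python
import re
import sqlite3


def transcribe_word(word):
    """Hypothetical stand-in for the slow transcription routine."""
    return "/" + word + "/"


def open_cache():
    """Open an in-memory table mapping each word to its transcription."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS words ("
        "word TEXT PRIMARY KEY, "
        "transcription TEXT NOT NULL)"
    )
    return conn


def transcribe_text(conn, text):
    """Transcribe every word in text, computing only the unseen ones."""
    results = []
    # Assumed tokenising rule: lowercase runs of letters and apostrophes.
    for word in re.findall(r"[a-z']+", text.lower()):
        row = conn.execute(
            "SELECT transcription FROM words WHERE word = ?", (word,)
        ).fetchone()
        if row is None:
            # Cache miss: compute once and store for later requests.
            transcription = transcribe_word(word)
            conn.execute(
                "INSERT INTO words (word, transcription) VALUES (?, ?)",
                (word, transcription),
            )
        else:
            transcription = row[0]
        results.append(transcription)
    conn.commit()
    return results


if __name__ == "__main__":
    cache = open_cache()
    print(transcribe_text(cache, "The cat saw the cat"))
```

With this arrangement the table can also be filled ahead of time from a 
word list, so that most requests never reach the slow routine at all.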

zb.



Stewart C. Russell wrote:
> Sounds like you're wanting to create a small text corpus there, 
> Zbigniew. You might be better off with a free text search engine like 
> Apache Lucene than MySQL.
> 
> Alternatively, there are concordancing programs that will do exactly 
> this, and show you the most common words, and their collocates (words 
> that appear before or after them). One I found, but haven't tried, is 
> TextSTAT 
> <http://www.niederlandistik.fu-berlin.de/textstat/software-en.html> -- 
> it's been a while since I did corpus linguistics seriously.
> 
>  Stewart
> -- 
> The Toronto Linux Users Group.      Meetings: http://tlug.ss.org
> TLUG requests: Linux topics, No HTML, wrap text below 80 columns
> How to UNSUBSCRIBE: http://tlug.ss.org/subscribe.shtml
> 


-- 
Zbigniew Koziol, SoftQuake^(tm) Open Source Business Solutions
Web Development, Linux, Web Mail Fax Voice Servers, Networking
Consultations, Innovative Technologies Tel/Fax: 1-416-530-2780
Toronto,  Canada,  http://www.softquake.ca,  info-lcEyp1+e+UdAFePFGvp55w at public.gmane.org




