MySQL question - perhaps interesting?
Zbigniew Koziol
zkoziol-Zd07PnzKK1IAvxtiuMwx3w at public.gmane.org
Sun Mar 13 01:02:49 UTC 2005
Stewart,
I should have said: tens of thousands of records (words). I mean most of
the words that are used most frequently in English.
Both Lucene and TextSTAT are interesting and new to me (thanks for the
info), but they are not closely related to my project. I will use Project
Gutenberg to get the words (somewhat dated words, but that's good enough).
Of course, I could instead scan the web; that's not a problem either.
My question is: in which order should I populate the MySQL database to get
the fastest search for words, starting from the most frequently used
words or from the least frequent ones? This can actually be tested easily,
and I will report the result. But perhaps there is a different approach to
the problem?
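
One way to run the test mentioned above, as a rough sketch rather than a
definitive answer: if the word column carries an index (a PRIMARY KEY
here), a lookup is an index search and should not depend on the order in
which rows were inserted. The example uses Python's built-in sqlite3 module
as a stand-in for MySQL; the table name, column names and the dummy
transcription values are assumptions made only for illustration.

```python
import random
import sqlite3


def build(words):
    """Create an in-memory table and insert the words in the given order."""
    conn = sqlite3.connect(":memory:")
    # PRIMARY KEY creates an index on `word` (hypothetical schema).
    conn.execute(
        "CREATE TABLE words (word TEXT PRIMARY KEY, transcription TEXT)"
    )
    # The upper-cased word is only a dummy transcription for the demo.
    conn.executemany(
        "INSERT INTO words VALUES (?, ?)", [(w, w.upper()) for w in words]
    )
    conn.commit()
    return conn


def lookup(conn, word):
    """Return the stored transcription, or None if the word is unknown."""
    row = conn.execute(
        "SELECT transcription FROM words WHERE word = ?", (word,)
    ).fetchone()
    return row[0] if row else None


def plan(conn):
    """Return the query plan text SQLite reports for a word lookup."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT transcription FROM words WHERE word = ?",
        ("x",),
    ).fetchall()
    return " ".join(str(r[-1]) for r in rows)


if __name__ == "__main__":
    by_frequency = ["the", "of", "and", "to", "zebra", "quark"]
    shuffled = by_frequency[:]
    random.shuffle(shuffled)
    for order in (by_frequency, list(reversed(by_frequency)), shuffled):
        conn = build(order)
        print(lookup(conn, "zebra"), "|", plan(conn))
```

In MySQL the equivalent would be a PRIMARY KEY or UNIQUE index on the word
column, with EXPLAIN used to confirm that the lookup goes through the index.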
Users of the web page will enter some text into a form. The computer will
look up the transcription of every word entered. The PHP program I am
using (translated by me from Visual Basic) works well for a small number of
words, but with large amounts of text the web server times out. So instead
of calculating the transcription every time, for every word, I would rather
create a database of words and their transcriptions.
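
The lookup-table idea described above can be sketched roughly as follows:
each word is looked up in the table first, and the slow routine runs only
for words not yet stored. This is an illustration under assumptions, not the
actual program: transcribe() is a hypothetical placeholder for the real
transcription code, the schema is invented, and Python's sqlite3 again
stands in for MySQL.

```python
import re
import sqlite3


def transcribe(word):
    """Hypothetical placeholder for the slow transcription routine."""
    return "/" + word + "/"


def open_cache(path=":memory:"):
    """Open (or create) the table of already-computed transcriptions."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transcriptions ("
        "word TEXT PRIMARY KEY, transcription TEXT NOT NULL)"
    )
    return conn


def transcribe_text(conn, text):
    """Transcribe every word in `text`, computing only the cache misses."""
    result = []
    for word in re.findall(r"[a-z']+", text.lower()):
        row = conn.execute(
            "SELECT transcription FROM transcriptions WHERE word = ?",
            (word,),
        ).fetchone()
        if row is None:
            # Cache miss: compute once and store for later requests.
            value = transcribe(word)
            conn.execute(
                "INSERT INTO transcriptions VALUES (?, ?)", (word, value)
            )
        else:
            value = row[0]
        result.append(value)
    conn.commit()
    return result


if __name__ == "__main__":
    conn = open_cache()
    print(transcribe_text(conn, "The cat and the dog"))
```

Pre-populating the table from a word list is then just a matter of running
the same function over that list once, outside the web request.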
zb.
Stewart C. Russell wrote:
> Sounds like you're wanting to create a small text corpus there,
> Zbigniew. You might be better off with a free text search engine like
> Apache Lucene than MySQL.
>
> Alternatively, there are concordancing programs that will do exactly
> this, and show you the most common words, and their collocates (words
> that appear before or after them). One I found, but haven't tried, is
> TextSTAT
> <http://www.niederlandistik.fu-berlin.de/textstat/software-en.html> --
> it's been a while since I did corpus linguistics seriously.
>
> Stewart
> --
> The Toronto Linux Users Group. Meetings: http://tlug.ss.org
> TLUG requests: Linux topics, No HTML, wrap text below 80 columns
> How to UNSUBSCRIBE: http://tlug.ss.org/subscribe.shtml
>
--
Zbigniew Koziol, SoftQuake^(tm) Open Source Business Solutions
Web Development, Linux, Web Mail Fax Voice Servers, Networking
Consultations, Innovative Technologies Tel/Fax: 1-416-530-2780
Toronto, Canada, http://www.softquake.ca, info-lcEyp1+e+UdAFePFGvp55w at public.gmane.org