Best tools for making a spider?

Dave Cramer davec-zxk95TxsVYDyHADnj0MGvQC/G2K4zDHf at public.gmane.org
Wed Jan 3 23:16:22 UTC 2007


http://www.java-source.net/open-source/crawlers
as well as Lucene:

http://lucene.apache.org/

Have a look at Nutch, too. It is a framework that uses Lucene.

Dave

On 3-Jan-07, at 5:56 PM, Scott Elcomb wrote:

> On 1/3/07, Madison Kelly <linux-5ZoueyuiTZhBDgjK7y7TUQ at public.gmane.org> wrote:
>> Alex Beamish wrote:
>> > On 1/3/07, Evan Leibovitch <evan-ieNeDk6JonTYtjvyW6yDsg at public.gmane.org> wrote:
>> >>
>> >> This isn't meant to start a flamewar, honestly.
>> >>
>> >> I'm just wondering if there are some languages that are either
>> >> optimized or otherwise more suitable than others for the specific
>> >> task of writing a specialized web spider.
>>
>> I'd throw a vote in for Perl, too. *Lots* of modules to choose from,
>> and dealing with text is Perl's forte.
>
> Having written a couple of simple spiders, I'd agree - Perl has a long
> history in this regard and is both very capable and flexible.
>
> -- 
> Scott Elcomb
> http://atomos.sourceforge.net/
> http://search.cpan.org/~selcomb/SAL-3.03/
> http://psema4.googlepages.com/
>
> "They that can give up essential liberty to obtain a little temporary
> safety deserve neither liberty nor safety."
>
>  - Benjamin Franklin
>
> "A lie can travel halfway around the world while the truth is putting
> on its shoes."
>
>  - Mark Twain
> --
> The Toronto Linux Users Group.      Meetings: http://gtalug.org/
> TLUG requests: Linux topics, No HTML, wrap text below 80 columns
> How to UNSUBSCRIBE: http://gtalug.org/wiki/Mailing_lists
>
