Best tools for making a spider?

Madison Kelly linux-5ZoueyuiTZhBDgjK7y7TUQ at public.gmane.org
Wed Jan 3 22:37:02 UTC 2007


Alex Beamish wrote:
> On 1/3/07, Evan Leibovitch <evan-ieNeDk6JonTYtjvyW6yDsg at public.gmane.org> wrote:
>>
>> This isn't meant to start a flamewar, honestly.
>>
>> I'm just wondering if there are some languages that are either optimized
>> or otherwise more suitable than others for the specific task of writing
>> a specialized web spider.
>>
>> So far the people I've spoken to say this is a perfect task for Ruby,
>> but I don't know it at all. There is no previous baggage on this
>> particular project so we can choose any tool we want -- but once chosen
>> I'd like to stay with it. Having some existing open source templates or
>> existing code to build upon is always nice too.
>>
>> I'm not going to be the programmer (we do want working code, after all
>> :-) ) but I do have some say in the tech to be used. Any suggestions or
>> pointers of where to explore this particular question are appreciated.
> 
> 
> I would expect you could use Andy Lester's WWW::Mechanize for spidering. See
> 
>  http://search.cpan.org/~petdance/WWW-Mechanize-1.20/lib/WWW/Mechanize.pm
> 
> for lots more information. That page also links to O'Reilly's
> *Spidering Hacks*
> 
>  http://www.oreilly.com/catalog/spiderhks/
> 
> which would probably be a useful resource no matter what language platform
> you choose.
> 

I'd throw a vote in for perl, too. *Lots* of modules to choose from and 
dealing with text is perl's forte.

Madi
--
The Toronto Linux Users Group.      Meetings: http://gtalug.org/
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://gtalug.org/wiki/Mailing_lists




