Best tools for making a spider?
Madison Kelly
linux-5ZoueyuiTZhBDgjK7y7TUQ at public.gmane.org
Wed Jan 3 22:37:02 UTC 2007
Alex Beamish wrote:
> On 1/3/07, Evan Leibovitch <evan-ieNeDk6JonTYtjvyW6yDsg at public.gmane.org> wrote:
>>
>> This isn't meant to start a flamewar, honestly.
>>
>> I'm just wondering if there are some languages that are either optimized
>> or otherwise more suitable than others for the specific task of writing
>> a specialized web spider.
>>
>> So far the people I've spoken to say this is a perfect task for Ruby,
>> but I don't know it at all. There is no previous baggage on this
>> particular project so we can choose any tool we want -- but once chosen
>> I'd like to stay with it. Having some existing open source templates or
>> existing code to build upon is always nice too.
>>
>> I'm not going to be the programmer (we do want working code, after all
>> :-) ) but I do have some say in the tech to be used. Any suggestions or
>> pointers of where to explore this particular question are appreciated.
>
>
> I would expect you could use Andy Lester's WWW::Mechanize for spidering.
> See
>
> http://search.cpan.org/~petdance/WWW-Mechanize-1.20/lib/WWW/Mechanize.pm
>
> for lots more information. That page also links to O'Reilly's *Spidering
> Hacks*
>
> http://www.oreilly.com/catalog/spiderhks/
>
> which would probably be a useful resource no matter what language platform
> you choose.
>
I'd throw a vote in for Perl, too. *Lots* of modules to choose from, and
dealing with text is Perl's forte.
Madi