Spiders and crawlers

Lennart Sorensen lsorense-1wCw9BSqJbv44Nm34jS7GywD8/FfD2ys at public.gmane.org
Mon Apr 5 16:41:28 UTC 2010


On Thu, Apr 01, 2010 at 05:56:35PM -0400, Evan Leibovitch wrote:
> I'm looking to implement a spidering system intended to look through a bunch
> of catalog websites, in order to track changes to those catalogs (with the
> help of a backend MySQL system).

I always wonder: Why MySQL?  PostgreSQL is an obviously better and more
scalable choice.  Why do so many people just barge ahead with MySQL?

> The Wikipedia entry for "web crawler" returns a lot of interesting choices;
> I'm wondering if anyone here has experience in either writing one or using
> an existing open source one. I'm hoping for something that is reasonably
> configurable so that one doesn't need to know a language like C or Java to
> make minor config changes.
> 
> Any help is appreciated.

Well I haven't ever done that. :)
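As a rough illustration of the change-tracking step only (a hypothetical sketch, not an existing tool and not something proposed in this thread): fetch each catalog page, store a SHA-256 hash of its body, and on the next crawl flag pages whose hash differs from the stored one. The in-memory dict stands in for a database table mapping URL to last-seen hash; the function names `fetch`, `digest` and `diff_crawl` are made up for this example.

```python
import hashlib
import urllib.request


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Download the raw body of a page (no robots.txt handling, no retries)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def digest(body: bytes) -> str:
    """Fingerprint a page body so changes can be detected cheaply."""
    return hashlib.sha256(body).hexdigest()


def diff_crawl(
    pages: dict[str, bytes], stored: dict[str, str]
) -> tuple[list[str], list[str], dict[str, str]]:
    """Compare freshly fetched pages against previously stored hashes.

    `stored` is a stand-in for a hypothetical URL -> hash table in the
    database. Returns (new_urls, changed_urls, updated_hashes) and leaves
    `stored` itself unmodified.
    """
    new, changed = [], []
    updated = dict(stored)
    for url, body in pages.items():
        h = digest(body)
        if url not in stored:
            new.append(url)  # never seen before
        elif stored[url] != h:
            changed.append(url)  # content differs from last crawl
        updated[url] = h
    return new, changed, updated


if __name__ == "__main__":
    # Simulated crawl results instead of live fetches.
    previous = {"http://example.com/catalog/1": digest(b"widget $5")}
    crawl = {
        "http://example.com/catalog/1": b"widget $6",
        "http://example.com/catalog/2": b"gadget $3",
    }
    new, changed, previous = diff_crawl(crawl, previous)
    print("new:", new, "changed:", changed)
```

Hashing the raw HTML flags any change at all, including rotating ads or timestamps, so in practice one would probably extract the catalog fields (name, price, etc.) first and hash those. Pages that disappear between crawls are also not reported by this sketch.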

-- 
Len Sorensen
--
The Toronto Linux Users Group.      Meetings: http://gtalug.org/
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://gtalug.org/wiki/Mailing_lists