the web as a database

Joseph Kubik josephkubik-Re5JQEeQqe8AvxtiuMwx3w at public.gmane.org
Mon Apr 25 04:05:33 UTC 2005


The new "local results" section of Google is doing a pretty good job for me.
Maybe a wrapper could automate the process, if you are looking for lists
of specific establishments across towns?
-Joseph-

On 4/20/05, Zbigniew Koziol <zkoziol-Zd07PnzKK1IAvxtiuMwx3w at public.gmane.org> wrote:
> 
> Just an idea. Maybe someone will want to comment?
> 
> Wouldn't it be wonderful if, after typing a complex SQL-like query into
> Google, I got a precise response listing all the web pages on the
> Internet that contain the information that interests me?
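> 
> For illustration only (the table and column names here are invented;
> no such interface exists), such a query might look something like:
> 
>     -- hypothetical table and columns, just to show the idea
>     SELECT url, title
>       FROM web_pages
>      WHERE topic = 'physics laboratory'
>        AND country = 'Canada';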
> 
> The problem is that Google-like search engines index the content of
> HTML pages, and HTML was designed to hold information that is meant to
> be displayed in a browser. It was not designed to categorise that
> information. HTML meta-tags like keywords and description are merely an
> attempt to add some categorization, and a poor attempt at that. That's
> why good search engines do not treat their content very seriously.
> 
> Let's take an example: suppose for some reason I wanted to build a
> database of all physics laboratories in the world. I would like to know
> where they are located (country, state/province, exact address), what
> the main subjects of their research are, whom to contact there for
> information, what the names of the main researchers are, and so on. In
> principle, all this information already exists on the web. I do not
> need to explain, however, that finding it and categorising it is an
> extremely tedious and time-consuming task.
> 
> Hence, I am talking about a new sort of web functionality, where the
> data could be extracted, reworked, and displayed in a different way.
> 
> Some may suggest the use of XML. Probably a good idea, though I do not
> have a general understanding of what is behind XML. In principle, I
> imagine, web sites could host a special file on their server that would
> contain detailed information about the content of the site, or at least
> about the company, like in the example above. Somewhat like /robots.txt
> or newsfeed.xml is used now.
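> 
> For illustration only (the file name and element names below are just
> a guess at what such a standard might define; nothing like it exists
> yet), a file such as /site-info.xml could hold:
> 
>     <!-- hypothetical schema, mirroring the laboratory example -->
>     <organization>
>       <name>Example Physics Laboratory</name>
>       <country>Canada</country>
>       <address>123 Example St., Toronto, ON</address>
>       <research-subject>condensed matter physics</research-subject>
>       <contact email="info at example.org">Jane Doe</contact>
>     </organization>
> 
> A crawler that understood such files could then answer SQL-like
> queries of the kind described above.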
> 
> Is there no other way?
> 
> If somebody is interested in working with me on introducing that sort
> of new technology, please write. I already have some rough ideas about
> how to do it, but I am still very interested in hearing your comments.
> The subject seems original, with a potentially huge impact on web
> development. Something like creating a new standard.
> 
> zb.
> --
> Zbigniew Koziol, SoftQuake(tm) Open Source Business Solutions
> Web Development, Linux, Web Mail Fax Voice Servers, Networking
> Consultations, Innovative Technologies Tel/Fax: 1-416-530-2780
> Toronto,  Canada,  http://www.softquake.ca,  info-lcEyp1+e+UdAFePFGvp55w at public.gmane.org
> 
--
The Toronto Linux Users Group.      Meetings: http://tlug.ss.org
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://tlug.ss.org/subscribe.shtml
