search engine pollution

Emma Jane Hogbin emmajane-MHIYrZpDPrNWk0Htik3J/w at public.gmane.org
Fri Apr 23 18:14:40 UTC 2004


On Wed, Apr 21, 2004 at 12:19:26AM +0300, Peter L. Peres wrote:
> I have htdig installed to search documents on my own machine, and I have
> noticed that a particular document may be hard to find because it seldom
> refers to itself, whereas many others refer to it. For example, searching
> for "rfc ftp" finds all the assigned-numbers and protocol lists, with the
> real FTP protocol documents scoring and ranking low (in the first few tens
> out of 700+ matches in this case). What would be a way to improve this
> without using META tags and such (not all documents are text)? Using the
> full title also does not help. The referrers also use the full title ...

Assuming the pages are marked up with some kind of semantic markup
language, you can adjust how much weight the words in a document's
headings and title carry in the ranking. You really need to read the
documentation that goes with ht://Dig: http://www.htdig.org/confindex.html
Specifically:
	title_factor: http://www.htdig.org/attrs.html#title_factor
	heading_factor: http://www.htdig.org/attrs.html#heading_factor
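
Something along these lines in htdig.conf, as a rough sketch only. The
numbers are made up for illustration and are not recommendations; check
attrs.html for the defaults and for the attribute names your version
supports (some releases may use per-level heading attributes instead of
a single heading_factor).

```
# htdig.conf fragment (illustrative values, assumed for this example)
# Give words found in the document title more weight in the score.
title_factor:   200
# Give words found in headings more weight than plain body text.
heading_factor: 20
```

Depending on the ht://Dig version, you may have to rebuild the index
before a change shows up in the results, so check the documentation for
your release.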

emma

-- 
Emma Jane Hogbin
[[ 416 417 2868 ][ www.xtrinsic.com ]]
--
The Toronto Linux Users Group.      Meetings: http://tlug.ss.org
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://tlug.ss.org/subscribe.shtml




