parsing HTML with awk or sed

Lennart Sorensen
Wed Feb 25 15:03:55 UTC 2009


On Tue, Feb 24, 2009 at 10:32:21PM -0500, Giles Orr wrote:
> I'd like to extract the contents of paragraph tags (<p>) from an HTML
> file.  Don't want anything else, just that - the P tags and what's
> inside them, all other tags and contents not printed.  Unfortunately,
> some are single line:
> 
> <p>data</p>
> 
> and some are multi-line:
> 
> <p>
> More data
> 
> </p>

And some are:
<p>stuff
<p>other stuff
<p>yet more stuff

The closing </p> tag is optional in HTML, so a paragraph can simply run
on until the next <p> or other block-level tag.  Kind of a pain, isn't it?
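
Since unclosed paragraphs make line-oriented matching fragile, here is a
rough sketch of the same job done with a real parser instead of awk or
sed.  It swaps in Python's standard html.parser module, which nobody in
this thread asked for.  The names ParagraphExtractor, extract_paragraphs
and BLOCK_TAGS are made up for illustration, and BLOCK_TAGS is only a
partial guess at which tags implicitly end a paragraph.  It returns the
text of each paragraph with whitespace collapsed, not the original markup.

```python
from html.parser import HTMLParser

# Assumption: a small, incomplete set of tags whose start implicitly
# ends an open <p>. Extend it for the HTML you actually have.
BLOCK_TAGS = {"p", "div", "ul", "ol", "table", "pre", "blockquote", "hr",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class ParagraphExtractor(HTMLParser):
    """Collect the text of each <p>, whether or not it is closed."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []
        self._current = None  # text pieces of the open <p>, or None

    def _finish(self):
        # Store the open paragraph (if any) with whitespace collapsed.
        if self._current is not None:
            text = " ".join("".join(self._current).split())
            if text:
                self.paragraphs.append(text)
            self._current = None

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._finish()      # a new block ends any unclosed <p>
        if tag == "p":
            self._current = []

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS or tag in ("body", "html"):
            self._finish()

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)

    def close(self):
        super().close()         # flushes any buffered trailing text
        self._finish()          # the last <p> may never be closed


def extract_paragraphs(html):
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


if __name__ == "__main__":
    sample = "<p>data</p>\n<p>\nMore data\n\n</p>\n<p>stuff\n<p>other stuff\n"
    for paragraph in extract_paragraphs(sample):
        print(paragraph)
```

Text inside inline tags such as <b> or <a> is kept, but the tags
themselves are dropped.  If the <p> markup itself is wanted in the
output, it would have to be re-added around each returned string.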

-- 
Len Sorensen
--
The Toronto Linux Users Group.      Meetings: http://gtalug.org/
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://gtalug.org/wiki/Mailing_lists