parsing HTML with awk or sed
Lennart Sorensen
lsorense-1wCw9BSqJbv44Nm34jS7GywD8/FfD2ys at public.gmane.org
Wed Feb 25 15:03:55 UTC 2009
On Tue, Feb 24, 2009 at 10:32:21PM -0500, Giles Orr wrote:
> I'd like to extract the contents of paragraph tags (<p>) from an HTML
> file. Don't want anything else, just that - the P tags and what's
> inside them, all other tags and contents not printed. Unfortunately,
> some are single line:
>
> <p>data</p>
>
> and some are multi-line:
>
> <p>
> More data
>
> </p>
And some are:
<p>stuff
<p>other stuff
<p>yet more stuff
The <p> tag does not have to be closed; the </p> end tag is optional in HTML. Kind of a pain, isn't it?
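Line-oriented awk or sed matching has trouble with this, because an unclosed <p> only ends where the next <p> (or some other block element, or the end of the file) begins. A tokenizer that sees tags rather than lines avoids the problem. Below is a minimal sketch that swaps awk/sed for Python's standard-library html.parser. It assumes a simplified, partial rule for what ends an open paragraph (the BLOCK_TAGS set and the container end tags listed are illustrative, not the full HTML rules). It also returns only each paragraph's text with whitespace collapsed, dropping any inline markup inside it, which may be less than the original question wanted.

```python
from html.parser import HTMLParser

# Assumed, partial list of start tags that implicitly end an open <p>.
# Real HTML has a longer list; this is illustrative only.
BLOCK_TAGS = {"p", "div", "ul", "ol", "table", "pre", "blockquote",
              "h1", "h2", "h3", "h4", "h5", "h6", "hr"}

# Assumed, simplified set of end tags that also close an open <p>.
CLOSING_END_TAGS = {"p", "body", "html", "div", "td", "li", "blockquote"}


class ParagraphExtractor(HTMLParser):
    """Collect the text of every <p> element, whether or not it is closed."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._current = None  # text chunks of the open <p>, or None if none open

    def _close_paragraph(self):
        if self._current is not None:
            # Collapse runs of whitespace so multi-line paragraphs become one line.
            self.paragraphs.append(" ".join("".join(self._current).split()))
            self._current = None

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._close_paragraph()  # a new block (including <p>) ends the old <p>
        if tag == "p":
            self._current = []

    def handle_endtag(self, tag):
        if tag in CLOSING_END_TAGS:
            self._close_paragraph()

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)

    def close(self):
        super().close()          # flush any buffered text first
        self._close_paragraph()  # end of file ends an unclosed <p>


def extract_paragraphs(html):
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


# The three shapes from this thread: one-line, multi-line, and unclosed.
sample = """<p>data</p>
<p>
More data

</p>
<p>stuff
<p>other stuff
<p>yet more stuff
"""

print(extract_paragraphs(sample))
# → ['data', 'More data', 'stuff', 'other stuff', 'yet more stuff']
```

Working on tags instead of lines is what lets the same code handle all three cases; an awk or sed script would need its own ad hoc rule for each one.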
--
Len Sorensen
--
The Toronto Linux Users Group. Meetings: http://gtalug.org/
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://gtalug.org/wiki/Mailing_lists