View .DOC files with strings

Peter L. Peres plp-ysDPMY98cNQDDBjDh4tngg at public.gmane.org
Sat Sep 6 13:27:15 UTC 2003


I have just tested a new way of formatting .DOC files for quick viewing.
It uses gawk to double each '\n' and groff to format the result. Very
simple. Some minor garbage remains in the output; it can be removed by
hand or ignored. Tables do not survive the conversion.

The command line is:

<document.doc gawk '{printf("%s\n\n",$0);next;}'|groff -man -Tascii >out.txt

groff does a good job of formatting the document even without escaping
what needs to be escaped. -man is optional (you can add more gawk rules
to translate chapter headings etc. into macros of your choice). This
simple script works for US-ASCII character sets.
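
As a rough sketch of what "more gawk rules" could look like: the function
below adds one rule that turns chapter lines into -man section headings and
one that protects lines groff would otherwise read as requests. The name
doc_markup and the "Chapter N" pattern are assumptions made for
illustration, and plain awk stands in for gawk. It is untested against
real .DOC files.

```shell
# Sketch only: doc_markup is a hypothetical helper, and the "Chapter N"
# heading pattern is an assumption about what the document contains.
doc_markup() {
    awk -v q="'" '
    # Turn "Chapter N ..." lines into man-page section headings.
    /^Chapter [0-9]+/ { printf(".SH \"%s\"\n", $0); next }

    # A line starting with "." or an apostrophe would be read as a groff
    # request; the zero-width \& in front makes it ordinary text.
    { c = substr($0, 1, 1); if (c == "." || c == q) $0 = "\\&" $0 }

    # Default rule from the original command: double each newline.
    { printf("%s\n\n", $0) }
    '
}

# Usage: doc_markup <document.doc | groff -man -Tascii >out.txt
```

The heading rule comes first and ends with next, so a heading line is
emitted once as a macro and never reaches the doubling rule.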

Peter
--
The Toronto Linux Users Group.      Meetings: http://tlug.ss.org
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://tlug.ss.org/subscribe.shtml




