extracting text from PDF file
fcsoft-3Emkkp+1Olsmp8TqCH86vg at public.gmane.org
Wed Apr 21 17:35:39 UTC 2004
Thanks. I tried this. The resulting PS file looks OK, but the final text
file is severely jumbled, without any detectable pattern vis-à-vis the
original PDF.
I'll keep trying.
Original Message:
-----------------
From: Henry Spencer henry-lqW1N6Cllo31P9xLtpHBDw at public.gmane.org.net
Date: Sat, 17 Apr 2004 12:43:26 -0400 (EDT)
To: tlug-GezYG1x/Qbs at public.gmane.org.org
Subject: Re: [TLUG]: extracting text from PDF file
On Sat, 17 Apr 2004, bob findlay wrote:
> I've stumbled across the pdftotext utility on my Red Hat box.
> When I run it fails saying that
> "copying text from this document is not allowed".
> Am I SOL or are there any bright ideas?
It's probably not as good, but try pdftops followed by ps2ascii. (The
ps2ascii manpage also has a reference to a pstotext program, although it
doesn't seem to exist on the Linux system I've got handy.)
Henry Spencer
henry-lqW1N6Cllo31P9xLtpHBDw at public.gmane.org.net
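For reference, the two-step route suggested above (pdftops followed by
ps2ascii) might be scripted roughly as below. This is only a sketch: it
assumes both tools are installed and on the PATH, and pdf_to_text is a
made-up helper name, not something shipped with either package. As noted
elsewhere in this thread, the text that comes out may be jumbled.

```shell
#!/bin/sh
# Sketch: convert a PDF to plain text by going through PostScript.
# Assumes pdftops and ps2ascii are installed; pdf_to_text is a
# hypothetical helper name used here for illustration only.
pdf_to_text() {
    pdf=$1    # input PDF file
    txt=$2    # output text file
    ps=$(mktemp) || return 1

    # Step 1: PDF -> PostScript
    if ! pdftops "$pdf" "$ps"; then
        rm -f "$ps"
        return 1
    fi

    # Step 2: PostScript -> ASCII text
    ps2ascii "$ps" "$txt"
    status=$?

    # Clean up the intermediate PostScript file
    rm -f "$ps"
    return $status
}
```

Usage would be something like "pdf_to_text input.pdf output.txt", where
both file names are placeholders.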
--
The Toronto Linux Users Group. Meetings: http://tlug.ss.org
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://tlug.ss.org/subscribe.shtml