extracting text from PDF file

fcsoft-3Emkkp+1Olsmp8TqCH86vg at public.gmane.org
Wed Apr 21 17:35:39 UTC 2004


Thanks. Tried this. The resulting PS file looks OK, but the final text file
is severely jumbled, with no detectable pattern vis-à-vis the original PDF.

I'll keep trying.

Original Message:
-----------------
From: Henry Spencer henry-lqW1N6Cllo31P9xLtpHBDw at public.gmane.org.net
Date: Sat, 17 Apr 2004 12:43:26 -0400 (EDT)
To: tlug-GezYG1x/Qbs at public.gmane.org.org
Subject: Re: [TLUG]: extracting text from PDF file


On Sat, 17 Apr 2004, bob findlay wrote:
> I've stumbled across the pdftotext utility on my Red Hat box.
> When I run it, it fails, saying that
>  "copying text from this document is not allowed".
> Am I SOL or are there any bright ideas?

It's probably not as good, but try pdftops followed by ps2ascii.  (The
ps2ascii manpage also has a reference to a pstotext program, although it
doesn't seem to exist on the Linux system I've got handy.)
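For anyone who wants to script the two-step route Henry suggests, here is a
minimal sketch in Python, assuming pdftops and ps2ascii are installed and
invoked as `pdftops input.pdf output.ps` and `ps2ascii input.ps output.txt`.
The function names and the choice to derive the .ps/.txt names from the PDF
name are my own, for illustration only. It also checks PATH first, since (as
noted above with pstotext) not every box has every tool.

```python
import shutil
import subprocess
from pathlib import Path


def conversion_commands(pdf_path: str) -> list[list[str]]:
    """Build the pdftops -> ps2ascii command lines for one PDF.

    Hypothetical naming scheme: the intermediate .ps and final .txt
    files sit next to the PDF and share its base name.
    """
    pdf = Path(pdf_path)
    ps = pdf.with_suffix(".ps")
    txt = pdf.with_suffix(".txt")
    return [
        ["pdftops", str(pdf), str(ps)],   # step 1: PDF -> PostScript
        ["ps2ascii", str(ps), str(txt)],  # step 2: PostScript -> plain text
    ]


def missing_tools(commands: list[list[str]]) -> list[str]:
    """Return the programs named in `commands` that are not on PATH."""
    return [cmd[0] for cmd in commands if shutil.which(cmd[0]) is None]


def pdf_to_text(pdf_path: str) -> Path:
    """Run both steps in order; raise if a tool is missing or a step fails."""
    commands = conversion_commands(pdf_path)
    absent = missing_tools(commands)
    if absent:
        raise FileNotFoundError("not on PATH: " + ", ".join(absent))
    for cmd in commands:
        # check=True raises CalledProcessError on a non-zero exit status,
        # so a failed pdftops run never feeds a bad .ps file to ps2ascii.
        subprocess.run(cmd, check=True)
    return Path(commands[-1][-1])  # path of the final .txt file
```

Usage would be something like `pdf_to_text("report.pdf")`, which leaves
report.ps and report.txt beside the original. This only automates the
pipeline; it does nothing to restore reading order, so the text can still
come out scrambled.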

                                                          Henry Spencer
                                                       henry-lqW1N6Cllo31P9xLtpHBDw at public.gmane.org.net

--
The Toronto Linux Users Group.      Meetings: http://tlug.ss.org
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://tlug.ss.org/subscribe.shtml

