extracting text from PDF file

Sergey Semenyuk serge_ss-rieW9WUcm8FFJ04o6PK0Fg at public.gmane.org
Thu Apr 22 01:39:26 UTC 2004


I am not trying to reinvent the wheel, but I've seen too many documents that
were either scanned and then converted to PDF (many people still create PDFs
from JPEGs) or were created by printing "text as graphics". Restrictions on
copying and other text functions are imposed by commercial software, for the
sake of keeping the documents commercial.

Sergey

-----Original Message-----
From: owner-tlug-lxSQFCZeNF4 at public.gmane.org [mailto:owner-tlug-lxSQFCZeNF4 at public.gmane.org] On Behalf Of Mark Borg
Sent: Wednesday, April 21, 2004 9:40 PM
To: tlug-lxSQFCZeNF4 at public.gmane.org
Subject: Re: [TLUG]: extracting text from PDF file

On Wed April 21 2004 13:59, Noah John Gellner wrote:
> Are you able to disclose the document that you are trying to convert? I
> wonder if Redhat hasn't disabled some functionality in pdftotext. I use
> the app all the time to view pdf attachments on my console. To test
> again, I downloaded a random pdf document and it worked very well.
>
> I would be interested to see if there are documents that are internally
> limited or if this is something imposed by Redhat.
>
> Noah
>
> On 13:35 Wed 21 Apr, fcsoft-3Emkkp+1Olsmp8TqCH86vg at public.gmane.org wrote:
> > > When I run it, it fails saying that
> > >  "copying text from this document is not allowed".
> > > Am I SOL or are there any bright ideas?
Hi, is this an encrypted PDF, or a document created with a recent version of
Acrobat?
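
One rough way to check the encryption question is to look for an /Encrypt
entry in the file, which encrypted PDFs reference from their trailer. The
sketch below is only a heuristic in Python, not a real PDF parser: the
function name looks_encrypted and the script name check_pdf.py are made up
for illustration, and a match only suggests that permission flags (such as
the no-copy flag that pdftotext is honouring) may be set.

```python
import sys


def looks_encrypted(data: bytes) -> bool:
    """Heuristic: report whether the raw PDF bytes mention /Encrypt.

    Encrypted PDFs reference an encryption dictionary through an
    /Encrypt key. This does not parse the file, so a document that
    merely contains that byte sequence elsewhere is a false positive.
    """
    return b"/Encrypt" in data


def main(path: str) -> None:
    # Read the whole file as bytes; PDFs are not valid text files.
    with open(path, "rb") as f:
        data = f.read()
    if looks_encrypted(data):
        print(f"{path}: /Encrypt found, document is probably encrypted")
    else:
        print(f"{path}: no /Encrypt entry found")


if __name__ == "__main__":
    # Assumed usage: python check_pdf.py document.pdf
    if len(sys.argv) == 2:
        main(sys.argv[1])
```

If the check finds nothing and pdftotext still produces no text, the
document may simply be a scanned image with no text layer.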
--
The Toronto Linux Users Group.      Meetings: http://tlug.ss.org
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://tlug.ss.org/subscribe.shtml