extracting text from PDF file

Fred Nastos nastos-JAjqph6Yjy8fbXvGcxQkLSwD8/FfD2ys at public.gmane.org
Fri Apr 23 19:59:59 UTC 2004


On April 23, 2004 03:32 pm, Stewart C. Russell wrote:
> Fred Nastos wrote:
> > Does anyone have a good way to extract images from a PDF file?
>
> pdfimages, from the xpdf package: <http://www.foolabs.com/xpdf/>

I've tried pdfimages; it doesn't work for the document I'm
interested in.  The document has some funny (i.e. non-typical)
way of including EPS images.

> While I'm here, I might as well also mention pdftohtml
> <http://pdftohtml.sourceforge.net/>, which makes a fantastic job of
> converting PDF to HTML layouts. F'rinstance, I generated this

Thanks.  I just tried it, and it works quite well for some
documents, but not for the one I'm working with right now.
Guess I'll keep trying... or ask a Windows-using friend to
extract them for me.  Thanks.

> <http://www.peck.ca/grhcc/portland_agenda_html/index.html> from
> <http://www.peck.ca/grhcc/portland_agenda.pdf>.
>
>   Stewart
> --
> The Toronto Linux Users Group.      Meetings: http://tlug.ss.org
> TLUG requests: Linux topics, No HTML, wrap text below 80 columns
> How to UNSUBSCRIBE: http://tlug.ss.org/subscribe.shtml





