getting rid of paper files

Christopher Browne cbbrowne-Re5JQEeQqe8AvxtiuMwx3w at public.gmane.org
Sat Feb 27 23:33:11 UTC 2010


On Sat, Feb 27, 2010 at 5:35 PM, D. Hugh Redelmeier <hugh-pmF8o41NoarQT0dZR+AlfA at public.gmane.org> wrote:
> We have boxes and boxes of paper files -- various kinds of records.  These
> records continue to flow in.  They are not uniform: there are many kinds
> of records.

There are a few pieces to this; I'm not sure how well things tend to
tie together yet for the "massive" cases...

1.  You need suitable hardware.

The SANE project is the relevant one for scanner support:
http://www.sane-project.org

It supports a LOT of possibly suitable scanners.

2.  Mind you, it's possible that it's worth spending a tad more and
getting an appliance that doesn't need a computer :-).  Canada
Computers carries a number of Canon "imageFormula" units which can
dump their output across a network connection via FTP/Email/SMB.
They're pretty expensive, starting at about $2k, but if you want to
scan a bunch of pages per minute (the unit I took a peek at does 26!),
that might be a good answer.

3.  You then need to collect the archives in a meaningful way to
enable searching them.

Mac OS X actually has a mighty slick feature at the filesystem level
which applications have been known to use; you can attach label
metadata to files (rather like OS/2 extended attributes), and there
are applications (Yep and Leap are the ones I have used) which use
those labels as an organizing mechanism.

Thus, you might mark a bank statement with the labels:
- ScotiaBank
- 2010
- February
- Bank
- Statement

And then be able to classify based on as few or many of those labels
as you like.
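
To make that concrete, here is a rough sketch in Python.  It has
nothing to do with how Yep or Leap actually work; the file names, the
"Hydro" entry and the find() helper are all made up for illustration.
The point is just that picking by "as few or many labels as you like"
boils down to a subset test:

```python
# Hypothetical catalogue: file name -> set of labels attached to it.
catalogue = {
    "scan-0001.pdf": {"ScotiaBank", "2010", "February", "Bank", "Statement"},
    "scan-0002.pdf": {"ScotiaBank", "2010", "January", "Bank", "Statement"},
    "scan-0003.pdf": {"Hydro", "2010", "February", "Bill"},
}


def find(catalogue, *wanted):
    """Return the sorted names of files that carry every wanted label."""
    wanted = set(wanted)
    # `wanted <= labels` is true when wanted is a subset of labels.
    return sorted(name for name, labels in catalogue.items() if wanted <= labels)


print(find(catalogue, "2010", "February"))
print(find(catalogue, "ScotiaBank", "Statement"))
```

Asking for more labels narrows the result; asking for none returns
everything.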

In the absence of "extended attributes," one might do similar things
such as storing hashes of documents and their locations in a DBMS, and
associating labels there.  Further slickness would involve running OCR
software against the documents, and capturing the text to enable
searching by content.
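
A minimal sketch of the hash-plus-DBMS idea, assuming SQLite and
SHA-256 (both are my picks for the example, any DBMS and any decent
hash would do).  The table layout and the function names are
hypothetical, and the OCR step is left out entirely:

```python
import hashlib
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) the index: one table of documents, one of labels."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        " sha256 TEXT PRIMARY KEY, location TEXT NOT NULL)"
    )
    db.execute(
        "CREATE TABLE IF NOT EXISTS labels ("
        " sha256 TEXT NOT NULL, label TEXT NOT NULL,"
        " PRIMARY KEY (sha256, label))"
    )
    return db


def add_document(db, content, location, labels):
    """Record the hash of `content` (bytes), where it lives, and its labels."""
    digest = hashlib.sha256(content).hexdigest()
    db.execute("INSERT OR REPLACE INTO documents VALUES (?, ?)", (digest, location))
    db.executemany(
        "INSERT OR IGNORE INTO labels VALUES (?, ?)",
        [(digest, label) for label in labels],
    )
    db.commit()
    return digest


def find_by_labels(db, labels):
    """Return locations of documents that carry ALL of the given labels."""
    labels = list(labels)
    marks = ",".join("?" * len(labels))
    rows = db.execute(
        "SELECT d.location FROM documents d"
        " JOIN labels l ON l.sha256 = d.sha256"
        f" WHERE l.label IN ({marks})"
        " GROUP BY d.sha256"
        " HAVING COUNT(DISTINCT l.label) = ?"
        " ORDER BY d.location",
        labels + [len(labels)],
    ).fetchall()
    return [row[0] for row in rows]


# Usage with made-up scan contents and paths.
db = open_index()
add_document(db, b"fake scan one", "/archive/scan-0001.pdf",
             ["ScotiaBank", "2010", "February", "Bank", "Statement"])
add_document(db, b"fake scan two", "/archive/scan-0002.pdf",
             ["ScotiaBank", "2010", "January", "Bank", "Statement"])
print(find_by_labels(db, ["ScotiaBank", "February"]))
```

Keying on the hash means the same scan filed twice is only indexed
once, and a document can be moved without losing its labels (just
update the location).  In real use the bytes would come from reading
the scanned file rather than a literal.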

There's a product called Documentum; it costs thousands of dollars, and
chews up big servers for folks like Ontario Hydro who need long-term
storage of their documents.  There's a model there for capturing
documents via lpd: you throw the document + metadata into an lpd
queue, and Documentum "eats" documents, stowing the results into its
repository.

Alas, free software options haven't emerged that integrate all of this
together in visibly convenient ways.  That first talk at OGLF we saw
by the "sales historian" was describing something that might be a bit
akin to Documentum, but the speaker wasn't remotely technical enough
to be able to tell :-).
-- 
http://linuxfinances.info/info/linuxdistributions.html
Samuel Goldwyn  - "I'm willing to admit that I may not always be
right, but I am never wrong." -
http://www.brainyquote.com/quotes/authors/s/samuel_goldwyn.html