(script) dis/assembling mbox email

William Park opengeometry-FFYn/CNdgSA at public.gmane.org
Sat Jun 12 02:45:57 UTC 2004


I posted this to <comp.unix.shell> and <comp.mail.misc>, and it may be
of interest to some of you...

From time to time, I need to
    - extract the main header/body from a MIME email,
    - parse and extract multipart segments, recursively,
    - walk through the email tree, and edit/delete/add parts,
    - regenerate a new MIME email.
You can edit the file manually, but it's difficult to keep track of
where you are.

So, I wrote two shell scripts (included below my signature):
    1. unmbox.sh -- to extract email components into a directory tree
    2. mbox.sh -- to generate an email from a directory tree
You can "walk" through a MIME email by simply "walking" through the
directory tree.  The analogy is a 'tar' file: you extract files into a
directory tree, and you create a tarball from the directory tree.  Or, if
you are using Slackware, the analogy is 'explodepkg' and 'makepkg'.


Usage is
    unmbox.sh dir < email
      mbox.sh dir > email

'unmbox.sh' extracts email components into a directory tree.  The header
and body are saved as 'header' and 'body' files, respectively.  If the
email is MIME multipart, the body is split at its boundary lines into
'xx[0-9][0-9]' files, and each segment is in turn decomposed recursively
into an 'xx[0-9][0-9].mbox' subdirectory.  In reverse, 'mbox.sh'
recursively walks the directory tree and assembles the email components
into mbox format.
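
As a minimal sketch of "walking" the extracted tree: the helper below
('list_parts' is a hypothetical name, not part of the scripts) prints
each part's directory and its Content-Type: line.  It assumes the layout
described above (a 'header' file in the top directory and in every
'xx??.mbox' subdirectory) and a grep that supports '-m', as GNU grep
does.

```shell
# list_parts DIR: print every extracted part under DIR with its Content-Type.
# Hypothetical helper; assumes the directory layout written by unmbox.sh.
list_parts() {
    find "$1" -type f -name header | sort | while IFS= read -r h; do
        # First Content-Type line, or a placeholder if the part has none
        ctype=$(grep -i -m 1 '^Content-Type:' "$h" || echo '(no Content-Type)')
        printf '%s\t%s\n' "$(dirname "$h")" "$ctype"
    done
}
```

For example, 'list_parts dir' after 'unmbox.sh dir < email' should give
one line per part, which makes it easier to see where you are before
editing or deleting anything.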

Strictly speaking, a MIME boundary pattern consists of 1 to 70 characters
(RFC 2046) from
    [ A-Za-z0-9'()+_,./:=?-]
not ending in a space.  A boundary line in the message body is one of
    \n--pattern\n
    \n--pattern--\n
where 'pattern' is the boundary pattern assigned in the Content-Type:
header.  The second form marks the end of the last segment.
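
The character rule can be checked with a single grep.  This is only a
sketch: 'is_valid_boundary' is a hypothetical helper name, and it tests
the character set and the trailing-space rule only, not the length.

```shell
# is_valid_boundary PATTERN: succeed if PATTERN uses only the characters
# allowed in a MIME boundary and does not end in a space.
# Hypothetical helper; it does not check the pattern's length.
is_valid_boundary() {
    printf '%s\n' "$1" |
        grep -q -x "[ A-Za-z0-9'()+_,./:=?-]*[A-Za-z0-9'()+_,./:=?-]"
}
```

So 'is_valid_boundary "simple boundary"' succeeds, while a pattern with
a trailing space or a character such as '*' fails.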

For the sake of sanity, 

1.  The script recognizes only
	boundary="..."
    as the MIME boundary parameter, i.e. it must be double-quoted, with no
    spaces around '='.

2.  Only lines consisting of '--pattern' or '--pattern--' are recognized
    as boundary lines, because Formail puts a blank line (if one doesn't
    already exist) at the top and bottom of the email body, undoing the
    '\n' prefix/suffix anyway.

3.  '.' needs to be escaped for Sed and Grep, and '()+.?' need to be
    escaped for Csplit and Egrep.
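
Points 2 and 3 can be sketched as follows.  The helper names
'escape_boundary' and 'boundary_lines' are hypothetical and do not
appear in the scripts; the sketch assumes a grep that accepts '-E'
(extended regular expressions, as Egrep does).

```shell
# escape_boundary PATTERN: backslash-escape the characters ( ) + . ? so that
# PATTERN can be embedded literally in an extended regular expression.
escape_boundary() {
    printf '%s\n' "$1" | sed 's/[()+.?]/\\&/g'
}

# boundary_lines PATTERN FILE: print "N:line" for every line of FILE that
# consists of exactly '--PATTERN' or '--PATTERN--'.
boundary_lines() {
    e=$(escape_boundary "$1")
    grep -n -E "^--${e}(--)?\$" "$2"
}
```

Without the escaping, a pattern such as 'a.b' would also match a line
like '--aXb', because an unescaped '.' matches any character.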


Use at your own risk, and enjoy.
-- 
William Park, Open Geometry Consulting, <opengeometry-FFYn/CNdgSA at public.gmane.org>
No, I will not fix your computer!  I'll reformat your harddisk, though.


-----------------------------------------------------------------------

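#! /bin/sh
# Usage: unmbox.sh dir < email
# Needs formail, plus grep -o and csplit '{*}'; unmbox.sh must be in $PATH.

mkdir -p "$1" && cd "$1" || exit 1

cat > input
formail -f -X '' < input > header	# no blank lines
formail -I '' < input > body		# blank lines at top/bottom

# Save the boundary="..." parameter as a file that can be sourced later.
if grep -o "boundary=\"[ A-Za-z0-9'()+_,./:=?-]*[A-Za-z0-9'()+_,./:=?-]\"" header > boundary; then
    . ./boundary
    # Escape regex metacharacters.  $(...) is used instead of backquotes,
    # which would turn '\\&' into '\&' (a literal '&') before sed sees it.
    eboundary=$(printf '%s\n' "$boundary" | sed 's/[()+.?]/\\&/g')
    csplit body "/^--$eboundary/" '{*}'		# xx00, xx01, ...
    for i in xx??; do
	# Decompose segments that start with '--pattern'; the preamble and
	# the closing '--pattern--' segment are left as plain files.
	if head -n 1 "$i" | grep -E "^--$eboundary\$" > /dev/null; then
	    sed '1d' "$i" | unmbox.sh "$i.mbox"
	fi
    done
else
    rm -f boundary
fi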

-----------------------------------------------------------------------

#! /bin/sh
# Usage: mbox.sh dir > email
# mbox.sh must be in $PATH for the recursive call.

cd "$1" || exit 1
sed '/^$/ d' header	# NO blank lines in header

if [ -f boundary ]; then
    . ./boundary
    echo
    for i in xx??.mbox; do
	echo "--$boundary"
	mbox.sh "$i"
    done
    echo "--$boundary--"
    echo
else
    [ "$(head -n 1 body)" ] && echo	# blank line at top
    cat body
    [ "$(tail -n 1 body)" ] && echo	# blank line at bottom
    :		# dummy, so that return code is 0
fi

-----------------------------------------------------------------------
--
The Toronto Linux Users Group.      Meetings: http://tlug.ss.org
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://tlug.ss.org/subscribe.shtml