[Dovecot] (Single instance) attachment storage
tss at iki.fi
Mon Aug 23 21:12:39 EEST 2010
On Mon, 2010-07-19 at 17:29 +0100, William Blunn wrote:
> > 2) If base64 attachment is in a standardized form that can be 100%
> > reliably converted back to its original form, it could be stored
> > decoded and then encoded back to original on the fly.
This is now done: http://hg.dovecot.org/dovecot-2.0-sis/rev/3ef0ac874fd7
> Probably you would need to have a base64 matcher/decoder which is
> smarter than normal base64 decoders and checks to make sure that all
> lines (apart from the last) are encoded (a) canonically (e.g. with no
> trailing whitespace), and (b) using the same number of cells per line.
Anything unexpected causes the attachment to be saved without decoding.
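To illustrate the idea (this is only a rough sketch in Python, not the
code from the changeset above), one way to be sure the decoding is
reversible is to decode strictly, re-encode with the first line's
length, and compare against the original bytes. The function names
decode_if_reversible and reencode are made up for this example.

```python
import base64
import binascii


def reencode(decoded, line_len, newline, final_newline):
    """Rebuild the base64 text from decoded bytes and the saved layout."""
    raw = base64.b64encode(decoded)
    lines = [raw[i:i + line_len] for i in range(0, len(raw), line_len)]
    out = newline.join(lines)
    if final_newline and lines:
        out += newline
    return out


def decode_if_reversible(encoded):
    """Return (decoded, line_len, newline, final_newline), or None if
    re-encoding would not reproduce `encoded` byte for byte."""
    newline = b"\r\n" if b"\r\n" in encoded else b"\n"
    final_newline = encoded.endswith(newline)
    body = encoded[:-len(newline)] if final_newline else encoded
    line_len = len(body.split(newline, 1)[0])
    if line_len == 0 or line_len % 4 != 0:
        return None
    try:
        # validate=True rejects stray whitespace and other non-alphabet bytes
        decoded = base64.b64decode(body.replace(newline, b""), validate=True)
    except binascii.Error:
        return None
    # The round-trip comparison catches uneven line lengths and
    # non-canonical padding bits that the decoder itself tolerates.
    if reencode(decoded, line_len, newline, final_newline) != encoded:
        return None
    return decoded, line_len, newline, final_newline
```

If the function returns None, the caller would store the part
undecoded; otherwise it only needs to remember the line length and
newline style next to the decoded data.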
> Some systems finish the base64 stream with a newline (which in a
> multipart manifests as a blank line between the base64 stream and the
> '--' of the MIME boundary), whereas some systems finish the base64
> stream at the end of final 4-byte cell (which in a multipart manifests
> as the '--' of the MIME boundary appearing on the line immediately
> following the base64 encoded data). Your encoding allows for arbitrary
> data between the objects, so you would have no problem storing these two
> cases verbatim. But something to watch out for when storing.
I implemented this so that when the end of the base64 stream is
encountered, it allows max. 1024 bytes of data after it. That data is
saved in the dbox file instead of in the attachment file. So for example
if the entire message body is a base64 encoded attachment but then some
MTA appends a disclaimer after it, the attachment part is still saved to
a separate file.
I added that "max 1024 bytes after" limit so that if there is some weird
virus/spam/whatever attachment that claims to be base64 but is actually
mostly non-base64 data, it can take less space by saving the entire part
as an attachment rather than decoding only the base64 data.
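A rough Python sketch of that trailing-data rule follows. It is not the
actual Dovecot code: the function name split_base64_part and the way
the end of the base64 stream is detected here (the first short or
padded line) are assumptions made for the example.

```python
import re

# One line of base64 text: alphabet characters, optional padding.
_B64_LINE = re.compile(rb"[A-Za-z0-9+/]+={0,2}")


def split_base64_part(part, max_trailing=1024):
    """Return (base64_text, trailing_data), or None if the part should
    be stored as-is without decoding."""
    lines = part.splitlines(keepends=True)
    if not lines:
        return None
    full_len = len(lines[0].rstrip(b"\r\n"))
    end = 0
    for line in lines:
        text = line.rstrip(b"\r\n")
        if (not _B64_LINE.fullmatch(text) or len(text) % 4
                or len(text) > full_len):
            break
        end += len(line)
        if len(text) < full_len or text.endswith(b"="):
            break  # a short or padded line ends the stream
    trailing = part[end:]
    if end == 0 or len(trailing) > max_trailing:
        return None
    return part[:end], trailing
```

With a short disclaimer appended, the base64 text and the trailing
bytes come back separately, so the trailing bytes can stay with the
message while the attachment goes to its own file. With more than
max_trailing bytes after the stream, the function returns None and the
whole part would be stored verbatim.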