[Dovecot] Re: Bug? 1.0.0-test28 NFS locking problems

Tim Southerwood ts at doc.ic.ac.uk
Sat Jul 24 03:25:15 EEST 2004


On Fri, 23 Jul 2004 21:41:36 +0200
Matthias Andree <matthias.andree at gmx.de> wrote:

> Tim Southerwood <ts at doc.ic.ac.uk> writes:
> 
> 
> This would constitute a violation of responsibilities or layers.  What
> you are suggesting appears as though you would want the software to
> work around a problem when the administrator teams for Solaris and
> Linux aren't talking to each other.
> 

Well, yes, up to a point. We're OK *potentially*, because I'm one of two
Linux admins and the Solaris guy sits behind me (though he's on leave
right now).

But I've worked in places before where a small department might run a
gateway server (e.g. IMAP) but have to use a centrally provided
filestore on some uber-SAN run by a totally different department
(I'm thinking of universities here). Worse, getting the central IT
people to touch "their" fileserver can be practically impossible in any
sensible timeframe.

It's a grim but nonetheless true reality that some people who may want
to run Dovecot are stuck in such a situation. Service level agreements
signed by upper-tier managers usually don't include "fcntl/F_SETLKW
must work". I speak from experience.

> > Dovecot uses this locking in two places that matter so it's not IMHO
> > a terrible disaster to add a small workaround.
> 
> I've often been tempted to add some special case to software and
> ultimately given in, only to find out weeks, months or even years
> later that the special case handler wasn't working properly -- such
> seldom-used code is a maintenance nightmare. And these special cases
> need rather lengthy comments, because a few months later the maintainer
> will see the code and throw it out because it looks extraneous.

Yes, I understand - I've seen the code for GNU tar! (Except for the
"throwing out" bit - that program is *the* museum of cruft.) I totally
agree with the desire to keep things clean. Finding the balance is
usually a matter of debate, though.

> Maybe such a workaround should be kept as a separate patch and not
> become part of the baseline code.

That would be a wise and perfectly helpful way to proceed, if that is
what you would prefer. Put a patch on the FTP site and note it in a FAQ
entry along the lines of "So you've got a broken NFS server".

Incidentally, from talking to another colleague, it seems that we also
had this problem with sunsite.org.uk (which we operate): one ftpd
program running over the NFS share between its four hosts (Solaris NFS
client talking to a Solaris NFS server) was exhibiting the same problem
with F_SETLKW. That problem was "fixed" in a hurry by switching to a
different ftpd.

Unfortunately, we don't usually have enough time to get to the bottom of
every odd fault we get, but this time I'm being more tenacious because
this is irritating me (Solaris, not dovecot).

I'm on leave for 1.5 weeks, but I'll mail our Solaris chap and let him
have a look at it.

Anyway - I still stand by my point of view that not all NFS
implementations are perfect, and that helping people work around broken
systems (which they may not control) is worthwhile, provided it is done
in the way that buggers up the Dovecot codebase the least.

I suppose that if 99.5% of Dovecot's user base doesn't have this issue,
perhaps you should leave the code alone. I will eventually do my own
patch, but I'll try to do it as cleanly as possible and I'll mail it in
here in case anyone else needs it.
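For anyone wondering what such a patch might look like, here is a minimal sketch. It is not Dovecot's code and not a tested patch. It assumes the trouble is specifically the blocking F_SETLKW call over NFS (one commonly cited reason is that blocking NLM lock requests rely on the server calling the client back when the lock is granted, which non-blocking requests avoid), and it sidesteps that by retrying the non-blocking F_SETLK with a short sleep until a timeout. The helper name poll_fcntl_lock(), the 100 ms interval and the timeout parameter are all hypothetical choices.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not part of Dovecot): take an fcntl() lock on the
 * whole file without using the blocking F_SETLKW, by retrying the
 * non-blocking F_SETLK until it succeeds or timeout_secs elapses.
 * Returns 0 on success; -1 with errno set on failure (EAGAIN on timeout). */
static int poll_fcntl_lock(int fd, short lock_type, unsigned int timeout_secs)
{
    struct flock fl;
    struct timespec pause = { 0, 100L * 1000L * 1000L }; /* 100 ms, arbitrary */
    time_t deadline = time(NULL) + (time_t)timeout_secs;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = lock_type;   /* F_RDLCK or F_WRLCK */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;            /* length 0 means "to end of file" */

    for (;;) {
        if (fcntl(fd, F_SETLK, &fl) == 0)
            return 0;
        /* POSIX allows either EACCES or EAGAIN for a conflicting lock;
         * anything else is a real error, so give up. */
        if (errno != EACCES && errno != EAGAIN)
            return -1;
        if (time(NULL) >= deadline) {
            errno = EAGAIN;
            return -1;
        }
        nanosleep(&pause, NULL);
    }
}
```

Polling trades some latency and fairness (a waiter can keep losing the race to other processes) for not depending on blocking-lock support. If plain F_SETLK is also broken on the share, this won't help at all.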

Best wishes,

Tim

-- 
Tim Southerwood


