I have a little story from an administrator’s life for you today.
Today at around 1pm CEST, our mail cluster started behaving very oddly. Although the SMTP ports were open, it took around 30 seconds from establishing a connection to receiving the initial server greeting on each of the 8 nodes.
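For anyone who wants to put a number on such a delay, here is a minimal sketch of how the time until the greeting could be measured. The function name `time_to_greeting` and the host name in the comment are made up for illustration; this is not the tooling I used during the incident.

```python
import socket
import time


def time_to_greeting(host: str, port: int = 25, timeout: float = 60.0):
    """Connect to an SMTP port and return (seconds until banner, banner line)."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Wait for the first line the server sends, e.g. "220 ... ESMTP".
        with sock.makefile("rb") as reader:
            banner = reader.readline()
    elapsed = time.monotonic() - start
    return elapsed, banner.decode("ascii", errors="replace").rstrip()


# Example call (hypothetical host name):
#   elapsed, banner = time_to_greeting("mail1.example.com")
#   print(f"{elapsed:.1f}s  {banner}")
```

Running this against each node in turn would show whether the delay is the same everywhere or limited to a single machine.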
Long delays normally indicate DNS problems, so I checked whether the resolvers were b0rked. Both were working fine, and DNS lookups on the command line on each of the mail servers were blazing fast. Ho-humm... Stracing the qmail-smtpd processes nevertheless showed that my assumption was correct - the delays were definitely caused by slow DNS lookups against our resolvers. Just to be on the safe side and to rule out firewall or routing issues between our networks, I used the OpenDNS resolvers on one node - nothing changed.
My next guess was: maybe an RBL shut down?! We are only filtering against XBL and the iX blacklist, and both were up - I tried anyway. Still, SMTP was slow on every mail server.
We are using qmail with rblsmtpd, so I could quickly disable the rblsmtpd service altogether. Suddenly, the connection speed was back to its usual level. So it had to do with rblsmtpd, but with none of the RBLs it is filtering against. Weird.
I went to the resolvers and checked which queries were actually performed during an SMTP connection initiation - and to my astonishment, I saw that queries to resolve 1.2.3.4.rbl.maps.vix.com were issued during each connection initiation. I googled a bit and read that the long-defunct MAPS RBL was the fallback blacklist that rblsmtpd used when no other RBLs were specified in the configuration.
However, I had two blacklists in the config - yet rblsmtpd tried to query the MAPS RBL, too.
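In case the query name above looks odd: a DNS blacklist lookup reverses the octets of the connecting client's IPv4 address and appends the blacklist's zone. A small sketch of that construction follows. The helper name `dnsbl_query_name` and the client address 4.3.2.1 are only for illustration, and the sketch handles IPv4 only.

```python
import ipaddress


def dnsbl_query_name(client_ip: str, zone: str) -> str:
    """Build the DNS name a blacklist lookup would ask for (IPv4 only)."""
    # Validate the address first; raises ValueError on malformed input.
    octets = str(ipaddress.IPv4Address(client_ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone.strip(".")


# A client connecting from 4.3.2.1, checked against the MAPS zone:
print(dnsbl_query_name("4.3.2.1", "rbl.maps.vix.com"))  # 1.2.3.4.rbl.maps.vix.com
```

Every incoming connection triggers one such lookup per zone, so a zone whose nameservers no longer answer stalls each connection until the resolver gives up.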
It seems someone disabled the nameservers for the vix.com zone, so I took the liberty of telling my resolvers to serve that zone to all clients themselves - empty, of course. Now, everything is back to normal. It cost me about two hours to solve that problem.