Here is yet another troubleshooting story: an interesting case that led to a weird finding, which looks like an undocumented behavior of either version of Outlook against an Exchange 2010 multi-role installation that uses a DAG and a CAS array together. Indeed, you cannot use DAG and WNLB together, but several organizations use hardware load balancers so that they can run multiple server roles on the DAG members and still have both a DAG and a CAS array.
In one of these unique cases that took an abnormally long time to reach a resolution (workaround), this behavior was a major culprit. Let me come to the point.
One of the customers has a multi-role DAG and CAS array architecture. They have two servers, EX-01 and EX-02, with the Mailbox, HT and CAS server roles installed on them. These servers are also the members of the only DAG they have. Together they form a CAS array that is load balanced by a Barracuda 340 appliance.
Diagrammatically, it looks pretty simple,
Everything seems to be alright. DAG failovers and switchovers, CAS load balancing, mail flow, etc. all work absolutely fine, except for a haunting random pop-up on Outlook clients that says:
“The Microsoft Exchange Administrator has made a change that requires you quit and restart Outlook”
After more than a month of troubleshooting this case, the cause turned out to be a really weird finding. I don’t know whether this is a situation Outlook was never designed to handle or simply a bug. Whichever it is, it is surely interesting to know.
So here is what happens:
If you look at the above diagram carefully, only one of the two boxes in the DAG + CAS array hosts the PF database.
When Outlook clients connect to an Exchange 2010 server, they connect directly to a Mailbox server hosting a PF replica for public folder access. If Outlook connects to a CAS array member that also hosts the PF store, it converges all connections, and the public and private logons go over a single connection. When the client connects to a CAS array member that does not host a PF store, Exchange issues a wrongServer response to the client and suggests a new server name for the public logon. Somehow, Outlook is unable to handle this response and thinks that it has to reconfigure the profile.
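The decision described above can be written down as a toy model. This is only an illustration of the behavior observed in this case, not Exchange or Outlook code; the function name `public_logon` and the outcome labels are made up for the example.

```python
def public_logon(cas_member, pf_hosts):
    """Toy model of the public logon outcome described in this post.

    cas_member: name of the CAS array member the load balancer picked.
    pf_hosts:   set of server names that host the PF database.
    Both the function and the outcome labels are hypothetical.
    """
    if not pf_hosts:
        raise ValueError("no server hosts a PF database")
    if cas_member in pf_hosts:
        # Public and private logons share the connection; no prompt.
        return ("merged", cas_member)
    # The member has no PF store, so the client is redirected. This is
    # the response that made Outlook ask for a restart in this case.
    return ("wrongServer", sorted(pf_hosts)[0])


if __name__ == "__main__":
    for member in ("EX-01", "EX-02"):
        print(member, public_logon(member, {"EX-02"}))
```

With the PF database only on EX-02, every client the load balancer sends to EX-01 ends up on the wrongServer path, which would explain why the pop-up appears at random.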
If you have ever been haunted by this kind of problem, you can work out this whole logic with the help of the RCA logs on the CAS servers. If you look through the RCA logs thoroughly, you will see something like the entry below:
2013-04-04T13:53:07.585Z,17918,1,/o=Customer/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=User E345f,,OUTLOOK.EXE,11.0.8200.0,Cached,,,ncacn_ip_tcp,,PublicLogon,1144 (rop::WrongServer),00:00:00,"Logon: Public, in database 36d89041-6f58-4bcb-a7af-fd38d9994b94 last mounted on EX-02.Customer at 04-04-2013 12:30:12, currently Mounted; Redirected: not a user’s home public server, suggested new server: /o=Customer/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Configuration/cn=Servers/cn=EX-02",RopHandler: Logon:
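Going through RCA logs by eye is tedious, so here is a minimal sketch of how such entries could be picked out with a script. It assumes the log is comma-separated with the timestamp in the first column and the long description in a double-quoted field, which is only inferred from the sample line above; the function name `find_public_logon_redirects` is hypothetical.

```python
import csv
import re

# Matches the redirect hint seen in the sample entry above.
SUGGESTED_RE = re.compile(r"suggested new server: (.+)$")

# The sample entry from this post, with plain double quotes around the
# description field (assumed to be how it appears in the raw log).
SAMPLE = (
    '2013-04-04T13:53:07.585Z,17918,1,/o=Customer/ou=Exchange Administrative '
    'Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=User E345f,,OUTLOOK.EXE,'
    '11.0.8200.0,Cached,,,ncacn_ip_tcp,,PublicLogon,1144 (rop::WrongServer),'
    '00:00:00,"Logon: Public, in database 36d89041-6f58-4bcb-a7af-fd38d9994b94 '
    'last mounted on EX-02.Customer at 04-04-2013 12:30:12, currently Mounted; '
    'Redirected: not a user’s home public server, suggested new server: '
    '/o=Customer/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)'
    '/cn=Configuration/cn=Servers/cn=EX-02",RopHandler: Logon:'
)


def find_public_logon_redirects(lines):
    """Yield (timestamp, suggested_server) for each WrongServer entry."""
    for row in csv.reader(lines):
        if not any("rop::WrongServer" in field for field in row):
            continue
        for field in row:
            match = SUGGESTED_RE.search(field)
            if match:
                # Keep only the last cn= component of the legacy DN.
                server = match.group(1).strip().rsplit("/cn=", 1)[-1]
                yield row[0], server  # row[0]: timestamp (assumed position)
                break


if __name__ == "__main__":
    for timestamp, server in find_public_logon_redirects([SAMPLE]):
        print(timestamp, "redirected to", server)
```

To run it against a real log, pass an open file object instead of the one-element list; any header or comment lines are skipped because they do not contain the WrongServer marker.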
I can’t really call it a resolution, but below are a couple of ways to handle this:
1. Block access to the public folder store on the servers. This might be impractical for a lot of organizations, since many companies still use PFs for collaboration purposes.
2. Move the PF database to another server which is not a part of the CAS array.
3. Create one more replica of the PF store on another member of the CAS array. (Note: Due to circumstances, I could not test this scenario in the lab before publishing. I would suggest checking it in a lab before doing this in production.)
EDIT: Creating an additional replica of the PF databases does not really help. You should provision a copy of the PF databases on a different server.