Category Archives: Exchange 2010

Move Exchange 2010 Mailbox Database to Exchange 2013

This article explains how to move an Exchange 2010 mailbox database to Exchange 2013. Microsoft Exchange Server is a mail server that runs exclusively on Windows Server operating systems. It began as Microsoft’s internal mail server and is now licensed both as on-premises software and as software as a service. Exchange 2010 has fewer features than Exchange 2013, and users may run into problems sending or receiving internal and external email. To resolve this, many organizations export their mailboxes from Exchange 2010 to 2013. Exchange includes utilities such as executive audit logs, metadata searches, and unified messaging with menus, dial plans, and custom greetings. We will discuss in detail how to move an Exchange 2010 mailbox database to Exchange 2013 and how to recover an Exchange mailbox.

Exchange Database is in Dirty Shutdown State Error – Learn to Fix

In this article, we discuss the “Exchange Database is in Dirty Shutdown State” error and how to recover Exchange from a dirty shutdown; the solution applies to Exchange 2016, 2013, 2010, and 2007. As you may know, the Exchange Server database is built on the JET engine, in which log files keep track of the input and output operations on the database file. The engine uses a database cache to reduce the number of input and output operations. When a page is loaded into the cache and modified but not yet committed to the information store, the JET engine marks it as DIRTY. The database is not considered up to date until all pending transactions are committed to it, and as long as dirty pages remain, the database is considered inconsistent. If the machine shuts down unexpectedly before the transactions are completed, the database stays attached to the log files, and the “Exchange Database is in Dirty Shutdown State” error appears on screen.

Heartbleed Vulnerability and Exchange Server

A member of a group on LinkedIn posted about a vulnerability in OpenSSL. The bug is documented very well in Critical crypto bug in OpenSSL opens two-thirds of the Web to eavesdropping.

When you read OpenSSL, most of you might think it does not apply to your Exchange environment, since Microsoft Exchange doesn’t use OpenSSL anywhere. That may not be 100% true. Although neither Exchange nor Windows natively uses OpenSSL, this vulnerability still matters and needs your attention if you run any sort of hardware load balancer, reverse proxy appliance, or virtual appliance to publish Exchange over the internet or within the corporate network.

A lot of load balancing appliances run Linux-based operating systems and use the OpenSSL stack extensively. Double-check every appliance that runs a Linux- or Unix-based operating system.

How to detect Heartbleed

This vulnerability only applies to OpenSSL versions 1.0.1 through 1.0.1f. Other SSL libraries, such as PolarSSL, are not vulnerable. OpenVPN-NL, which depends on PolarSSL, is not affected.

To detect whether you are affected by Heartbleed, you can use any of the tools below:

  • http://filippo.io/Heartbleed/ : a web-based tool to test for and identify the vulnerability. Just enter the name of the website you want to test; in Exchange Server’s case, that would be the OWA, EAS, EWS, OAB, etc. URLs published on the internet.
  • http://s3.jspenguin.org/ssltest.py : a python script to test for the vulnerability from the command line. If you want to scan multiple sites you can use a modified version with easily parseable output.
  • If you use Chrome you can install the Chromebleed checker that alerts you when visiting a vulnerable site.
  • To see whether your load balancing or reverse proxy appliance uses a vulnerable version of OpenSSL, log in to the appliance and run openssl version. If the version falls in the range given above (1.0.1 through 1.0.1f), the appliance is vulnerable.
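If you have many appliances to check, the version comparison can be scripted. Below is a minimal Python sketch, assuming the output of openssl version looks like "OpenSSL 1.0.1e 11 Feb 2013". The function name is my own, and the vulnerable range is simply the one quoted above.

```python
import re

# Vulnerable range taken from the text above: OpenSSL 1.0.1 through 1.0.1f.
# An empty letter suffix stands for the plain 1.0.1 release.
VULNERABLE_LETTERS = {"", "a", "b", "c", "d", "e", "f"}


def is_heartbleed_vulnerable(version_output: str) -> bool:
    """Return True if `openssl version` output names a release in 1.0.1 to 1.0.1f."""
    match = re.search(r"OpenSSL\s+(\d+\.\d+\.\d+)([a-z]*)", version_output)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_output!r}")
    number, letter = match.groups()
    return number == "1.0.1" and letter in VULNERABLE_LETTERS


if __name__ == "__main__":
    for sample in ("OpenSSL 1.0.1e 11 Feb 2013", "OpenSSL 1.0.1g 7 Apr 2014"):
        print(sample, "->", is_heartbleed_vulnerable(sample))
```

Treat the result as a first filter only. Some vendors patch OpenSSL without changing the version string, so a "vulnerable" version number still needs to be confirmed with the appliance manufacturer.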

Fix

OpenSSL has provided an updated version (1.0.1g) of OpenSSL at https://www.openssl.org/source/. It is recommended to consult your appliance manufacturer to find out the update procedure and implications of update before simply going ahead and applying the fix.

Exchange Type attribute

I have a habit of spending a lot of time understanding how Exchange uses AD, the Windows Registry, WMI, Crypto, and all related stuff. One of my favorite things to do with any new version of Exchange Server is to look for the AD changes it makes. When Exchange 2010 was released, I went through a lot of attributes and the way their values are constructed. All the other attributes could be explained with the help of MSDN documentation or by spending some time creating a logical link between the attributes, schema classes, etc. The exception was the “type” attribute on the Exchange server object.

image

The value of the “type” attribute looks really weird. Initially I thought it was Chinese or Japanese, but it is not. 😛

So what is this “Type” attribute on the exchange server object in active directory?

This attribute and its value contain the licensing information for the server edition that you have chosen to install. When you install an Exchange Server 2010 role, only the Standard Edition of Exchange gets installed automatically. The edition and licensing information is stored in the type attribute in encrypted form. The Exchange edition is determined by the key you enter during activation, and the value of this attribute changes accordingly. Since it is encrypted, no specific pattern can be noted in the change, but you can still observe that the value of the type attribute changes.

Well, that was just a geeky finding. Nothing useful in production, though.

Remove-ActiveSyncDevice returns an error – Couldn’t find User as a recipient

Today’s blog post comes from another interesting find about the Exchange Management Shell and the removal of ActiveSync devices. A lot of customers I know prefer to keep their ActiveSync device lists clean. If an employee has not used an ActiveSync device for more than a few days, they simply remove it. Removing these devices periodically is usually automated in one way or another, and a whole lot of people use PowerShell to do that.

One such customer was seeing errors while removing old ActiveSync devices.

Issue

Running Remove-ActiveSyncDevice returns errors stating either that it couldn’t find <user identity> as a recipient or that the ActiveSyncDevice <DeviceIdentity> cannot be found. The two errors look like this:

 

Couldn’t find ‘exchange.local/New Delhi/SomeLocation/User1’ as a recipient.
    + CategoryInfo          : InvalidArgument: (:) [Remove-ActiveSyncDevice], RecipientNotFoundException
    + FullyQualifiedErrorId : 3DAABD9F,Microsoft.Exchange.Management.Tasks.RemoveMobileDevice

and

The ActiveSyncDevice exchange.local/New Delhi/SomeLocation/User1/ExchangeActiveSyncDevices/SAMSUNGGTI9100§SAMSUNG1818901812 cannot be found.
    + CategoryInfo          : NotSpecified: (2:Int32) [Remove-ActiveSyncDevice], ManagementObjectNotFoundException
    + FullyQualifiedErrorId : 1C3255A8,Microsoft.Exchange.Management.Tasks.RemoveMobileDevice

Cause

Assume that you have created a mailbox named User1 in the OU exchange.local/New Delhi/SomeLocation. After the mailbox was created, the user was allowed to configure his ActiveSync device. After successful activation, the user account stayed at that location for a while.

Due to some requirement, or a change in the user’s location or company, you move this user account to another OU using ADUC. When the user account is moved, all child objects of the user object in AD move along with it.

When the activation process for an ActiveSync device starts, Exchange creates an ActiveSync device object under the user object in AD, and this object also moves along with the user account whenever the account is moved.

When you run Remove-ActiveSyncDevice in EMS, EMS looks for the object in two places. The first place is the object entry in the user’s mailbox, as shown in the figure below. The ExchangeSyncData object in the user’s mailbox (inside the mailbox database) contains all the active and inactive EAS devices the mailbox has ever synchronized with. In this example the device name is AirSync-SAMSUNGGTN7100-SEC160xxxxx

Capture1

The second place is in AD, right under the user object associated with the mailbox. You can see this association using ADSIEDIT or LDP.exe.

image

As I said, when you move a user account to another OU, these EAS device objects move along with it, which changes the identity of each object. However, when PowerShell looks up the device, it does not query the device object in AD first; it queries the object in the mailbox (shown in the first figure) and then tries to locate the device object in AD at the path recorded there. Since you moved the user object to a different location using ADUC, Exchange is not aware of the change, cannot update the data in the user’s mailbox in the database, and returns those errors.
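To make the mismatch concrete, here is a small Python sketch of the comparison. It is not how Exchange implements the lookup; it only assumes that a device identity has the shape seen in the error above (user path, then ExchangeActiveSyncDevices, then the device ID). The function names and the new OU path are hypothetical.

```python
DEVICE_CONTAINER = "/ExchangeActiveSyncDevices/"


def parent_user_path(device_identity: str) -> str:
    """Return the user path from '<user path>/ExchangeActiveSyncDevices/<device id>'."""
    user_path, separator, _device_id = device_identity.partition(DEVICE_CONTAINER)
    if not separator:
        raise ValueError(f"not a device identity: {device_identity!r}")
    return user_path


def is_stale(recorded_identity: str, current_user_path: str) -> bool:
    """True when the path recorded for the device no longer matches where the user lives."""
    return parent_user_path(recorded_identity).lower() != current_user_path.lower()


if __name__ == "__main__":
    recorded = ("exchange.local/New Delhi/SomeLocation/User1"
                "/ExchangeActiveSyncDevices/SAMSUNGGTI9100§SAMSUNG1818901812")
    # Hypothetical OU the account was moved to with ADUC.
    moved_to = "exchange.local/Mumbai/SomeOtherLocation/User1"
    print(is_stale(recorded, moved_to))
```

A device whose recorded path still points at the old OU is exactly the kind of entry that makes Remove-ActiveSyncDevice fail with the errors shown earlier.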

Workaround

Locate the EAS object under the user account in AD and remove it using ADSIEDIT, then remove the associated object in the database using MFCMAPI.

Important

If a user has multiple devices partnered with his mailbox, it can be very difficult to find out which one to delete. To identify the device object to delete, use the following steps:

1. Run Get-ActiveSyncDevice -Mailbox "User1"

2. Make a note of the Identity and LastSuccessSync values for all the devices.

3. Open MFCMAPI and navigate to the screen shown in first figure.

4. Expand each device, or the device you identified in the mailbox, and select SyncStatus.

You should see properties like those shown below:

image

PR_LOCAL_COMMIT_TIME and PR_LAST_MODIFICATION_TIME are two properties that should help you determine which device to delete.
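Once you have noted the identity and last sync time of each device, picking the candidates is a simple date comparison. The Python sketch below assumes you have typed those values into a list by hand; the 30-day threshold, the function name, and the sample data are all hypothetical.

```python
from datetime import datetime, timedelta


def stale_devices(devices, now, max_idle_days=30):
    """Return identities whose last successful sync is missing or older than the cutoff.

    `devices` is a list of (identity, last_success_sync) pairs, where
    last_success_sync is a datetime or None for a device that never synced.
    """
    cutoff = now - timedelta(days=max_idle_days)
    return [identity for identity, last_sync in devices
            if last_sync is None or last_sync < cutoff]


if __name__ == "__main__":
    noted = [
        ("User1/ExchangeActiveSyncDevices/PhoneA", datetime(2013, 8, 30)),
        ("User1/ExchangeActiveSyncDevices/PhoneB", datetime(2013, 3, 1)),
        ("User1/ExchangeActiveSyncDevices/TabletC", None),
    ]
    print(stale_devices(noted, now=datetime(2013, 9, 5)))
```

Cross-check the result against the timestamps you see in MFCMAPI before deleting anything.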

 

Note: These steps are not for someone who does not know how to use MFCMAPI and ADSIEDIT, which is the only reason they are outlined at a very high level. If you have questions or need help, feel free to drop me a note.

Exchange 2010 Intermittent Password Prompts in Outlook Clients – NTLM Bottleneck

There are hundreds of articles on the internet about this commonly seen issue. If you are running Exchange 2007 or later, it is most often caused by a wrong certificate configuration: a wrong or missing name in the certificate compared with the URLs defined on Exchange web components like OWA, EAS, OA, OAB, etc.

Exchange is a fairly complex product that runs along with, or depends on, several components such as AD, Crypto, network components, authentication modules, etc.

The particular case I am writing about had more to do with the authentication mechanisms used by Exchange 2010, which uses and supports several of them. The diagram below should help you understand the fairly simple-looking setup that one of our customers was running:

 

image

The diagram is pretty self-explanatory. It is a DAG and a CAS array with 4 domain controllers (although not all 4 are shown in the diagram).

Even after we verified all the certificate, URL, and authentication settings on OA, OWA, EAS, OAB, etc., users still complained about an annoying password prompt that simply wouldn’t go away, even after they entered the correct user name and password.

Finally, we decided to look further into what happens when authentication requests are submitted to the CAS array. Interestingly, we could correlate some event IDs in the security log of the CAS servers that pointed towards the authentication issue. After carefully investigating the security logs on the CAS server, we found entries for a computer that had reported the problem. The security log entry for this computer read as below:

Log Name: Security
Source: Microsoft-Windows-Security-Auditing
Date: 9/5/2013 10:22:59 PM
Event ID: 4625
Task Category: Logon
Level: Information
Keywords: Audit Failure
User: N/A
Computer: cas02.exchange.local
Description:
An account failed to log on.
Subject:
  Security ID: NULL SID
  Account Name: –
  Account Domain: –
  Logon ID: 0x0
Logon Type: 3
Account For Which Logon Failed:
  Security ID: NULL SID
  Account Name: username
  Account Domain: EXCHANGE
Failure Information:
  Failure Reason: An Error occurred during Logon.
  Status: 0xc000005e
  Sub Status: 0x0
Process Information:
  Caller Process ID: 0x0
  Caller Process Name: –
Network Information:
  Workstation Name:
  Source Network Address: 178.239.86.252
  Source Port: 37109
Detailed Authentication Information:
  Logon Process: NtLmSsp
  Authentication Package: NTLM
  Transited Services: –
  Package Name (NTLM only): –
  Key Length: 0

Initially it looked like the issue described in http://support.microsoft.com/kb/2157973/en-us, but that was not the case, since the error code described in the KB and the error above do not match. Also, no smart card logon was in use. To find out what the error code 0xc000005e meant, we used err.exe, and the output was:

C:\Tools\Err>Err.exe 0xc000005e
# for hex 0xc000005e / decimal -1073741730 :
  STATUS_NO_LOGON_SERVERS
# There are currently no logon servers available to service
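The hex and decimal values in the err.exe output are the same 32-bit status code read as unsigned and as signed. A short Python sketch of the conversion follows, in case you only have one form in a log and need the other. The one-entry lookup table contains just the code from this case and is not a general NTSTATUS table.

```python
def to_signed_32(value: int) -> int:
    """Reinterpret an unsigned 32-bit status code as a signed 32-bit integer."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value does not fit in 32 bits")
    # If the top bit is set, the signed reading is the value minus 2**32.
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


# Only the status seen in this case; the name comes from the err.exe output above.
KNOWN_STATUS = {0xC000005E: "STATUS_NO_LOGON_SERVERS"}

if __name__ == "__main__":
    code = 0xC000005E
    print(hex(code), to_signed_32(code), KNOWN_STATUS.get(code, "unknown"))
```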

Suspecting a problem with NTLM, we looked at netlogon.log next. Netlogon.log on the client showed:

Time [LOGON] SamLogon: Network logon of EXCHANGE\UserName from WorkstationName Returns 0xC000005E

This was again a little misleading, since the AD servers were up and running and processing logon requests. No DNS issues were identified either. After a lot of Googling and Binging, we concluded that something was wrong with NTLM. So what was it?

NTLM bottlenecks can be caused by RPC/HTTPS requests. RPC/HTTPS is a key contributor to heavy NTLM load, since a session established over RPC/HTTPS has to be authenticated twice because of its two protocol payloads: the outer HTTP layer requires authentication once, and the tunneled RPC requires another, generating twice the load. Moreover, HTTP is a stateless protocol, which can cause the server to handle multiple authentication requests.

Although RPC/HTTPS generates additional NTLM authentication requests, a direct MAPI connection to a CAS or CAS array can also contribute if the traffic is high enough. MAPI supports Kerberos authentication, and the default setting in Outlook 2007 and later is to negotiate the strongest authentication available when not running in Outlook Anywhere mode. Unless Kerberos support is configured in the environment, Outlook falls back on NTLM by default.

Considering all these factors and the research done, the only conclusion was to look for NTLM authentication issues. A quick network packet capture on the CAS servers helps determine whether the problem is NTLM or something else.

To capture precise results, leave the network capture running on the CAS server until a password prompt is reported. You will notice that the capture reveals something like the below between the CAS server and the client. (Running simultaneous captures on both the client and the servers can help gather precise results.)

0.0000000           11198    8:13:23 PM 9/2/2013      164.8780960      OUTLOOK.EXE    ClientComputer                 198.168.36.100    MSRPC  MSRPC:c/o Request: MS Exchange Directory RFR {1544F5E0-613C-11D1-93DF-00C04FD7BD09}  Call=0x1  Opnum=0x0  Context=0x0  Hint=0xC0 Warning: Octets trailer appends to authentication token      {MSRPC:105, TCP:104, IPv4:9}     65229

0.0156250           11199    8:13:23 PM 9/2/2013      164.8937210      OUTLOOK.EXE    198.168.36.100               ClientComputer       TCP        TCP:Flags=…A…., SrcPort=6950, DstPort=3117, PayloadLen=0, Seq=3823341786, Ack=264467696, Win=63764 (scale factor 0x0) = 63764  {TCP:104, IPv4:9}               63764

0.0468750           11216    8:13:23 PM 9/2/2013      164.9405960      OUTLOOK.EXE    198.168.36.100               ClientComputer       MSRPC  MSRPC:c/o Fault:  Call=0x1  Context=0x0  Status=0x5  Cancels=0x0       {MSRPC:92, TCP:88, IPv4:9}          63364

In the above capture, Outlook is clearly trying to use the RFR interface, and the server answers with an RPC fault (Status=0x5, access denied).

Windows Server 2008 R2 has Netlogon performance counters that can be used to track down NTLM-related issues. One of the Microsoft KB support articles describes them as follows:

  • Semaphore Waiters: The number of threads waiting to obtain the semaphore.
  • Semaphore Holders: The number of threads holding the semaphore.
  • Semaphore Acquires: The total number of times that the semaphore has been obtained over the lifetime of the security channel connection, or since system startup for _Total.
  • Semaphore Timeouts: The total number of times that a thread has timed out while it waited for the semaphore over the lifetime of the security channel connection, or since system startup for _Total.
  • Average Semaphore Hold Time: The average time (in seconds) that the semaphore is held over the last sample.

 

In the case we were troubleshooting, the value of Semaphore Timeouts was climbing beyond 100. As the explanation of Semaphore Timeouts says, this counter records timeouts: threads wait for the semaphore and then expire, denying logon to the requestor. This causes authentication requests to be rejected, which is exactly what was happening on the servers.

All of these symptoms are caused by a phenomenon called “NTLM Bottleneck”. There are a couple of ways to fix this issue:

Resolution 1

The first resolution is to increase the MaxConcurrentApi value in the registry. This DWORD value can be increased to 10 on Windows Server 2003 based DCs and member servers, and up to 150 on Windows Server 2008 SP2 and later DCs and member servers.

  1. Start Registry Editor.
  2. Locate the following registry subkey:

    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters

  3. Create the following registry entry:
    Name: MaxConcurrentApi
    Type: REG_DWORD
    Value: Set the value to the larger number that you tested (any number greater than the default value).
  4. At a command prompt, run net stop netlogon, and then run net start netlogon.

You may have to apply these settings both on the CAS servers and domain controllers depending upon the situation.
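If you want a starting point for the "larger number" rather than a guess, the counters described earlier can be turned into an estimate. The Python sketch below assumes the sizing formula from Microsoft's MaxConcurrentApi tuning guidance as I understand it: (Semaphore Acquires + Semaphore Timeouts) × Average Semaphore Hold Time ÷ length of the collection window in seconds. Verify the formula against the KB before relying on it. The function name and sample numbers are hypothetical, and the ceiling of 150 is the limit mentioned above.

```python
import math


def suggest_max_concurrent_api(acquires, timeouts, avg_hold_seconds,
                               window_seconds, ceiling=150):
    """Estimate a MaxConcurrentApi value from Netlogon semaphore counters.

    Assumed formula (check the Microsoft KB before use):
        (acquires + timeouts) * avg_hold_seconds / window_seconds
    The result is rounded up and clamped to the range 1..ceiling.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    needed = (acquires + timeouts) * avg_hold_seconds / window_seconds
    return max(1, min(ceiling, math.ceil(needed)))


if __name__ == "__main__":
    # Hypothetical sample: counters collected over a 90-second window.
    print(suggest_max_concurrent_api(acquires=8000, timeouts=120,
                                     avg_hold_seconds=0.5, window_seconds=90))
```

Use the counter deltas for the collection window, not the lifetime totals, and test the resulting value before rolling it out, as the steps above say.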

Resolution 2

Configure the Exchange 2010 CAS array to use Kerberos instead of NTLM, as described in Configuring Kerberos Authentication for Load-Balanced Client Access Servers.

References and Additional Reading

Is this horse dead yet: NTLM Bottlenecks and the RPC runtime

Updated: NTLM and MaxConcurrentApi Concerns

You are intermittently prompted for credentials or experience time-outs when you connect to Authenticated Services

Netlogon performance counters for Windows Server 2003

Troubleshooting SID translation failures from the obvious to the not so obvious

The Cluster Service Cannot Be Started. An Attempt To Read Configuration Data From Windows Registry Failed With Error ‘2’.

This morning started with a little fire on some Exchange 2010 servers running as DAG members. One of the 8 members of the DAG could not continue log replication and kept its database copies in a failed state.

In Failover Cluster Manager, the server was not appearing at all, and there were a bunch of events in the application log:

Log Name:      Application
Source:        MSExchangeRepl
Date:          8/17/2013 11:39:09 AM
Event ID:      4092
Task Category: Service
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      egiex02.egi.local
Description:
Database Availability Group ‘EGI-DAG-01’ member server ‘EGIEX02’ is not completely started. Run Start-DatabaseAvailabilityGroup ‘EGI-DAG-01’ -MailboxServer ‘EGIEX02’ to start the server.

and the System log showed the events below when we ran Start-DatabaseAvailabilityGroup EGI-DAG-01 -MailboxServer EGIEX02:

Log Name:      System
Source:        Microsoft-Windows-FailoverClustering
Date:          8/17/2013 12:48:32 PM
Event ID:      1090
Task Category: Startup/Shutdown
Level:         Critical
Keywords:     
User:          SYSTEM
Computer:      EGIEX02.EGI.LOCAL
Description:
The Cluster service cannot be started. An attempt to read configuration data from the Windows registry failed with error ‘2’. Please use the Failover Cluster Management snap-in to ensure that this machine is a member of a cluster. If you intend to add this machine to an existing cluster use the Add Node Wizard. Alternatively, if this machine has been configured as a member of a cluster, it will be necessary to restore the missing configuration data that is necessary for the Cluster Service to identify that it is a member of a cluster. Perform a System State Restore of this machine in order to restore the configuration data.

This happens when a problem node is not able to communicate with the resource owner in a group. A DAG uses MSCS as the underlying layer for building high availability for mailbox servers and databases, with additional logic supplied by the DAG components. In the event of a communication failure with other members of a DAG, the failover cluster keeps attempting connections and gives up after a certain period. In my case, the problem node EGIEX02 was trying to reach all 7 other members to read the configuration information, but failed because it could not contact any of the nodes over RPC.

The fix is fairly simple:

Open an elevated command prompt on one of the DAG members and run:

Cluster.exe Node EGIEX02 /ForceCleanUp 

After you run the above command, the node is removed from the cluster.

Now open Exchange Management Shell and run:

Start-DatabaseAvailabilityGroup EGI-DAG-01 -MailboxServer EGIEX02

 

This should take care of all issues related to the cluster service. If you still cannot get past the MSExchangeRepl errors after that, you may need to manually reseed the problem database, or all of them.

So what causes it?

Although the cluster service kept saying that it could not contact any of the nodes in the cluster, all those nodes were reachable via remote registry, WMI, event logs, etc.

An answer lies within the XML of the cluster service event (event ID 1090) shown below.

Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-FailoverClustering" Guid="{BAF908EA-3421-4CA9-9B84-6689B8C6F85F}" />
    <EventID>1090</EventID>
    <Version>0</Version>
    <Level>1</Level>
    <Task>8</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000000</Keywords>
    <TimeCreated SystemTime="2013-08-17T07:18:32.625000000Z" />
    <EventRecordID>192930</EventRecordID>
    <Correlation />
    <Execution ProcessID="3332" ThreadID="3552" />
    <Channel>System</Channel>
    <Computer>EGIEX02.EGI.LOCAL</Computer>
    <Security UserID="S-1-5-18" />
  </System>
  <EventData>
    <Data Name="Status">2</Data>
    <Data Name="NodeName">EGIEX02</Data>
  </EventData>
</Event>

S-1-5-18 is the well-known security principal Local System. The cluster service on a DAG member uses this account as its logon account, and so does the replication service. Every time a node in a cluster contacts another node, it has to perform a security handshake, which uses Kerberos by default. When these handshakes fail, the calling node is denied access to the resources and to any cluster information that the other nodes share among themselves. Troubleshooting Kerberos is a nightmare (at least for me). The Kerberos theory is well supported by the FailoverClustering Operational logs: you will see ample entries of the problem node trying to perform a handshake, and nothing after that.
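If you export such events as XML and want to pull out the interesting fields without reading every tag, a few lines of Python will do. This is only a sketch: the sample is a trimmed copy of the event XML above, and the function name and the shape of the returned dictionary are my own choices.

```python
import xml.etree.ElementTree as ET

# Namespace used by Windows event XML, as seen in the xmlns attribute above.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Trimmed copy of the event shown above.
SAMPLE = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-FailoverClustering" Guid="{BAF908EA-3421-4CA9-9B84-6689B8C6F85F}" />
    <EventID>1090</EventID>
    <Security UserID="S-1-5-18" />
  </System>
  <EventData>
    <Data Name="Status">2</Data>
    <Data Name="NodeName">EGIEX02</Data>
  </EventData>
</Event>"""


def summarize_event(xml_text: str) -> dict:
    """Extract provider, event ID, caller SID and named data items from event XML."""
    root = ET.fromstring(xml_text)
    system = root.find("e:System", NS)
    data = {item.get("Name"): item.text
            for item in root.findall("e:EventData/e:Data", NS)}
    return {
        "provider": system.find("e:Provider", NS).get("Name"),
        "event_id": int(system.find("e:EventID", NS).text),
        "user_sid": system.find("e:Security", NS).get("UserID"),
        "data": data,
    }


if __name__ == "__main__":
    print(summarize_event(SAMPLE))
```

The user_sid field is where S-1-5-18 shows up, and data carries the Status and NodeName values from the event.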

By removing and re-adding the node to the cluster, we almost reset everything related to the problem node in the cluster database.

 

I hope that helps someone who finds himself in trouble with this issue.

Manually Removing a Failed Edge Transport Role

 

Content Warning!!

The content of this post is not recommended unless you do not have a backup of your edge transport configuration. All steps below were tested in a specific environment and may not apply to yours. Do it at your own risk!!

I recommend recovering your edge transport servers using the Cloned Configuration method (see Understanding Edge Transport Cloned Configuration); use this content only as a last resort.

While Exchange 2013 is out on the market and a lot of deployments are happening, the Edge Transport role of Exchange 2010 still matters, since Exchange Server 2013 does not have an Edge Transport role of its own.

Today, I was working on recovering an edge transport server role, and the recovery did not go well. Finally, the call was made to remove this edge transport role. The only hurdle was that we could not install a fresh OS on the box, since the servers are located in a remote data center. The only option was to remove the edge transport role manually and clean up the OS so that it could be used to reinstall the role.

So here is how you do it:

Stop Exchange Services (Leave them in whatever state they are if they do not stop)

Remove Registry Entries (Note! You must perform registry backup every time you change anything in registry)

  • Open Registry Editor
  • Browse to HKLM\SOFTWARE\Microsoft
  • Locate registry key named ExchangeServer
  • Delete the key ExchangeServer
  • Browse to location HKLM\SYSTEM\CurrentControlSet\Services\
  • Locate registry keys starting with MSExchange e.g. MSExchange ADAccess
  • Remove all registry keys starting with MSExchange
  • Browse to your exchange installation location. Typically at C:\Program Files\Microsoft\
  • Delete the folder Exchange Server. If you are scared of deleting it, you can simply rename it to Exchange Server.OLD

Remove LDS Instance

  • Open Command prompt with elevated privileges
  • Browse to location C:\Windows\ADAM
  • Type ADAMUninstall /i:MSExchange and hit enter
  • Click Yes on both the dialog boxes appearing after you hit enter
  • Restart the server

When you reinstall the Exchange edge transport role, you may receive some weird errors the first time. This is expected when you remove everything in such a crude way. If that happens:

  • Locate the registry key HKLM\SOFTWARE\Microsoft\ExchangeServer\V14\EdgeTransportRole
  • Delete the WaterMark string from the right hand side pane of registry editor
  • Browse back to location HKLM\SOFTWARE\Microsoft\ExchangeServer\V14\
  • Right click and create a new key named Transport
  • Create one more key named Pickup at HKLM\SOFTWARE\Microsoft\ExchangeServer\V14\
  • Re-run exchange edge transport setup

Your edge transport role should be back in operation, and you can create a new edge subscription with Exchange 2013 Mailbox or Exchange 2010 HT servers.

 

 

 

The Microsoft Exchange Administrator has made a change that requires you quit and restart Outlook

This is yet another story of troubleshooting an interesting case, which led to a weird finding: a seemingly undocumented behavior of Outlook with an Exchange 2010 multi-role installation that uses a DAG and a CAS array together. You cannot use a DAG and WNLB on the same servers, but several organizations use hardware load balancers so that multi-role servers can be DAG members and still form a CAS array.

This behavior was the major culprit in one such case, which took an abnormally long time to reach a resolution (well, a workaround). Let me come to the point.

One of the customers has a multi-role DAG and CAS array architecture. They have two servers, EX-01 and EX-02, with the Mailbox, HT, and CAS server roles installed. These servers are also the members of the only DAG they have, and they implement a CAS array load balanced by a Barracuda 340 appliance.

Diagrammatically, it looks pretty simple,

image

Everything seemed to be fine. DAG *overs, CAS load balancing, mail flow, etc. worked absolutely fantastically, except for a haunting, random pop-up on Outlook clients that said:

“The Microsoft Exchange Administrator has made a change that requires you quit and restart Outlook”

After more than a month of troubleshooting, this turned out to be a really weird finding. I don’t know whether Outlook is simply not designed to handle this or whether it is a bug. Either way, it is interesting to know.

So here is what happens:

If you look at the diagram above carefully, only one of the two boxes in the DAG + CAS array hosts the PF database.

For public folder access, Outlook clients connecting to Exchange 2010 connect directly to a mailbox server hosting a PF replica. If Outlook connects to a CAS array member that also hosts the PF store, it converges the public and private logons into a single connection. When the client connects to a CAS array member that does not host a PF store, Exchange issues a WrongServer response to the client and suggests a new server name for the public logon. Somehow, Outlook is unable to handle this response and thinks that it has to reconfigure the profile.

If you have ever been haunted by this kind of problem, you can work out this whole logic from the RCA logs on the CAS servers. Look through them thoroughly and you will see something like this:

2013-04-04T13:53:07.585Z,17918,1,/o=Customer/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=User E345f,,OUTLOOK.EXE,11.0.8200.0,Cached,,,ncacn_ip_tcp,,PublicLogon,1144 (rop::WrongServer),00:00:00,”Logon: Public,  in database 36d89041-6f58-4bcb-a7af-fd38d9994b94 last mounted on EX-02.Customer at 04-04-2013 12:30:12, currently Mounted; Redirected: not a user’s home public server, suggested new server: /o=Customer/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Configuration/cn=Servers/cn=EX-02″,RopHandler: Logon:
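Reading RCA logs by eye gets tiring quickly, so here is a rough Python sketch that picks out the public logon redirects. It relies only on the text visible in the entry above (PublicLogon, rop::WrongServer, and the "suggested new server" DN) and makes no assumption about the full RCA log schema. The sample line is a shortened copy of that entry, and the function name is my own.

```python
import re

# Captures the server name at the end of the suggested legacy DN.
SUGGESTED_SERVER = re.compile(r"suggested new server: .*?/cn=Servers/cn=([A-Za-z0-9-]+)")

# Shortened copy of the RCA entry shown above.
SAMPLE_LINE = (
    "2013-04-04T13:53:07.585Z,17918,1,"
    "/o=Customer/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=User E345f,,"
    "OUTLOOK.EXE,11.0.8200.0,Cached,,,ncacn_ip_tcp,,PublicLogon,1144 (rop::WrongServer),00:00:00,"
    '"Logon: Public, Redirected: not a user\'s home public server, suggested new server: '
    "/o=Customer/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)"
    '/cn=Configuration/cn=Servers/cn=EX-02",RopHandler: Logon:'
)


def find_public_logon_redirects(lines):
    """Return (timestamp, suggested_server) for each PublicLogon WrongServer entry."""
    redirects = []
    for line in lines:
        if "PublicLogon" not in line or "rop::WrongServer" not in line:
            continue
        timestamp = line.split(",", 1)[0]
        match = SUGGESTED_SERVER.search(line)
        redirects.append((timestamp, match.group(1) if match else None))
    return redirects


if __name__ == "__main__":
    print(find_public_logon_redirects([SAMPLE_LINE]))
```

If the timestamps of these redirects line up with the times users report the restart prompt, you are very likely looking at the same issue.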

 

I can’t call it a resolution, but below are a few ways to handle this:

1. Block access to the public folder store on the servers. This might be impractical for many organizations, since a lot of companies still use PFs for collaboration.

2. Move the PF database to another server which is not part of the CAS array.

3. Create one more replica of the PF store on another member of the CAS array. (Note: I could not test this scenario in the lab before publishing. I would suggest checking it in a lab before doing this in production.)

 

EDIT: Creating an additional replica of the PF databases does not really help. You should provision a copy of the PF databases on a different server.

System Attendant Fails to Start with Event ID 33, Source SideBySide

Today someone was upgrading their Exchange Server 2010 SP2 servers to SP3. Everything went well until one of the updated servers refused to start the Microsoft Exchange System Attendant service.

Every attempt to start the service failed and logged this error:

Log Name:      Application
Source:        SideBySide
Date:          21-04-2013 14:52:41
Event ID:      33
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      A1SR-EX1.company
Description:
Activation context generation failed for "C:\Program Files\Microsoft\Exchange Server\V14\bin\mad.exe". Dependent Assembly Microsoft.VC90.ATL,processorArchitecture="amd64",publicKeyToken="1fc8b3b9a1e18e3b",type="win32",version="9.0.21022.8" could not be found. Please use sxstrace.exe for detailed diagnosis.

 

System Attendant startup issues are usually due to something wrong in AD or the service failing to connect to a domain controller during startup. This particular case was not related to AD, though.

 

The fix was very simple.

Download and install Microsoft Visual C++ 2008 Redistributable Package (x64) and start the System Attendant.

 
