Exchange 2010 Intermittent Password Prompts in Outlook Clients – NTLM Bottleneck

There are hundreds of articles on the internet about this common issue. If you are running Exchange 2007 or later, it is most often caused by an incorrect certificate configuration: a wrong or missing name in the certificate compared with the URLs defined on Exchange web components such as OWA, EAS, OA, and OAB.

Exchange is a fairly complex product that runs alongside, or depends on, several components such as AD, cryptography services, network components, and authentication modules.

The particular case I am writing about had more to do with the authentication mechanisms used by Exchange 2010, which supports several of them. The diagram below shows the fairly simple-looking setup that one of our customers was running:

 

[Diagram: Exchange 2010 DAG and CAS array with domain controllers]

The diagram is largely self-explanatory: a DAG and a CAS array with four domain controllers (not all four are shown in the diagram).

Even after we verified all certificate, URL, and authentication settings on OA, OWA, EAS, OAB, and so on, users still complained of an annoying password prompt that simply would not go away, even after they entered the correct user name and password.

Finally, we decided to look further into what happens when an authentication request is submitted to the CAS array. Interestingly, we could correlate some event IDs in the Security log of the CAS servers that pointed towards an authentication issue. After carefully investigating the Security logs on the CAS server, we found entries for a computer that had reported the problem. The Security log entry for this computer read as follows:

Log Name: Security
Source: Microsoft-Windows-Security-Auditing
Date: 9/5/2013 10:22:59 PM
Event ID: 4625
Task Category: Logon
Level: Information
Keywords: Audit Failure
User: N/A
Computer: cas02.exchange.local
Description:
An account failed to log on.
Subject:
  Security ID: NULL SID
  Account Name: –
  Account Domain: –
  Logon ID: 0x0
Logon Type: 3
Account For Which Logon Failed:
  Security ID: NULL SID
  Account Name: username
  Account Domain: EXCHANGE
Failure Information:
  Failure Reason: An Error occurred during Logon.
  Status: 0xc000005e
  Sub Status: 0x0
Process Information:
  Caller Process ID: 0x0
  Caller Process Name: –
Network Information:
  Workstation Name:
  Source Network Address: 178.239.86.252
  Source Port: 37109
Detailed Authentication Information:
  Logon Process: NtLmSsp
  Authentication Package: NTLM
  Transited Services: –
  Package Name (NTLM only): –
  Key Length: 0

Initially it looked like the issue described in http://support.microsoft.com/kb/2157973/en-us, but that was not the case: the error code described in the KB article and the one above do not match, and no smart card logon was in use. To find out what error code 0xc000005e meant, we ran err.exe, and the output was:

C:\Tools\Err>Err.exe 0xc000005e
# for hex 0xc000005e / decimal -1073741730 :
  STATUS_NO_LOGON_SERVERS
# There are currently no logon servers available to service
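
If err.exe is not at hand, the hex-to-decimal part of its output is easy to reproduce. The following is a minimal Python sketch, not a replacement for err.exe: the lookup table `KNOWN_STATUS` and the function names are hypothetical, and the table contains only the one code seen in this case.

```python
# Hypothetical lookup table: only the status code from this case is filled in.
KNOWN_STATUS = {
    0xC000005E: "STATUS_NO_LOGON_SERVERS",
}


def to_signed32(value: int) -> int:
    """Interpret an unsigned 32-bit value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def describe_status(text: str) -> str:
    """Format a hex status string in roughly the way err.exe prints it."""
    code = int(text, 16)  # int() accepts the "0x" prefix when base is 16
    name = KNOWN_STATUS.get(code, "unknown (look it up with err.exe)")
    return f"hex 0x{code:08x} / decimal {to_signed32(code)} : {name}"


if __name__ == "__main__":
    print(describe_status("0xc000005e"))
```

The signed conversion explains why the event log shows 0xc000005e while err.exe also reports it as the negative decimal -1073741730.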

Since we suspected a problem with NTLM, the next thing to examine was netlogon.log. The netlogon.log on the client showed:

Time [LOGON] SamLogon: Network logon of EXCHANGE\UserName from WorkstationName Returns 0xC000005E
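
To see how often this happens and for which accounts, the log can be scanned for matching entries. The sketch below is a rough Python example that assumes lines shaped like the one above; real netlogon.log lines carry extra fields (timestamp, thread ID, domain), so the regular expression only anchors on the `SamLogon: ... Returns 0x...` part. The names `LOGON_RESULT` and `count_failures` are made up for illustration.

```python
import re
from collections import Counter

# Matches the "SamLogon: <kind> logon of <account> from <workstation> Returns <status>"
# portion of a line; anything before "SamLogon:" is ignored.
LOGON_RESULT = re.compile(
    r"SamLogon: (?P<kind>\w+) logon of (?P<account>\S+) "
    r"from (?P<workstation>.*?) Returns (?P<status>0x[0-9A-Fa-f]+)"
)


def count_failures(lines, status="0xC000005E"):
    """Count log entries per account that returned the given status code."""
    wanted = int(status, 16)
    hits = Counter()
    for line in lines:
        match = LOGON_RESULT.search(line)
        if match and int(match.group("status"), 16) == wanted:
            hits[match.group("account")] += 1
    return hits


if __name__ == "__main__":
    # Sample lines modelled on the entry above; raw strings keep the backslash literal.
    sample = [
        r"Time [LOGON] SamLogon: Network logon of EXCHANGE\UserName from WorkstationName Returns 0xC000005E",
        r"Time [LOGON] SamLogon: Network logon of EXCHANGE\OtherUser from WorkstationName Returns 0x0",
    ]
    for account, count in count_failures(sample).items():
        print(account, count)
```

On a real server you would pass the open log file instead of the sample list, for example `count_failures(open(path, errors="replace"))`, where `path` points at your netlogon.log.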

This was again a little misleading, since the AD servers were up and running and processing logon requests, and no DNS issues were identified either. After a lot of Googling and Binging, we concluded that something was wrong with NTLM itself. So what was it?

NTLM bottlenecks can be caused by RPC/HTTPS requests. RPC/HTTPS is a key contributor to a high volume of NTLM requests, since a session established over RPC/HTTPS has to be authenticated twice because of its two different protocol payloads: the outer HTTP layer requires authentication once, and the tunneled RPC requires another authentication, generating twice the load. Moreover, HTTP is a stateless protocol, which can cause the server to handle multiple authentication requests.

Although RPC/HTTPS generates additional NTLM authentication requests, a direct MAPI connection to a CAS or CAS array can also contribute if the traffic is heavy enough. MAPI supports Kerberos authentication, and the default setting in Outlook 2007 and later is to negotiate the strongest authentication available when not running in Outlook Anywhere mode. Unless Kerberos support is configured in the environment, Outlook falls back to NTLM.

Considering all these factors and the research done, the only conclusion was to look for NTLM authentication-related issues. A quick network packet capture on the CAS servers helps determine whether the problem is NTLM or something else.

To capture precise results, leave the network capture running on the CAS server until a password prompt is reported. The capture will reveal something like the following between the CAS server and the client. (Running simultaneous captures on both the client and the servers helps gather precise results.)

0.0000000           11198    8:13:23 PM 9/2/2013      164.8780960      OUTLOOK.EXE    ClientComputer                 198.168.36.100    MSRPC  MSRPC:c/o Request: MS Exchange Directory RFR {1544F5E0-613C-11D1-93DF-00C04FD7BD09}  Call=0x1  Opnum=0x0  Context=0x0  Hint=0xC0 Warning: Octets trailer appends to authentication token      {MSRPC:105, TCP:104, IPv4:9}     65229

0.0156250           11199    8:13:23 PM 9/2/2013      164.8937210      OUTLOOK.EXE    198.168.36.100               ClientComputer       TCP        TCP:Flags=…A…., SrcPort=6950, DstPort=3117, PayloadLen=0, Seq=3823341786, Ack=264467696, Win=63764 (scale factor 0x0) = 63764  {TCP:104, IPv4:9}               63764

0.0468750           11216    8:13:23 PM 9/2/2013      164.9405960      OUTLOOK.EXE    198.168.36.100               ClientComputer       MSRPC  MSRPC:c/o Fault:  Call=0x1  Context=0x0  Status=0x5  Cancels=0x0       {MSRPC:92, TCP:88, IPv4:9}          63364

In the capture above, Outlook is clearly trying to use the RFR interface, and the CAS server responds with an RPC fault (Status=0x5, access denied).

Windows Server 2008 R2 has Netlogon performance counters that can be used to identify NTLM-related issues. One of the Microsoft KB support articles describes them as follows:

| Performance counter | Explanation |
| --- | --- |
| Semaphore Waiters | The number of threads that are waiting to obtain the semaphore |
| Semaphore Holders | The number of threads that are holding the semaphore |
| Semaphore Acquires | The total number of times that the semaphore has been obtained over the lifetime of the security channel connection, or since system startup for _Total |
| Semaphore Timeouts | The total number of times that a thread has timed out while it waited for the semaphore over the lifetime of the security channel connection, or since system startup for _Total |
| Average Semaphore Hold Time | The average time (in seconds) that the semaphore is held over the last sample |

 

In the case we were troubleshooting, the value of Semaphore Timeouts was climbing beyond 100. As its explanation indicates, this counter records timeouts that have occurred: threads wait for the semaphore and then time out, denying logon to the requestor. The authentication requests are rejected as a result, which is exactly what was happening on the servers.
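
Because Semaphore Acquires and Semaphore Timeouts are cumulative, what matters is how much they grow between two samples. The Python sketch below works on two hand-entered samples. The `NetlogonSample` type and all numbers are hypothetical, and the sizing formula, (acquires + timeouts) × average hold time ÷ collection length, is quoted from memory of Microsoft's MaxConcurrentApi guidance, so treat it as an assumption and check it against the KB article before relying on it.

```python
import math
from dataclasses import dataclass


@dataclass
class NetlogonSample:
    """One reading of the Netlogon counters (hypothetical container type)."""
    seconds: float        # seconds since the start of the collection
    acquires: int         # Semaphore Acquires (cumulative)
    timeouts: int         # Semaphore Timeouts (cumulative)
    avg_hold_time: float  # Average Semaphore Hold Time, in seconds


def new_timeouts(first: NetlogonSample, last: NetlogonSample) -> int:
    """Timeouts that occurred between the two samples."""
    return last.timeouts - first.timeouts


def suggested_max_concurrent_api(first: NetlogonSample, last: NetlogonSample) -> int:
    """Estimate a MaxConcurrentApi value from two samples.

    Assumed formula: (acquires + timeouts) * average hold time / collection length.
    """
    elapsed = last.seconds - first.seconds
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")
    demand = (last.acquires - first.acquires) + new_timeouts(first, last)
    return math.ceil(demand * last.avg_hold_time / elapsed)


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    start = NetlogonSample(seconds=0, acquires=1000, timeouts=0, avg_hold_time=0.5)
    end = NetlogonSample(seconds=90, acquires=9900, timeouts=100, avg_hold_time=0.5)
    print("new timeouts:", new_timeouts(start, end))
    print("suggested MaxConcurrentApi:", suggested_max_concurrent_api(start, end))
```

Any non-zero growth in timeouts during the sampling window is the signal to act on; the suggested value is only a starting point for testing.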

All of these symptoms are caused by a phenomenon called “NTLM Bottleneck”. There are a couple of ways to fix this issue:

Resolution 1

The first resolution is to increase the MaxConcurrentApi value in the registry. This DWORD value can be increased to 10 on Windows Server 2003-based DCs and member servers, and up to 150 on Windows Server 2008 SP2 and later DCs and member servers.

  1. Start Registry Editor.
  2. Locate the following registry subkey:

    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters

  3. Create the following registry entry:
    Name: MaxConcurrentApi
    Type: REG_DWORD
    Value: Set this to the larger number that you tested (any number greater than the default value).
  4. At a command prompt, run net stop netlogon, and then run net start netlogon.

You may have to apply these settings on both the CAS servers and the domain controllers, depending on the situation.
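
When the change has to be made on several servers, it can help to generate the equivalent commands rather than clicking through Registry Editor each time. The Python sketch below only builds the command strings and does not run anything. The function name and the `ceiling` parameter are hypothetical; the ceiling should be set to the limit quoted above for your Windows version (10 or 150).

```python
# Registry path from the steps above, in the abbreviated form that reg.exe accepts.
KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters"


def max_concurrent_api_commands(value: int, ceiling: int = 150) -> list:
    """Return the commands that set MaxConcurrentApi and restart Netlogon.

    `ceiling` is the upper limit for the target OS as given in the article
    (10 for Windows Server 2003, 150 for Windows Server 2008 SP2 and later).
    """
    if not 1 <= value <= ceiling:
        raise ValueError(f"value must be between 1 and {ceiling}")
    return [
        f'reg add "{KEY}" /v MaxConcurrentApi /t REG_DWORD /d {value} /f',
        "net stop netlogon",
        "net start netlogon",
    ]


if __name__ == "__main__":
    # Example value only; use the number you arrived at through testing.
    for command in max_concurrent_api_commands(10):
        print(command)
```

Printing the commands first makes it easy to review them before pasting them into an elevated command prompt on each CAS server or domain controller.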

Resolution 2

Configure the Exchange 2010 CAS array to use Kerberos instead of NTLM, as described in “Configuring Kerberos Authentication for Load-Balanced Client Access Servers”.

References and Additional Reading

Is this horse dead yet: NTLM Bottlenecks and the RPC runtime

Updated: NTLM and MaxConcurrentApi Concerns

You are intermittently prompted for credentials or experience time-outs when you connect to Authenticated Services

Netlogon performance counters for Windows Server 2003

Troubleshooting SID translation failures from the obvious to the not so obvious
