The Cluster Service Cannot Be Started. An Attempt To Read Configuration Data From Windows Registry Failed With Error ‘2’.

This morning started with a small fire on some Exchange 2010 servers running as DAG members. One of the 8 servers in the DAG could not continue log replication and kept its database copies in a Failed state.

A look at Failover Cluster Manager showed that the server was not appearing there, and the Application log contained a bunch of events like this one:

Log Name:      Application
Source:        MSExchangeRepl
Date:          8/17/2013 11:39:09 AM
Event ID:      4092
Task Category: Service
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      egiex02.egi.local
Description:
Database Availability Group ‘EGI-DAG-01’ member server ‘EGIEX02’ is not completely started. Run Start-DatabaseAvailabilityGroup ‘EGI-DAG-01’ -MailboxServer ‘EGIEX02’ to start the server.

Running Start-DatabaseAvailabilityGroup EGI-DAG-01 -MailboxServer EGIEX02 as suggested only produced the following event in the System log:

Log Name:      System
Source:        Microsoft-Windows-FailoverClustering
Date:          8/17/2013 12:48:32 PM
Event ID:      1090
Task Category: Startup/Shutdown
Level:         Critical
Keywords:     
User:          SYSTEM
Computer:      EGIEX02.EGI.LOCAL
Description:
The Cluster service cannot be started. An attempt to read configuration data from the Windows registry failed with error ‘2’. Please use the Failover Cluster Management snap-in to ensure that this machine is a member of a cluster. If you intend to add this machine to an existing cluster use the Add Node Wizard. Alternatively, if this machine has been configured as a member of a cluster, it will be necessary to restore the missing configuration data that is necessary for the Cluster Service to identify that it is a member of a cluster. Perform a System State Restore of this machine in order to restore the configuration data.

This happens when a problem node is not able to communicate with the resource owner in a group. A DAG uses Windows failover clustering (MSCS) as its underlying layer and adds its own logic on top to provide high availability for mailbox servers and databases. When a member cannot communicate with the other members of the DAG, the cluster service keeps retrying the connection and gives up after a certain period. In my case the problem node, EGIEX02, was trying to reach all 7 other members to read the configuration information, but failed because it could not contact any of them over RPC.

The fix is fairly simple:

Open an elevated command prompt on one of the DAG members and run:

Cluster.exe Node EGIEX02 /ForceCleanUp 

After you run the command above, the node is removed from the cluster.

Now open the Exchange Management Shell and run:

Start-DatabaseAvailabilityGroup EGI-DAG-01 -MailboxServer EGIEX02

This should take care of all issues related to the cluster service. If the MSExchangeRepl errors persist after that, you may need to reseed the affected database copy manually, or all of them.

So what causes it?

Although the cluster service kept reporting that it could not contact any of the other nodes in the cluster, all of those nodes were in practice reachable via Remote Registry, WMI, Event Viewer, etc.
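
If you want to confirm the reachability part from a script instead of poking at Remote Registry or WMI by hand, a plain TCP connect test is a quick first check. The sketch below is a minimal Python example, assuming TCP 135 (the RPC endpoint mapper) and 3343 (used by failover clustering) are the ports worth testing; the node names in NODES are hypothetical placeholders, not the real DAG members. An open port only proves connectivity, not that the Kerberos handshake will succeed.

```python
import socket

# Hypothetical node names; replace them with the real members of your DAG.
NODES = ["EGIEX01.egi.local", "EGIEX03.egi.local"]

# TCP 135 is the RPC endpoint mapper; 3343 is used by failover clustering.
PORTS = (135, 3343)


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_nodes(nodes, ports=PORTS, timeout: float = 2.0) -> dict:
    """Map every (host, port) pair to whether a TCP connect succeeded."""
    return {(h, p): tcp_reachable(h, p, timeout) for h in nodes for p in ports}
```

Calling check_nodes(NODES) returns a dictionary keyed by (host, port). A False entry points at a network or firewall problem; all-True results, as in this case, push the investigation toward authentication instead.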

The answer lies in the XML of event ID 1090 (Microsoft-Windows-FailoverClustering).

Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-FailoverClustering" Guid="{BAF908EA-3421-4CA9-9B84-6689B8C6F85F}" />
    <EventID>1090</EventID>
    <Version>0</Version>
    <Level>1</Level>
    <Task>8</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000000</Keywords>
    <TimeCreated SystemTime="2013-08-17T07:18:32.625000000Z" />
    <EventRecordID>192930</EventRecordID>
    <Correlation />
    <Execution ProcessID="3332" ThreadID="3552" />
    <Channel>System</Channel>
    <Computer>EGIEX02.EGI.LOCAL</Computer>
    <Security UserID="S-1-5-18" />
  </System>
  <EventData>
    <Data Name="Status">2</Data>
    <Data Name="NodeName">EGIEX02</Data>
  </EventData>
</Event>
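
The values that matter here can also be pulled out of that XML programmatically. Below is a minimal Python sketch, assuming you have copied the event XML as a string (for example from the XML View on the Details tab in Event Viewer). It reads Status, NodeName and the Security UserID and translates them through two deliberately tiny lookup tables that cover only the values seen in this post. Status 2 is the Win32 error ERROR_FILE_NOT_FOUND, which matches the event text: the configuration data the cluster service tried to read from the registry was not there.

```python
import xml.etree.ElementTree as ET

# Namespace declared by the xmlns attribute of Windows event XML.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Minimal lookup tables: only the values that appear in this post.
WIN32_ERRORS = {2: "ERROR_FILE_NOT_FOUND"}
WELL_KNOWN_SIDS = {"S-1-5-18": "Local System"}


def summarize_event(xml_text: str) -> dict:
    """Extract and translate the fields relevant to the cluster 1090 event."""
    root = ET.fromstring(xml_text)
    system = root.find("e:System", NS)
    # EventData holds <Data Name="...">value</Data> pairs.
    data = {d.get("Name"): d.text for d in root.findall("e:EventData/e:Data", NS)}
    status = int(data["Status"])
    sid = system.find("e:Security", NS).get("UserID")
    return {
        "event_id": int(system.findtext("e:EventID", namespaces=NS)),
        "provider": system.find("e:Provider", NS).get("Name"),
        "node": data["NodeName"],
        "status": status,
        "status_name": WIN32_ERRORS.get(status, "unknown"),
        "sid": sid,
        "account": WELL_KNOWN_SIDS.get(sid, "unknown"),
    }
```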

S-1-5-18 is the well-known security principal Local System. The cluster service on a DAG member uses this account as its logon account, and so does the replication service. Every time a node in a cluster tries to contact another node, it has to perform a security handshake, and that handshake uses Kerberos by default. When these handshakes fail, the calling node is denied access to the resources and to any cluster information that the other nodes share among themselves. Troubleshooting Kerberos is a nightmare (at least for me). The Kerberos theory is well supported by the FailoverClustering Operational log: you will see plenty of entries of the problem node trying to perform a handshake and nothing after that.

By removing the node from the cluster and adding it back, we effectively reset almost everything related to the problem node in the cluster database.

I hope this helps anyone who finds themselves in trouble with this issue.

Managing Exchange 2013 Anti Malware Scanning – Part 2

In the last article, Managing Exchange 2013 Anti Malware Scanning – Part 1, we looked at the traditional way to configure Anti Malware Protection in Exchange 2013. In Part 2, I am going to write a little more about how to configure a policy using PowerShell. By the end of this part, you will notice a few differences between the way the EAC can be used to configure Anti Malware Protection and the additional tasks PowerShell can handle.

Powerful Way

I have no clue why Microsoft named that black and white CLI PowerShell, but it truly is powerful in many ways, and it proves its power here with configuration as well. There are several tasks in Exchange that cannot be performed using the GUI. PowerShell is the only option unless someone writes custom code to build a whole new interface for those tasks. Anti-malware protection is no exception: the EAC does not expose all of the malware filtering settings, but PowerShell does.

Let us take a look at how we use PowerShell to configure a new malware policy. To create one, we use the New-MalwareFilterPolicy cmdlet. This cmdlet accepts 22 usable named parameters, plus 2 reserved for internal use by Microsoft.

In this example, I am going to translate the policy that I configured in Part 1 into PowerShell. The translated command looks like this:

New-MalwareFilterPolicy -Name "Custom Policy" -AdminDisplayName "Custom Policy Configured for AMP testing" -CustomNotifications:$true -CustomFromName "Anti Malware Protection" -Action DeleteAttachmentAndUseCustomAlertText -CustomFromAddress "amp@egi.local" -CustomInternalSubject "A message from internal sender was deleted but generated NDR" -CustomInternalBody "A message from internal sender was found harmful and deleted. A notification was sent but generated an NDR. Please check logs"
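
If you create similar policies repeatedly, it can help to generate that command line from a settings table so quoting is handled the same way every time. This is a minimal Python sketch that only produces the text of the command and runs nothing; the helper names are hypothetical, and the parameter names are the ones used in the command above. It uses PowerShell single-quoted strings, which do not expand $variables and escape an embedded single quote by doubling it.

```python
def ps_quote(value: str) -> str:
    """Quote a value as a PowerShell single-quoted string literal."""
    # Inside single quotes PowerShell expands nothing; a literal ' is written as ''.
    return "'" + value.replace("'", "''") + "'"


def ps_arg(value) -> str:
    """Render a Python value as a PowerShell argument."""
    if isinstance(value, bool):
        return "$true" if value else "$false"
    return ps_quote(str(value))


def build_command(cmdlet: str, params: dict) -> str:
    """Build a single-line cmdlet invocation from an ordered parameter mapping."""
    parts = [cmdlet]
    for name, value in params.items():
        # The -Name:value form works for Boolean and switch parameters alike.
        parts.append(f"-{name}:{ps_arg(value)}")
    return " ".join(parts)


# Hypothetical settings table mirroring part of the policy above.
policy = {
    "Name": "Custom Policy",
    "AdminDisplayName": "Custom Policy Configured for AMP testing",
    "Action": "DeleteAttachmentAndUseCustomAlertText",
    "CustomNotifications": True,
    "CustomFromName": "Anti Malware Protection",
    "CustomFromAddress": "amp@egi.local",
}
```

Printing build_command("New-MalwareFilterPolicy", policy) gives a line you can paste into the Exchange Management Shell; add the remaining parameters to the table as needed.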

The newly created policy can then be edited using the EAC.

What’s Next?

Okay smarty, you showed me your PowerShell knowledge, and now I know how to create the policy. What's next? It certainly does not stop here.

Management

Anti-malware protection uses the Microsoft antivirus scanning engine to provide protection. Like other filtering solutions, it needs at least some human intervention to configure it.

Updates

Because it relies on an antivirus scanning engine to keep your messages clean, it needs updates to maintain the latest definitions. You can view the update settings using Get-EngineUpdateCommonSettings.

When troubleshooting, you may also need to find out which definition updates are installed on the servers. You can simply use Get-EngineUpdateInformation to check the current update status.

Get-MalwareFilteringServer (with its counterpart Set-MalwareFilteringServer) is similar to Get-EngineUpdateCommonSettings, but it has some additional settings. For example:

BypassFiltering – tells the engine not to scan messages. If you enable this setting, you will notice that the malware agent is still enabled. The setting does not disable the agent; it just stops scanning email until you set it back to False. I would use this to troubleshoot a related problem.

ForceRescan – tells the engine whether to rescan a message even if it has already been scanned by Exchange Online Protection. That is probably a good idea, since you get an additional layer of security. But it is always tricky to give a generic recommendation on whether to do this kind of double check.

DeferWaitTime, DeferAttempts, etc. – tell the engine how to handle a message that cannot be scanned. I do not want to duplicate what already exists on TechNet; the best read about all these settings is the Set-MalwareFilteringServer reference.

When we add the PowerShell snap-in named Microsoft.Forefront.Filtering.Management.PowerShell, we get two more cmdlets to work with:

Get-AntivirusScanPreferenceGroup and Get-AntivirusScanSettings (actually 4 cmdlets, since each has a Set- counterpart). There is not much you can configure with Set-AntivirusScanPreferenceGroup or Set-AntivirusScanSettings apart from enabling or disabling scanning.

Although not much looks configurable yet, I suspect that at some point you will be able to use additional or third-party scanning engines to make anti-malware protection much better. I think so because of the way Microsoft's AV worked in its early days: I remember someone telling me that it used 8 different scanning engines.

Finally,

Anti-malware protection is an in-the-box feature that can help you reduce your attack surface significantly. Of course, there is no 100% safe solution against viruses or spammers, but you can always combine an available solution with another equally intelligent one to achieve maximum security.