Monthly Archives: May 2011

Exchange 2010: Restore-DatabaseAvailabilityGroup fails to evict nodes error 0x46.

Restore-DatabaseAvailabilityGroup is one of the cmdlets used as part of the datacenter switchover process. The purpose of Restore-DatabaseAvailabilityGroup is to read the DAG’s list of stopped servers and evict the listed servers from the DAG’s underlying cluster. The list of servers in this scenario typically includes all DAG members in the failed primary datacenter. This allows the DAG and the cluster to shrink, and because it now has fewer members, it requires fewer servers to maintain quorum and perform DAG operations.

Restore-DatabaseAvailabilityGroup:

1) Starts a surviving node in the second datacenter using /forceQuorum.

2) Forcibly evicts each server listed on the stopped servers list.

I have worked support cases where this eviction process fails with an exception. In these cases, Restore-DatabaseAvailabilityGroup issued the eviction while the Cluster service was still initializing (even though the Service Control Manager reported the service as started). While the Cluster service is initializing, it is unable to process eviction requests, so the commands failed. For a few customers, the error is consistently reproducible, necessitating the use of a workaround in order for Restore-DatabaseAvailabilityGroup to work.

Note: Customers should upgrade to Exchange 2010 Service Pack 1 (SP1) before following these instructions. These instructions work only with Exchange 2010 SP1.

Prior to SP1, the Cluster service must be found in a stopped state in order to utilize restore-databaseAvailabilityGroup. After SP1, the Cluster service no longer needs to be in a stopped state in order to proceed.

The following error may be noted when running:

restore-databaseAvailabilityGroup –site <DRSite>

WARNING: Server ‘PrimarySiteServer’ was marked as stopped in database availability
group ‘DAG’ but couldn’t be removed from the cluster. Error: A server-side
database availability group administrative operation failed. Error: The
operation failed. CreateCluster errors may result from incorrectly configured
static addresses. Error: An error occurred while attempting a cluster
operation. Error: Cluster API
‘"EvictClusterNodeEx(node.domain.com) failed with 0x46.
Error: The remote server has been paused or is in the process of being
started"’ failed. [Server: DRSiteServer.domain.com]
WARNING: The operation wasn’t successful because an error was encountered. You
may find more details in log file
"C:\ExchangeSetupLogs\DagTasks\dagtask_2010-09-02_14-54-39.766_restore-databaseavailabilitygroup.log".

The error 0x46 translates to

ERROR_SHARING_PAUSED winerror.h
# The remote server has been paused or is in the process of
# being started.

Upon further review, the Service Control Manager reported the Cluster service as started, and Failover Cluster Manager was able to connect to the Cluster service. Despite the error message, the attempt to start the Cluster service by using /forceQuorum was successful.

So the solution is simply to re-run restore-databaseAvailabilityGroup and the stopped DAG members will be successfully evicted.
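The re-run can also be scripted. The following is a minimal Python sketch of the retry pattern only, not of the cmdlet itself: run_with_retry and its parameters are hypothetical names, and the operation callable stands in for whatever invokes Restore-DatabaseAvailabilityGroup in your environment. It assumes the failure surfaces as an exception whose text contains 0x46, and it retries only that case.

```python
import time

# Marker taken from the warning text quoted above (ERROR_SHARING_PAUSED).
TRANSIENT_MARKER = "0x46"


def run_with_retry(operation, attempts=3, delay_seconds=30, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts are exhausted.

    Only failures whose message mentions 0x46 are retried. Any other
    failure is re-raised immediately so unrelated problems are not masked.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except RuntimeError as exc:
            transient = TRANSIENT_MARKER in str(exc)
            if not transient or attempt == attempts:
                raise
            # Give the Cluster service time to finish initializing.
            sleep(delay_seconds)
```

The delay and attempt count are placeholders. In the cases described above, a single re-run was enough.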

Exchange 2007–Using LCR to perform an online offline database seed.

When using continuous replication in Exchange 2007, an operation that sometimes needs to be performed is a database seed. This operation is first performed as part of enabling replication, and thereafter it is performed infrequently as part of the process for recovering from divergence.

There are a few ways to perform a database seed, but seeding is most often performed by using the Update-StorageGroupCopy cmdlet. With this cmdlet, an ESE streaming backup is performed on the source database and the backup copy is then copied to the target.

Another way to seed a database copy is to perform a manual offline seeding. In this operation, the source database is dismounted, verified to be in a clean shutdown state, and then manually copied offline to the target. This can obviously be inconvenient, since the source database has to be down while the copy procedure is being performed.

A third method is to use a VSS backup of the database to seed the database copy, which I discuss in my previous post, Exchange 2007 – Using VSS to perform an online offline database seed.

Finally, yet another method is to utilize LCR as an SCR seeding source. In this blog post, I’ll show you how to do that.

====================================

The first step is to enable LCR for the source database by using the Enable-DatabaseCopy and Enable-StorageGroupCopy cmdlets.

(LCR)
Enable-DatabaseCopy –Identity <ServerName\DatabaseName> –CopyEdbFilePath “path\database.edb”

If you have already enabled continuous replication for the storage group, proceed to the second step.

====================================

The second step is to enable standby continuous replication on the storage groups by using the Enable-StorageGroupCopy cmdlet.

(SCR)
Enable-StorageGroupCopy –Identity <ServerName\StorageGroupName> –StandbyMachine <SCRTargetName> –SeedingPostponed

For more information on enabling SCR, please see my blog post at http://blogs.technet.com/timmcmic/archive/2009/01/22/inconsistent-results-when-enabling-standby-continuous-replication-scr-in-exchange-2007-sp1.aspx

If you have already enabled continuous replication for the storage group, proceed to the third step.

====================================

The third step is to suspend the storage group copy. Storage group copies can be suspended either in bulk or one at a time. The following are example commands:

(All Storage Groups)
Get-StorageGroup –Server <SourceServerName> | Suspend-StorageGroupCopy –StandbyMachine <TargetMachineName>

(Single Storage Group)
Suspend-StorageGroupCopy –Identity <ServerName\StorageGroupName> –StandbyMachine <TargetMachineName>

It is important that in the SCR environment these commands are run on both the source and target servers. All servers should indicate a suspended status, reflecting that both Active Directory replication and the Microsoft Exchange Replication service configuration updates occurred successfully.

====================================

The fourth step is to note the important paths that are necessary to complete the rest of these steps. Specifically, we are interested in the storage group log file path, the system folder path and copy system folder path, and the log file prefix. For the mailbox database we are interested in the database file path and copy database file paths.

To get all paths for all storage groups on the source, use the following command:

Get-StorageGroup –Server <ServerName> | fl Name,LogFolderPath,SystemFolderPath,CopyLogFolderPath,CopySystemFolderPath,LogFilePrefix

This will give you a formatted list of storage group names, log paths, and system paths.

To get the paths for all mailbox databases, use the following command:

Get-MailboxDatabase –Server <ServerName> | fl Name,EdbFilePath,CopyEdbFilePath

This will give you a formatted list of mailbox database names and mailbox database paths.

Here is an example of the output you can expect to see (copy path attributes will only be populated if you are utilizing LCR):

Name : Mailbox Database LCR
EdbFilePath : d:\SG1\DB1.edb
CopyEdbFilePath : d:\SG1-LCR\DB1.edb

Name : Mailbox Database CCR or SCR
EdbFilePath : d:\SG2\DB2.edb
CopyEdbFilePath :

Name                 : Storage Group LCR
LogFolderPath        : d:\SG1
SystemFolderPath     : d:\SG1
CopyLogFolderPath    : d:\SG1-LCR
CopySystemFolderPath : d:\SG1-LCR
LogFilePrefix        : E00

Name                 : Storage Group CCR or SCR
LogFolderPath        : d:\SG2
SystemFolderPath     : d:\SG2
CopyLogFolderPath    :
CopySystemFolderPath :
LogFilePrefix        : E01

====================================

The fifth step is to verify that the source log file sequence is in order. If the source log file sequence has been manually manipulated, and if any log file gaps are present, this results in a failure of the seed operation. This step ensures that log files are in sequence on the source machine.

To ensure that the log sequence on the source machine is in the correct order, perform the following operations:

1. Open a command prompt and navigate to the log directory of the storage group. This path can be found in the output gathered in step four above.

2. Run the following eseutil command:

eseutil /ml <LogFilePrefix>

The log file prefix can be found in the output gathered in step four.

When you run this command, it scans every log file found in the source directory. If any gaps or errors are identified, you cannot continue with these steps. An error on the last log file in the series is expected, because the Exx.log file is currently open for writing and cannot be scanned. The following is sample output for a storage group that is online:

Extensible Storage Engine Utilities for Microsoft(R) Exchange Server
Version 08.02
Copyright (C) Microsoft Corporation. All Rights Reserved.
Initiating FILE DUMP mode…

Verifying log files…
Base name: e00

      Log file: d:\SG1\E0000001353.log – OK
      Log file: d:\SG1\E0000001354.log – OK
      Log file: d:\SG1\E0000001355.log – OK
      Log file: d:\SG1\E0000001356.log – OK
      Log file: d:\SG1\E0000001357.log – OK
      Log file: d:\SG1\E0000001358.log – OK
      Log file: d:\SG1\E0000001359.log – OK
      Log file: d:\SG1\E000000135A.log – OK
      Log file: d:\SG1\E000000135B.log – OK
      Log file: d:\SG1\E000000135C.log – OK
      Log file: d:\SG1\E000000135D.log – OK
      Log file: d:\SG1\E000000135E.log – OK
      Log file: d:\SG1\E000000135F.log – OK
      Log file: d:\SG1\E0000001360.log – OK
      Log file: d:\SG1\E0000001361.log – OK
      Log file: d:\SG1\E0000001362.log – OK
      Log file: d:\SG1\E0000001363.log – OK
      Log file: d:\SG1\E0000001364.log – OK
      Log file: d:\SG1\E0000001365.log – OK
      Log file: d:\SG1\E0000001366.log – OK
      Log file: d:\SG1\E0000001367.log – OK
      Log file: d:\SG1\E0000001368.log – OK
      Log file: d:\SG1\E0000001369.log – OK
      Log file: d:\SG1\E00.log
                ERROR: Cannot open log file (d:\SG1\E00.log). Error -1032.

Operation terminated with error -1032 (JET_errFileAccessDenied, Cannot access file, the file is locked or in use) after 368.625 seconds.
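As a quick pre-check before running eseutil /ml, a gap in the sequence can also be spotted from the file names alone. The following is a minimal Python sketch under two assumptions: that the log files are named with the prefix followed by eight hexadecimal digits, as in the sample output above, and that a list of names is already in hand. find_log_gaps is a hypothetical helper name. It only detects missing generation numbers and does not validate log contents, so it does not replace the eseutil check.

```python
import re


def find_log_gaps(file_names, prefix="E00"):
    """Return the generation numbers missing from a set of log file names."""
    # Match names such as E0000001353.log; the open E00.log is skipped
    # because it carries no generation number.
    pattern = re.compile(
        r"^%s([0-9A-Fa-f]{8})\.log$" % re.escape(prefix), re.IGNORECASE
    )
    generations = sorted(
        int(m.group(1), 16)
        for m in (pattern.match(name) for name in file_names)
        if m
    )
    if not generations:
        return []
    present = set(generations)
    return [
        g for g in range(generations[0], generations[-1] + 1)
        if g not in present
    ]


names = ["E0000001353.log", "E0000001354.log", "E0000001356.log", "E00.log"]
print([format(g, "08X") for g in find_log_gaps(names)])  # prints ['00001355']
```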

====================================

The sixth step is to prepare the LCR copies for use in the SCR seeding process. This starts by verifying the health of the LCR copies.

To verify the health of the LCR copies, run Get-StorageGroupCopyStatus on the server hosting the LCR databases. If any copy shows a status other than Healthy, this must be corrected before continuing with these instructions.

Get-StorageGroupCopyStatus

Name       SummaryCopyStatus  CopyQueueLength  ReplayQueueLength  LastInspectedLogTime
----       -----------------  ---------------  -----------------  --------------------
MBX-1-SG1  Healthy            0                0                  3/6/2011 …

====================================

The seventh step is to ensure that the target paths are ready to have the database moved into place. The paths referenced in these steps can be obtained from the output gathered in step four.

For SCR – ensure that the logFolderPath, systemFolderPath, and edbFilePath are empty on the SCR target.

At this point the destination paths are empty and ready for the database to be moved.

We now need to create the directory structure where logs, system, and database files will be copied.

For SCR – create the log, system, and database folders. In our example, the log, system, and database files are located at d:\SG1. Therefore, on the SCR target or CCR passive node, I would create the directory structure d:\SG1.

If you are using nested folders you need to create the entire directory structure.

====================================

The eighth step is to copy the LCR database copy to the target directory. This can be accomplished a few different ways, but I will make a recommendation below.

To begin, the LCR database copies need to be suspended. This can be performed in bulk:

get-storagegroup –server <LCRHost> | suspend-storagegroupcopy

The success of this command can be verified using get-storagegroupcopystatus.

Get-StorageGroupCopyStatus

Name       SummaryCopyStatus  CopyQueueLength  ReplayQueueLength  LastInspectedLogTime
----       -----------------  ---------------  -----------------  --------------------
MBX-1-SG1  Suspended          0                0                  3/6/2011 …

The LCR database file is located at the CopyEdbFilePath noted in step four. Using a command prompt, navigate to this location.

The SCR target location can be mapped as a network drive. We will assume for this example that the network drive Y is utilized.

Use eseutil to copy the database from the source directory to the target directory. The command using our example is:

eseutil /y SG1-DB1.edb /d y:SG1-DB1.edb

Here is the expected output from this command:

Extensible Storage Engine Utilities for Microsoft(R) Exchange Server
Version 08.02
Copyright (C) Microsoft Corporation. All Rights Reserved.

Initiating COPY FILE mode…
Source File: SG1-DB1.edb

Destination File: y:SG1-DB1.edb

Copy Progress (% complete)

0    10   20   30   40   50   60   70   80   90  100
|----|----|----|----|----|----|----|----|----|----|
...................................................

Operation completed successfully in 13.281 seconds.

At this point the copy has been seeded on the target server.

When the copy is completed the LCR replication can be resumed using

get-storagegroup –server <LCRHost> | resume-storagegroupcopy

Information on the usage of Eseutil can be found here. http://technet.microsoft.com/en-us/library/aa998249(EXCHG.80).aspx

====================================

The ninth step is to verify the health of the copied database. We need to ensure that the database was not damaged as a part of the copy process.

Log on locally to the SCR target, open a command prompt, and navigate to the database directory. In our example this would be d:\SG1.

Use Eseutil /k to perform a checksum of the database:

eseutil /k SG1-DB1.edb

The following output will be observed when the command completes:

Extensible Storage Engine Utilities for Microsoft(R) Exchange Server
Version 08.02
Copyright (C) Microsoft Corporation. All Rights Reserved.

Initiating CHECKSUM mode…
Database: SG1-DB1.edb
Temp. Database: TEMPCHKSUM3888.EDB

File: SG1-DB1.edb

Checksum Status (% complete)

0    10   20   30   40   50   60   70   80   90  100
|----|----|----|----|----|----|----|----|----|----|
...................................................

514 pages seen
0 bad checksums
0 correctable checksums
129 uninitialized pages
0 wrong page numbers
0x4676 highest dbtime (pgno 0x86)
65 reads performed
4 MB read
1 seconds taken
4 MB/second
2755 milliseconds used
42 milliseconds per read
78 milliseconds for the slowest read
15 milliseconds for the fastest read

Operation completed successfully in 0.140 seconds.

We are interested in ensuring that there are 0 bad checksums (the "0 bad checksums" line in the output above).
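If the checksum run is captured to a text file, the check can be automated. The following is a minimal Python sketch that assumes the output contains a line of the form "N bad checksums", as in the sample above. bad_checksum_count is a hypothetical helper name, not part of eseutil.

```python
import re


def bad_checksum_count(output):
    """Return the number on the 'N bad checksums' line, or None if absent."""
    match = re.search(r"^\s*(\d+)\s+bad checksums\s*$", output, re.MULTILINE)
    return int(match.group(1)) if match else None


sample = """514 pages seen
0 bad checksums
0 correctable checksums
129 uninitialized pages
"""

count = bad_checksum_count(sample)
if count is None:
    print("No checksum summary found; inspect the output manually.")
elif count == 0:
    print("Database copy passed the checksum check.")
else:
    print("Database copy has %d bad checksums; do not resume replication." % count)
```

Returning None when the line is missing keeps a truncated or failed run from being mistaken for a clean one.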

====================================

The last step in the process is to resume the storage group copy:

Get-StorageGroup –Server <SourceServerName> | Resume-StorageGroupCopy –StandbyMachine <SCRTargetName>

(Note: This command resumes storage group copy for all storage groups. If you have a storage group that is suspended for another reason it may be necessary to resume storage groups individually).

When replication has resumed successfully, you can note the following events in the Application event log indicating that replication began copying log files.

Event Type:    Information
Event Source: MSExchangeRepl
Event Category:   Action
Event ID:    2084
Date:        3/16/2010
Time:        10:12:50 AM
User:        N/A
Computer:    SERVER
Description: Replication for storage group SERVER\Storage Group SCR or CCR has been resumed.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Event Type:    Information
Event Source:    MSExchangeRepl
Event Category:   Service
Event ID:    2114
Date:        3/16/2010
Time:        10:13:19 AM
User:        N/A
Computer:    SERVER
Description: The replication instance for storage group SERVER\Storage Group SCR or CCR has started copying transaction log files. The first log file successfully copied was generation 31201.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

====================================

The following are links to references from this post.

· Enable-StorageGroupCopy (http://technet.microsoft.com/en-us/library/aa996389(EXCHG.80).aspx)

· Enable-DatabaseCopy (http://technet.microsoft.com/en-us/library/aa996389(EXCHG.80).aspx)

· Suspend-StorageGroupCopy (http://technet.microsoft.com/en-us/library/aa998182(EXCHG.80).aspx)

· Get-StorageGroup (http://technet.microsoft.com/en-us/library/aa998331(EXCHG.80).aspx)

· Get-MailboxDatabase (http://technet.microsoft.com/en-us/library/bb124924(EXCHG.80).aspx)

· ESEUTIL (http://technet.microsoft.com/en-us/library/aa998249(EXCHG.80).aspx)

· Resume-StorageGroupCopy (http://technet.microsoft.com/en-us/library/bb124529(EXCHG.80).aspx)

====================================

Updates

6/26/11 – removed LCR enable step in step 1 that included seedingPostponed. This was not necessary.

====================================

Exchange 2010: Third party replication API and the DataMoveReplicationConstraint

Exchange 2010: Error 0x721 – A security package specific error occurred

Recently I was presented with an interesting case regarding the inability to mount databases. The history preceding the event was fairly unremarkable: the issue was first noted after patch maintenance was run on the server and the server was rebooted. After the reboot, every time the customer attempted to mount a public folder database, the following Active Manager error occurred:

Couldn’t mount the database that you specified. Specified database: Public Folder Store NAME; Error code: An Active Manager operation failed. Error: The database action failed. Error: Operation failed with message: Error 0x721 (A security package specific error occurred) from cli_AmMountDatabaseDirectEx [Database: Public Folder Store NAME, Server: server.child.domain.local].
+ CategoryInfo : InvalidOperation: (Public Folder Store NAME:ADObjectId) [Mount-Database], InvalidOperationException
+ FullyQualifiedErrorId : F34E87D0,Microsoft.Exchange.Management.SystemConfigurationTasks.MountDatabase

Also when reviewing the application log the following event was noted:

Log Name: Application

Source: MSExchange Configuration Cmdlet – Remote Management

Date: 11/7/2010 9:46:17 AM

Event ID: 4

Task Category: General

Level: Error

Keywords: Classic

User: N/A

Computer: SERVER.child.domain.local

Description:

(PID 6364, Thread 43) Task Mount-Database writing error when processing record of index 0. Error: System.InvalidOperationException: Couldn’t mount the database that you specified. Specified database: Public Folder Store NAME; Error code: An Active Manager operation failed. Error: The database action failed. Error: Operation failed with message: Error 0x721 (A security package specific error occurred) from cli_AmMountDatabaseDirectEx [Database: Public Folder Store NAME, Server: SERVER.child.domain.local]. ---> Microsoft.Exchange.Cluster.Replay.AmDbActionWrapperException: An Active Manager operation failed. Error: The database action failed. Error: Operation failed with message: Error 0x721 (A security package specific error occurred) from cli_AmMountDatabaseDirectEx ---> Microsoft.Exchange.Data.Storage.AmOperationFailedException: An Active Manager operation failed. Error: Operation failed with message: Error 0x721 (A security package specific error occurred) from cli_AmMountDatabaseDirectEx ---> Microsoft.Exchange.Rpc.RpcException: Error 0x721 (A security package specific error occurred) from cli_AmMountDatabaseDirectEx

at ThrowRpcException(Int32 rpcStatus, String message)

at Microsoft.Exchange.Rpc.RpcClientBase.ThrowRpcException(Int32 rpcStatus, String routineName)

at Microsoft.Exchange.Rpc.ActiveManager.AmRpcClient.MountDatabaseDirectEx(Guid guid, AmMountArg arg)

at Microsoft.Exchange.Data.Storage.ActiveManager.AmRpcClientHelper.<>c__DisplayClass26.<MountDatabaseDirectEx>b__25(String )

at Microsoft.Exchange.Data.Storage.ActiveManager.AmRpcClientHelper.<>c__DisplayClass4e.<RunRpcOperationWithAuth>b__4c()

at Microsoft.Exchange.Data.Storage.Cluster.HaRpcExceptionWrapperBase`2.ClientRetryableOperation(String serverName, RpcClientOperation rpcOperation)

--- End of inner exception stack trace ---

at Microsoft.Exchange.Data.Storage.Cluster.HaRpcExceptionWrapperBase`2.ClientHandleRpcException(RpcException ex, String serverName)

at Microsoft.Exchange.Data.Storage.Cluster.HaRpcExceptionWrapperBase`2.ClientRetryableOperation(String serverName, RpcClientOperation rpcOperation)

at Microsoft.Exchange.Data.Storage.ActiveManager.AmRpcClientHelper.RunRpcOperationWithAuth(AmRpcOperationHint rpcOperationHint, String serverName, String databaseName, NetworkCredential networkCredential, Nullable`1 timeoutMs, AmRpcClient& rpcClient, InternalRpcOperation rpcOperation)

at Microsoft.Exchange.Data.Storage.ActiveManager.AmRpcClientHelper.MountDatabaseDirectEx(String serverToRpc, Guid dbGuid, AmMountArg mountArg)

at Microsoft.Exchange.Cluster.ActiveManagerServer.AmDbAction.MountDatabaseDirect(AmServerName serverName, AmServerName lastMountedServerName, Guid dbGuid, MountFlags flags, AmDbActionCode actionCode)

at Microsoft.Exchange.Cluster.ActiveManagerServer.AmDbPamAction.RunMountDatabaseDirect(AmServerName serverToMount, MountFlags mountFlags, Boolean fLossyMountEnabled)

at Microsoft.Exchange.Cluster.ActiveManagerServer.AmDbPamAction.<>c__DisplayClass3.<AttemptMountOnServer>b__1(Object , EventArgs )

at Microsoft.Exchange.Cluster.ActiveManagerServer.AmHelper.HandleKnownExceptions(EventHandler ev)

--- End of inner exception stack trace (Microsoft.Exchange.Data.Storage.AmOperationFailedException) ---

at Microsoft.Exchange.Cluster.ActiveManagerServer.AmDbOperation.Wait(TimeSpan timeout)

at Microsoft.Exchange.Cluster.ActiveManagerServer.ActiveManagerCore.MountDatabase(Guid mdbGuid, MountFlags flags, DatabaseMountDialOverride mountDialOverride, AmDbActionCode actionCode)

at Microsoft.Exchange.Cluster.ActiveManagerServer.AmRpcServer.<>c__DisplayClass4.<MountDatabase>b__3()

at Microsoft.Exchange.Data.Storage.Cluster.HaRpcExceptionWrapperBase`2.RunRpcServerOperation(String databaseName, RpcServerOperation rpcOperation)

--- End of stack trace on server (SERVER.child.domain.local) ---

at Microsoft.Exchange.Data.Storage.Cluster.HaRpcExceptionWrapperBase`2.ClientRethrowIfFailed(String databaseName, String serverName, RpcErrorExceptionInfo errorInfo)

at Microsoft.Exchange.Data.Storage.ActiveManager.AmRpcClientHelper.RunDatabaseRpcWithReferral(AmRpcOperationHint rpcOperationHint, Database database, String targetServer, AmRpcClient& rpcClient, InternalRpcOperation rpcOperation)

at Microsoft.Exchange.Data.Storage.ActiveManager.AmRpcClientHelper.MountDatabase(Database database, Int32 flags, Int32 mountDialOverride)

at Microsoft.Exchange.Management.SystemConfigurationTasks.MountDatabase.InternalProcessRecord()

--- End of inner exception stack trace ---

The error message and event are not, in themselves, very telling as to what the issue was. The important part of the event, which is not unique to Exchange and has been seen with other shell commands, is the security package error:

# for hex 0x721 / decimal 1825
RPC_S_SEC_PKG_ERROR winerror.h
# A security package specific error occurred.

After some investigation, we determined that the Active Directory forest where Exchange was installed contained multiple domains. We searched the entire directory and found two ENABLED machine accounts with the same name residing in two different domain naming contexts in the same forest. After we identified the machine account that was not being used (in this case, the one in a child domain where Exchange servers were not installed) and deleted it, our mount commands completed successfully with no issues noted.
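The duplicate check itself is simple once the computer accounts from every domain in the forest have been exported. The following is a minimal Python sketch that assumes the export has already been reduced to (name, domain, enabled) tuples. duplicate_enabled_accounts and the domain names in the example are hypothetical, and how the accounts are gathered is left out.

```python
from collections import defaultdict


def duplicate_enabled_accounts(accounts):
    """Return enabled computer account names that exist in more than one domain.

    accounts: iterable of (name, domain, enabled) tuples.
    The result maps each duplicated name (lowercased) to its sorted domains.
    """
    domains_by_name = defaultdict(set)
    for name, domain, enabled in accounts:
        if enabled:
            # Compare case-insensitively, since the accounts collide by name.
            domains_by_name[name.lower()].add(domain.lower())
    return {
        name: sorted(domains)
        for name, domains in domains_by_name.items()
        if len(domains) > 1
    }


accounts = [
    ("SERVER", "child.domain.local", True),
    ("server", "other.domain.local", True),   # same name, second domain
    ("MBX2", "child.domain.local", True),
    ("MBX2", "other.domain.local", False),    # disabled, so not reported
]
print(duplicate_enabled_accounts(accounts))
```

Disabled accounts are ignored here because the case above involved two enabled accounts. Whether a disabled duplicate can cause the same error is not something this post establishes.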

TIMMCMIC

Navigating the world of high availability….and occasionally sticking my head in the cloud…
