Dual-write glitch allows Exchange Online cmdlets to be executed without an audit trail

In this article we will explore an easy-to-use method that allowed anyone with sufficient permissions to execute Exchange Online cmdlets without leaving an audit trail. Before we delve into the details, I want to make it clear that the issue is now fixed, hence the past tense. I took the responsible route of reporting it to Microsoft first and waiting for them to address it before sharing the details. So the below is mostly for illustrative purposes now, but it nevertheless sheds light on some larger underlying issues.

I also want to express my continued disappointment in Microsoft’s behavior in certain areas. The issue outlined here is almost certainly a side effect of the systematic neglect of “core” functionalities in favor of flashy, marketing-driven releases. Sure, growth is important, but when the foundations start cracking, it’s time to take a deep breath and reevaluate your priorities. /rant off

A short preamble

Exchange Online was one of the first workloads to make the cloud journey, and we often refer to it as the most mature one. In reality, while ExO continues to be the leader in some areas (such as the introduction of PAM or RBAC for applications), it does not receive nearly enough attention in others (Graph API endpoints, for example). Thus, even in areas where Exchange was traditionally in a very strong position, gaps have appeared, or we see examples of other workloads doing a better job. Combine this with the ever-changing nature of the cloud and the constant stream of new scenarios to address, and you’ve got yourself a problem.

Whatever the reasons behind it, the fact remains that in recent years we’ve seen multiple examples of features that “patch” an existing functionality in hopes of addressing a shortcoming or introducing support for a new scenario. There is some merit to such an approach. Given the vast user base, Microsoft is understandably trying to preserve compatibility with older methods and features, oftentimes by means of changes that border on the definition of a “hack” or a “workaround”. For example, the initial implementation of support for multi-factor authentication for PowerShell (and modern auth in general) used the same old endpoint as basic auth does, and it required that WinRM basic auth remain enabled client-side.

The dual-write model used by Exchange Online and Azure AD is another example. While in theory a new feature that gets rid of the “back-sync” dependence sounds like a great idea, in reality its implementation came with a few caveats. Apart from not supporting all Exchange Online objects and properties, one of the biggest issues has been the useless audit records it produces, which attribute all changes to a single service principal, namely Microsoft Substrate Management. We’ve complained about this in many a conversation with Microsoft folks, as well as in multiple blog posts (most recently here), to no effect. Well, as demonstrated by the issues detailed in the remainder of this article, we were right to do so.

First things first, authentication

In a nutshell, the issue revolves around how Exchange handles incorrect values for the mailbox anchor, and is compounded by poor practices within the dual-write method. When an incorrect value is provided for the mailbox anchor, any “write” action against ExODS will fail. Actions that are subject to the dual-write model, however, will succeed on Azure AD’s side, and in turn will be written back to ExODS. This behavior certainly goes against the advertised mechanics of the dual-write model. Quote: “changes made in EXO will immediately reflect in AAD when the cmdlet completes successfully.” Not only will the change succeed, it will be reflected in both AAD and ExODS even though the cmdlet execution failed. To add insult to injury, no proper audit trail will be generated, as the only events you will find correspond to the Microsoft Substrate Management principal and cannot be attributed to any given user. Let’s illustrate all this with an example.

First, we need to handle authentication. Any of the methods available for obtaining an access token in the delegate permissions model will do. If you don’t feel comfortable playing with the ADAL/MSAL routines or direct HTTPS queries, an easy way is to leverage the Exchange Online PowerShell module itself. As we’ve covered in previous articles, when using the good old RPS connectivity method, even modern authentication credentials are passed over the same endpoint, and thus can be examined directly via the Get-PSSession cmdlet. Here’s an example using the V3 version of the module:

Connect-ExchangeOnline -UseRPSSession -UserPrincipalName vasil@michevdev3.onmicrosoft.com 

(Get-PSSession).Runspace.ConnectionInfo.Credential.GetNetworkCredential().Password

If you are using the “native” method of establishing a session in the V3 module, you can instead use the following to obtain the token:

#Get any existing contexts
$context = [Microsoft.Exchange.Management.ExoPowershellSnapin.ConnectionContextFactory]::GetAllConnectionContexts()

#Get an existing token from the cache
$context[0].TokenProvider.GetValidTokenFromCache("Get-Mailbox").AuthorizationHeader

#Or generate a new one
$context[0].TokenProvider.GetAccessToken()

The access token used to establish the connection will be dumped to the console window, where you can copy it, or you can simply pipe the output to the “clip” app. An alternative approach that leverages the ADAL methods is outlined below. Here, we leverage the built-in Microsoft Exchange REST API Based Powershell application (clientID fb78d390-0c51-40cd-8e17-fdbfab77341b), which is readily available in every Exchange Online tenant.

#using the built-in app and ADAL
$authContext3 = New-Object "Microsoft.IdentityModel.Clients.ActiveDirectory.AuthenticationContext" -ArgumentList "https://login.windows.net/michevdev3.onmicrosoft.com"
$plat = New-Object Microsoft.IdentityModel.Clients.ActiveDirectory.PlatformParameters -ArgumentList "Auto"
$authenticationResult = $authContext3.AcquireTokenAsync("https://outlook.office365.com", "fb78d390-0c51-40cd-8e17-fdbfab77341b", "https://login.microsoftonline.com/common/oauth2/nativeclient",$plat);

$token = $authenticationResult.Result.AccessToken

Yet another alternative is to use your own app, provided it has the Exchange.Manage scope granted, which is needed to establish a management session as we’ve detailed here. The example below uses the MSAL methods with our custom app:

#using your own app and MSAL
$app2 = [Microsoft.Identity.Client.PublicClientApplicationBuilder]::Create("59bd1626-9fc5-4d90-bf64-4b5ec0b5532b").WithTenantId("fd796243-170d-472b-87ef-90ab72727bdd").WithRedirectUri("https://ExOPSApp").WithBroker().Build()

$Scopes = New-Object System.Collections.Generic.List[string]
$Scope = "https://outlook.office365.com/.default"
$Scopes.Add($Scope)

#AcquireTokenInteractive returns an AuthenticationResult, the token string is in its AccessToken property
$token = $app2.AcquireTokenInteractive($Scopes).ExecuteAsync().Result.AccessToken

The method for obtaining the token is not that important, as long as the token itself corresponds to a valid user and has the necessary scopes included. It’s also important to understand that the user must be authorized to perform Exchange Online or Azure AD management tasks. This is not an elevation of privilege exploit.

Demonstrating the exploit

Now that we have the means of authenticating against the service (a valid access token), we can demonstrate the issue by executing some management tasks. We can use either the cmdlets from the Exchange Online PowerShell module, or the InvokeCommand method (read here for a refresher). The issue is not with the PowerShell module itself, but the underlying implementation.

As already mentioned above, we trigger the issue by providing an incorrect value for the mailbox anchor. For example, instead of using the UPN of the user we obtained a token for, we can provide a UPN value matching a user from another tenant. To do this via PowerShell, use the Connect-ExchangeOnline cmdlet with the -AccessToken parameter and pass the wrong -UserPrincipalName value:

Connect-ExchangeOnline -AccessToken $token -UserPrincipalName user@domain.com

Note that the connection is successful despite the wrong value provided for the -UserPrincipalName parameter, which in turn translates into a wrong mailbox anchor value. While not advisable, you can technically decode the access token and fetch the corresponding user identifiers from it, so at least some basic checks could’ve been implemented at this step. But as mentioned already, the issue is not with the PowerShell module itself, so let’s move on.
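To illustrate the kind of basic check mentioned above, here is a minimal sketch that decodes the payload of the access token and compares its user identifier to the UPN you are about to pass. The Get-JwtPayload and Test-AnchorMatchesToken function names are made up for this example and are not part of any module. The sketch assumes the token is a JWT that carries a upn claim, and it does not validate the token's signature, so treat it as a sanity check only.

```powershell
# Sketch only: decode the payload (second segment) of a JWT access token.
# No signature validation is performed. Both function names are hypothetical.
function Get-JwtPayload {
    param([Parameter(Mandatory)][string]$Token)

    $parts = $Token.Split('.')
    if ($parts.Count -lt 2) { throw "Not a JWT: expected at least two dot-separated segments." }

    # Convert base64url to standard base64 and restore the padding
    $b64 = $parts[1].Replace('-', '+').Replace('_', '/')
    switch ($b64.Length % 4) {
        2 { $b64 += '==' }
        3 { $b64 += '=' }
    }

    $json = [System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String($b64))
    $json | ConvertFrom-Json
}

function Test-AnchorMatchesToken {
    param(
        [Parameter(Mandatory)][string]$Token,
        [Parameter(Mandatory)][string]$UserPrincipalName
    )

    # Assumption: the token carries a 'upn' claim identifying the signed-in user
    $claims = Get-JwtPayload -Token $Token

    # -eq compares strings case-insensitively, which suits UPN values
    $claims.upn -eq $UserPrincipalName
}
```

Running Test-AnchorMatchesToken -Token $token -UserPrincipalName user@domain.com before connecting would return False for the mismatched value used in this article. The same decoded payload can be used to eyeball other claims, such as the tenant ID or the granted scopes.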

Once connected, you can execute most Get- cmdlets with no issues. On the other hand, cmdlets that modify objects will throw an error. In the example below, we tried to modify the properties of a mailbox via the Set-Mailbox cmdlet, only to be greeted with an “insufficient permissions” error. Interestingly enough, both the ExODS and Azure AD responses indicate a lack of permissions. We will see why this is important in the next section. We’ve also included an example of running the Search-AdminAuditLog cmdlet, which throws a different type of warning/error:

The important thing is that neither operation succeeded, and if we ignore the fancy error messages, we can conclude that everything is working as expected. We provided incorrect connectivity information, and in effect we’re prevented from making changes. All good, right?

Where things get interesting – Azure AD dual-write

Things would’ve turned out all right if it weren’t for the Azure AD dual-write model. See, the examples we used above are “pure” Exchange ones. We deliberately chose to (try to) modify a property that only exists within ExODS (RetainDeletedItemsFor is not replicated to Azure AD), and thus no dual-write was attempted. Now, let’s see what happens if we try to modify a property replicated to Azure AD.

Look closely. We start with an Exchange Online object and check its DisplayName value. We then try to modify said value via the Set-Mailbox cmdlet. We again get an error message, in fact the exact same message we got when we tried to modify an Exchange-only property. But here’s where it gets wild – if you check the same attribute’s value a minute later, it’s now changed!?!

In effect, the Exchange Online operation failed, however the Azure AD operation was still attempted, and ended up executing successfully. This in turn triggered a sync operation to ExODS, which now shows the updated value. Crazy stuff!
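For reference, the sequence described above boils down to something like the following sketch. The mailbox identity and the new display name are placeholders, and the session is assumed to have been opened with a mismatched -UserPrincipalName value as shown earlier. Since the issue is now fixed, do not expect to see the same outcome today.

```powershell
# Sketch of the sequence described above. "sharedmbx" and the new display name are placeholders.
# Assumes a session opened via Connect-ExchangeOnline with a mismatched -UserPrincipalName.

# 1. Check the current value
Get-Mailbox -Identity sharedmbx | Select-Object Name, DisplayName

# 2. Try to change a property that is replicated to Azure AD.
#    At the time, this returned the same permissions error as the Exchange-only property.
Set-Mailbox -Identity sharedmbx -DisplayName "Changed via dual-write"

# 3. Give the write-back to ExODS some time, then check the value again
Start-Sleep -Seconds 60
Get-Mailbox -Identity sharedmbx | Select-Object Name, DisplayName
```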

We can get further evidence of what just happened by examining the audit log. On Exchange Online’s side, the log will not contain any event related to the above cmdlet execution. As far as ExO is concerned, the operation failed, and failed executions are not audited.

On Azure AD’s side however, we do get some events. This confirms that Azure AD was indeed the perpetrator in this case, and is responsible for the change being actually committed.

It gets even weirder

If you think the behavior detailed above was puzzling, wait till you see what happens when we try to create a new (shared) mailbox. To that end, we can use the New-Mailbox cmdlet with the relevant parameters. The examples in this section are run in a different tenant, just to illustrate that the issue is easy to reproduce and does not depend on the tenant’s configuration. As before, we’re presented with an error message upon executing the cmdlet:

New-Mailbox -Shared -Name "testAudit"

Unfortunately, things on the Azure AD side look quite different. As with the previous example, the cmdlet was actually executed and the corresponding user object created. In this scenario however, it looks like some additional checks kick in, as the changes are then immediately reverted, causing the deletion of the user object. A quick check against the Azure AD audit log confirms that:

As further evidence, the now deleted user object can be found in the Azure AD recycle bin:

This behavior unfortunately signals that we can also use this method to run “destructive” cmdlets, such as deleting a user. In the next example, we run the Remove-MailUser cmdlet to delete a Guest user object. As usual, the cmdlet execution on Exchange Online’s side fails, but this doesn’t prevent the dual-write model from creating a mess:

Indeed, rerunning the Get-MailUser cmdlet in a minute reveals the object no longer exists in ExO, and we can get further confirmation on this from Azure AD:

And the best part is – we get very limited audit information about all this. Zero in fact, in the case of Exchange Online. Let’s talk about this next.

Another audit failure

At the beginning of the article I complained about the already existing shortcoming of the dual-write model when it comes to auditing. As a reminder, any action performed on the Azure AD side will be executed under a single service principal object, without ever relaying any information about who the original caller of the cmdlet was. Microsoft acknowledges this issue, for example here, and we already saw some examples of what this looks like in the previous section. Usually you can get the corresponding event from the Exchange admin audit log/Unified audit log and obtain the caller details. But as we saw above, in this scenario cmdlets on the Exchange side fail to execute. Guess what happens to the audit record?

We already illustrated what the audit logs look like for non-destructive cmdlets. The more interesting examples used the New-Mailbox and Remove-MailUser cmdlets, which as mentioned above were run in a different tenant to illustrate that the issue can easily be reproduced. So let’s look at what audit trail exists within said tenant. Below is a screenshot of the exported Unified audit log trail, combining events from Exchange Online (RecordType 1) and Azure AD (RecordType 8), with the timeframe covering a few additional days to account for any “straggler” entries. Note that “system” events are excluded.

Even after waiting a few days and covering the broadest set of workloads possible, we end up with zero information about who executed the highlighted cmdlets. As mentioned several times already, the ExO audit record will contain no trace of the cmdlet execution, as those cmdlets actually failed. The AAD audit record does include events corresponding to the operations executed by the Microsoft Substrate Management principal (ServicePrincipal_addc2e3e-7486-4761-8a3c-8d0f28e530e6), highlighted above, but nowhere in said record are you able to get information about who really called said cmdlets. In effect, we now have a method to run Exchange Online cmdlets “incognito”.

To be on the safe side, I’ve rerun the audit queries multiple times, across multiple endpoints (including the Compliance center UI and MDCA, which in some cases offers additional insights). As mentioned above, I’ve also reproduced this across multiple tenants, and the results are consistent. No audit records will be generated on the Exchange side, and whatever audit records are available in Azure AD cannot be attributed to any specific user. And that’s definitely a problem.
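If you want to run a similar check in your own tenant, the sketch below shows one way to query both logs from PowerShell. It is an illustration rather than the exact queries used for this article: the seven-day window and the "testAudit" filter value are placeholders, and it assumes a properly connected session (with the correct UPN this time) under an account that has the audit-related roles assigned.

```powershell
# Sketch: pull Exchange admin and Azure AD events for the window around the tests.
# The date range and the "testAudit" filter value are placeholders.
$start = (Get-Date).AddDays(-7)
$end   = Get-Date

# Exchange admin audit log, limited to the cmdlets used in the tests
Search-AdminAuditLog -StartDate $start -EndDate $end -Cmdlets New-Mailbox, Remove-MailUser, Set-Mailbox

# Unified audit log: RecordType 1 is ExchangeAdmin, RecordType 8 is AzureActiveDirectory
$events = foreach ($type in 'ExchangeAdmin', 'AzureActiveDirectory') {
    Search-UnifiedAuditLog -StartDate $start -EndDate $end -RecordType $type -ResultSize 5000
}

# Keep only the entries mentioning the test object and list them chronologically
$events | Where-Object { $_.AuditData -match 'testAudit' } |
    Sort-Object CreationDate |
    Select-Object CreationDate, RecordType, Operations, UserIds
```

In the scenario described here, the UserIds column for the Azure AD entries would only ever show the service principal, which is exactly the problem.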

Not a PowerShell issue – REST example

Speaking of reproducing this, it’s worth reiterating that the issue is not with the Connect-ExchangeOnline cmdlet or the ExO PowerShell module itself. In fact, you can reproduce it while completely bypassing PowerShell and still observe the same behavior. It’s the underlying APIs and methods used by the dual-write mechanism that cause this, although a few blocks can probably be implemented on the Exchange side, such as verifying the mailbox anchor value or calculating a value based on the access token provided. Or even leveraging the default value, which is in the form of SystemMailbox{bb558c35-97f1-4cb9-8ff7-d53741dc928c}@tenant.onmicrosoft.com, and readily exists in each tenant.

But I digress. Here’s how to reproduce this issue with direct HTTPS requests. You will of course still need to handle authentication, and any of the methods to obtain an access token we discussed above can help with this part. Once a token is obtained, add it to your request headers, where you’ll also need to specify the mailbox anchor value, via the X-AnchorMailbox header. Remember to use a value that does NOT match the token’s user claim. The format is as follows: X-AnchorMailbox = “UPN:user@domain.com”.

Apart from the headers, you will also need to provide a request body, which contains the actual cmdlet to be executed. If you need additional details on the method, you can check this article. Lastly, the request itself should be sent to the https://outlook.office365.com/adminapi/beta/ endpoint. Here’s a full example:

#Test direct REST query

$body = @{
    CmdletInput = @{
        CmdletName = "New-Mailbox"
        Parameters = @{
            Name = "testREST"
            Shared = $true
        }
    }
}

$authHeader1 = @{
    'Content-Type' = 'application/json'
    'Authorization' = "Bearer $($token)"
    'X-ResponseFormat' = "json"
    'X-AnchorMailbox' = "UPN:pesho@sts.michev.info" #must NOT match the token value
}

$uri = "https://outlook.office365.com/adminapi/beta/923712ba-352a-4eda-bece-09d0684d0cfb/InvokeCommand" #tenantID must match the token claim

$res = Invoke-WebRequest -Method POST -Uri $uri -Headers $authHeader1 -Body ($body | ConvertTo-Json -Depth 5) -Verbose -Debug -ContentType "application/json;charset=utf-8"

($res.Content | ConvertFrom-Json).Value

And here is the result. The exact same error as experienced with the cmdlets from the Exchange Online PowerShell module is thrown, and the operation itself seems to fail. Yet on the backend the dual-write process will trigger again, resulting in the already detailed behavior: a new user object is provisioned and then removed, with the corresponding trail in the Azure AD audit log (and a complete lack of audit records on the Exchange side). In fact, the excerpts from the audit logs included above already contain the entries corresponding to this example, where a user testREST was provisioned/removed, so we will not include them again.

Some closing thoughts

In summary, we’ve uncovered an oversight on Microsoft’s side which, combined with some of the shortcomings of the Azure AD dual-write model, effectively allowed us to run Exchange Online cmdlets incognito, without a proper audit trail. The issue was readily reproducible across tenants and did not require any additional configuration. Luckily, it could not be used to elevate privileges or bypass existing controls in the service. It took Microsoft nearly a year to address all the intricacies involved in the underlying infrastructure. I’m only publishing these findings now that I’ve received confirmation that all the relevant code changes have been fully rolled out across the service.

With all the above in mind, let’s go back to my original point of Microsoft not paying enough attention to one of its core workloads. Here “attention” means not only proper coding practices and actual testing, but also continuous investment in the workload. Over the past few years we’ve seen a steady trend of deprioritizing Exchange Online in favor of new and flashier workloads, and sadly that involves not just marketing and sales resources but also a lot of seasoned talent that was intimately familiar with the product, all now moving in different directions. Which is quite worrying, given the importance of Exchange Online within Microsoft 365.

P.S. This article was supposed to be published back in December 2022, that’s before the now infamous Storm-0558 hack. Who knows how deep the rabbit hole really goes…