Non-existent users show up in SignInActivity data (or how logs continue to disappoint)

One of the common questions I get nowadays is “give me a list of all users that haven’t logged in in the past XX days”, or variations of the same theme. The question is somewhat easier to answer nowadays, thanks to the SignInActivity property, as we have covered in previous articles. In most cases, I’m able to quickly answer with something like the below example (assuming the user is interested in getting this data via PowerShell):

Get-MgUser -Filter "signInActivity/lastSignInDateTime le $([datetime]::UtcNow.AddDays(-60).ToString("s"))Z" -All:$true | Export-CSV -nti D:\temp\blabla.csv

While this method is quick and easy to use, it’s not without its issues. For example, as the data is fetched via the Audit log, you can only leverage this method in tenants with Azure AD Premium SKUs, otherwise you get the beloved “Neither tenant is B2C or tenant doesn’t have premium license” error message. In addition, the SignInActivity property currently does not distinguish between successful and failed login attempts, which means the results might not be exactly what you are looking for. And of course there’s the issue with how the Graph SDK for PowerShell handles output, which usually requires some additional adjustments.
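If you run this kind of query often, the cutoff calculation can be pulled into a small helper. The following is a minimal sketch, not part of the Graph SDK: the function name New-SignInActivityFilter and its parameters are made up for illustration, and it only builds the filter string, so it runs without any connection to the tenant.

```powershell
# Hypothetical helper (not part of the Graph SDK): builds the OData filter string
# used in the example above for an arbitrary number of days.
function New-SignInActivityFilter {
    param(
        [Parameter(Mandatory)][ValidateRange(1, 3650)][int]$Days,
        # Injectable "now" so the result is reproducible; the value is treated as UTC.
        [datetime]$Now = [datetime]::UtcNow
    )
    # "s" is the sortable, culture-invariant format (yyyy-MM-ddTHH:mm:ss).
    $cutoff = $Now.AddDays(-$Days).ToString('s') + 'Z'
    "signInActivity/lastSignInDateTime le $cutoff"
}

$filter = New-SignInActivityFilter -Days 60 -Now ([datetime]'2023-11-14T09:11:46')
$filter

# Usage against a tenant (requires a Graph connection, left commented out here):
# Get-MgUser -Filter (New-SignInActivityFilter -Days 60) -All -Property DisplayName,Id,UserPrincipalName,SignInActivity
```

Passing the current time as a parameter is only there to make the helper easy to check; in day-to-day use you would simply omit -Now.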

Some unexpected output

When preparing an example earlier today a surprise was waiting for me! I was playing with a version of the cmdlet above that exposes the actual value of the lastSignInDateTime property (thanks Graph folks for making this more complicated than it has to be), and I noticed some interesting entries at the bottom of the output. Here’s what the example looked like:

Get-MgUser -Filter "signInActivity/lastSignInDateTime le $([datetime]::UtcNow.AddDays(-30).ToString("s"))Z" -All:$true -Property DisplayName,Id,UserPrincipalName,SignInActivity | select DisplayName,Id,UserPrincipalName,@{n="LastLogin";e={$_.SignInActivity.lastSignInDateTime}}
[Screenshot: Some strange objects returned, note the lack of UPN]

Note the two entries at the bottom, which lack some of the detail. Now, for those of you used to the… challenges of working with output in the Graph SDK for PowerShell, this might look like one of the common issues, i.e. forgetting to specifically list a property. Yet the same properties are populated for the rest of the entries returned. The “raw” data also lacked values for DisplayName and UserPrincipalName, as well as any other property I thought of trying. And the same results were observed via the Graph explorer tool, so definitely not a PowerShell issue.

[Screenshot: Graph explorer returns the same data]
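To pick such entries out of a larger result set, you can split the output on whether the UserPrincipalName is populated. This is a rough sketch only: the function name Split-SignInActivityResult is made up, and the sample objects stand in for real Get-MgUser output (the first one is invented, the other two reuse the unresolved IDs examined in this article).

```powershell
# Hypothetical helper: separates entries Graph could "enrich" from those it could not.
function Split-SignInActivityResult {
    param([Parameter(Mandatory)][AllowEmptyCollection()][object[]]$Users)
    [pscustomobject]@{
        Resolved   = @($Users | Where-Object { -not [string]::IsNullOrEmpty($_.UserPrincipalName) })
        Unresolved = @($Users | Where-Object { [string]::IsNullOrEmpty($_.UserPrincipalName) })
    }
}

# Illustrative objects only, shaped like the selected Get-MgUser properties.
$sample = @(
    [pscustomobject]@{ Id = '11111111-2222-3333-4444-555555555555'; UserPrincipalName = 'user@contoso.example'; DisplayName = 'Sample User' }
    [pscustomobject]@{ Id = 'd0f55267-a08c-4270-91aa-8e925b31aeb7'; UserPrincipalName = $null; DisplayName = $null }
    [pscustomobject]@{ Id = '5d0530f4-3d05-477a-adaa-1d1086ec5c86'; UserPrincipalName = $null; DisplayName = $null }
)

$split = Split-SignInActivityResult -Users $sample
$split.Unresolved.Id   # the IDs that need further investigation
```

Against a real tenant, you would pass the output of the Get-MgUser query shown earlier instead of $sample.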

So, in effect, it looks like we have some sign-in event entries which correspond to users without a UPN?! OK, there are objects within Azure AD that do not have a UPN value, and some of them can even generate sign-in events (*cough* service principals *cough*). So are we perhaps seeing entries that correspond to such objects? Unfortunately, this is not the case. In fact, the two GUIDs above cannot be resolved against *any* object within the directory:

[Screenshot: GUIDs cannot be matched against any object within the tenant]

Looks like we have a mystery at hand!

A refresher on SignInActivity

To get to the bottom of this, let’s first do a refresher on how queries against the SignInActivity property actually work. As mentioned above, the data is fetched from the Azure AD reporting service datamart, which also explains the need for AuditLog.Read.All permissions for queries against said property. The data is then “enriched” with all the remaining properties from the Graph store. In fact, this is easy to illustrate via the $WhatIf query parameter:

[Screenshot: $WhatIf shows us how the query is interpreted on the backend]

This in turn allows us to point a finger at the Azure AD reporting service as the likely cause of the issue. In order to confirm this, we can simply query the URI listed above and look at the data returned from the AAD reporting workload. While the Graph explorer will not allow you to query URIs not starting with graph.microsoft.com, nothing is stopping us from doing this via PowerShell. Here are the (trimmed) results:

$uri = 'https://reportingservice.activedirectory.windowsazure.com/users?$filter=signInActivity%2flastSignInDateTime+le+2023-10-15T09%3a11%3a46Z'
$res = Invoke-WebRequest -Uri $uri -Headers $authHeader1
($res.Content | ConvertFrom-Json).value

id signInActivity
-- --------------
34ea864d-8765-45da-8b15-df67cf2fc547 @{lastSignInDateTime=2023-04-06T02:10:21Z; lastSignInRequestId=8686dcd1-b17d-48fd-b40c-a67fae9c2f01; lastNonInteractiveSignInDateTime=2023-11-14T01:09:23Z; lastNonInteractiveSignInRequestId=8ae743a3-a549-40f5-a4ea-ad4db70a6c00; lastSuccessfulSignInDateTime=; lastSucce...
... ...
05fa5bac-b2d8-4952-800e-1e2d480e4d45 @{lastSignInDateTime=2023-05-04T16:59:39Z; lastSignInRequestId=68c610f5-d9cd-4002-b19c-5e4026620f00; lastNonInteractiveSignInDateTime=2023-05-04T16:59:41Z; lastNonInteractiveSignInRequestId=5eb511d5-79df-4b61-9360-15685c387600; lastSuccessfulSignInDateTime=; lastSucce...
d0f55267-a08c-4270-91aa-8e925b31aeb7 @{lastSignInDateTime=2023-09-15T05:52:28Z; lastSignInRequestId=5ca9a766-1898-45ee-99a4-c1bdcdee7900; lastNonInteractiveSignInDateTime=; lastNonInteractiveSignInRequestId=; lastSuccessfulSignInDateTime=; lastSuccessfulSignInRequestId=}
5d0530f4-3d05-477a-adaa-1d1086ec5c86 @{lastSignInDateTime=2023-09-28T13:27:12Z; lastSignInRequestId=cbb5b123-6ce3-4cfd-8835-ec18df934000; lastNonInteractiveSignInDateTime=; lastNonInteractiveSignInRequestId=; lastSuccessfulSignInDateTime=; lastSuccessfulSignInRequestId=}

We can run variations of the query, with or without the $select and/or $filter parameters – in all cases we get the two entries at the bottom. So it seems that now we have a clue – the query against the AAD reporting service is returning two “extra” entries, which the Graph query then fails to “enrich” with any additional attributes, because it cannot find a matching object. In fact, we can easily confirm this by removing the SignInActivity property from our (Graph) query, which in turn changes how the request is processed (you can also use $whatIf to see the difference). Another interesting fact – if you drop the filter, the first query will be made against the AAD Graph store, and only then the SignInActivity property will be added, again resulting in the two “extra” entries not being added to the output.
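The hand-encoded reporting service URI used earlier can also be built programmatically, which makes it easier to try different cutoff dates. A small sketch, assuming only the base URL and filter shape shown in this article; the function name Get-ReportingServiceUri is made up, and acquiring the token for the $authHeader1 header is out of scope here.

```powershell
# Hypothetical helper: builds the reporting service URI for a given cutoff.
function Get-ReportingServiceUri {
    param([Parameter(Mandatory)][datetime]$Cutoff)   # treated as UTC by the caller
    $stamp  = $Cutoff.ToString('s') + 'Z'
    $filter = "signInActivity/lastSignInDateTime le $stamp"
    # Single quotes keep $filter literal; EscapeDataString percent-encodes '/', ':' and spaces.
    'https://reportingservice.activedirectory.windowsazure.com/users?$filter=' + [uri]::EscapeDataString($filter)
}

$uri = Get-ReportingServiceUri -Cutoff ([datetime]'2023-10-15T09:11:46')
$uri

# The request itself stays the same as in the article (needs a valid token):
# $res = Invoke-WebRequest -Uri $uri -Headers $authHeader1
```

Note that EscapeDataString encodes spaces as %20 rather than the + used in the original URI; the two forms are normally treated the same in a query string, but I have not verified that for this particular service.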

Another interesting find – lastSuccessfulSignInDateTime

Now, we might have pinpointed the likely origin of the issue, but we are yet to answer what those two objects actually are. As we cannot find a matching object within Azure AD, this proves to be a challenging task. One idea to work around this would be to fetch additional details via the AAD reporting service workload – after all, it is the one giving us those “extra” entries to begin with. Sadly, the service only ever returns two properties: id and signInActivity, as evident from the $metadata document.

$uri = 'https://reportingservice.activedirectory.windowsazure.com/$metadata'
$res = Invoke-WebRequest -Uri $uri -Headers $authHeader1
$metadata = [xml]$res.Content

$metadata.Edmx.DataServices.Schema[0].ChildNodes

Name Key Property
---- --- --------
UserSignInActivity Key {id, signInActivity}
SignInActivity {lastSignInDateTime, lastSignInRequestId, lastNonInteractiveSignInDateTime, lastNonInteractiveSignInRequestId, lastSuccessfulSignInDateTime, lastSuccessfulSignInRequestId}

Before accusing me of wasting your time with useless details, do note the last two properties (in case you missed them in the previous examples): lastSuccessfulSignInDateTime and lastSuccessfulSignInRequestId. As mentioned above, one of the biggest issues with the SignInActivity property currently is that it does not distinguish between successful and failed login attempts. Well, the metadata indicates that Microsoft is eventually going to release an update that will bring this functionality – you heard it here first! Sadly, both properties have null values across all users currently, so we have to wait a bit more. Hopefully, for less than it took for SignInActivity to mature to /v1.0.
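If you want to list the property names without relying on how PowerShell formats XML nodes, an XPath query over the metadata works too. The sketch below is an illustration under assumptions: the function name Get-EdmTypeProperty is made up, and $sample is a heavily reduced, hand-written stand-in for the real $metadata document. Only the type and property names are taken from the output above; the OData v4 namespaces and the EntityType/ComplexType split are my guesses.

```powershell
# Hypothetical helper: lists the properties of every entity/complex type in an EDMX document.
function Get-EdmTypeProperty {
    param([Parameter(Mandatory)][xml]$Metadata)
    # local-name() sidesteps XML namespace handling.
    $types = $Metadata.SelectNodes("//*[local-name()='EntityType' or local-name()='ComplexType']")
    foreach ($type in $types) {
        [pscustomobject]@{
            Type       = $type.GetAttribute('Name')
            Properties = @($type.SelectNodes("*[local-name()='Property']") |
                ForEach-Object { $_.GetAttribute('Name') })
        }
    }
}

# Reduced, hand-written sample; NOT the actual service response.
$sample = @'
<edmx:Edmx xmlns:edmx="http://docs.oasis-open.org/odata/ns/edmx" Version="4.0">
  <edmx:DataServices>
    <Schema xmlns="http://docs.oasis-open.org/odata/ns/edm" Namespace="Sample">
      <EntityType Name="UserSignInActivity">
        <Key><PropertyRef Name="id" /></Key>
        <Property Name="id" />
        <Property Name="signInActivity" />
      </EntityType>
      <ComplexType Name="SignInActivity">
        <Property Name="lastSignInDateTime" />
        <Property Name="lastSignInRequestId" />
        <Property Name="lastNonInteractiveSignInDateTime" />
        <Property Name="lastNonInteractiveSignInRequestId" />
        <Property Name="lastSuccessfulSignInDateTime" />
        <Property Name="lastSuccessfulSignInRequestId" />
      </ComplexType>
    </Schema>
  </edmx:DataServices>
</edmx:Edmx>
'@

$props = Get-EdmTypeProperty -Metadata $sample
$props | Format-Table -AutoSize

# Against the real service you would pass the response instead:
# Get-EdmTypeProperty -Metadata $res.Content
```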

Turning to the audit logs and MCAS

None of this however helps us solve the current mystery. As we cannot rely on the directory services endpoints to give us a clue here, our only remaining option is to look for additional clues on the auditing front. In a stroke of bad luck, the latest sign-in event for both objects happened more than 30 days ago, so we cannot leverage the Azure AD sign-in logs directly. However, we can leverage the Unified Audit log datamart, where such events are eventually ingested, and kept for an extended duration. Let’s see if the good old Search-UnifiedAuditLog cmdlet can save the day!

$res1 = Search-UnifiedAuditLog -StartDate (Get-Date).AddDays(-90) -EndDate (Get-Date).AddDays(1) -FreeText 5d0530f4-3d05-477a-adaa-1d1086ec5c86

$res2 = Search-UnifiedAuditLog -StartDate (Get-Date).AddDays(-90) -EndDate (Get-Date).AddDays(1) -FreeText d0f55267-a08c-4270-91aa-8e925b31aeb7

Both queries did return a handful of results, all timestamped within a minute or two of the datetime reported by the signInActivity property. All the events closely follow the same pattern: a sign-in attempt coming from my home IP, using the same client app, tagged as “internal” to my tenant. All events also hint at the involvement of Azure MFA (“SAS:EndAuth” or “SAS:BeginAuth” RequestType value). They differ in the application being accessed, Teams vs SharePoint Online, but that’s a minor detail. Unfortunately, the Actor data exposed within the audit log entries matches the “unknown” GUIDs reported by the AAD service, so we’re not making much progress here.
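The interesting details of each record are buried in the AuditData property, which Search-UnifiedAuditLog returns as a JSON string. Here is a rough sketch of how one might pull out the RequestType and Actor values mentioned above. The function name ConvertFrom-AuditRecord is made up, and the sample record is hand-written: its field layout (Actor as a list of ID/Type pairs, ExtendedProperties as Name/Value pairs) and the Operation value are assumptions about the record shape, so verify them against your own data.

```powershell
# Hypothetical helper: flattens the parts of AuditData discussed in this article.
function ConvertFrom-AuditRecord {
    param([Parameter(Mandatory, ValueFromPipeline)]$Record)
    process {
        $data = $Record.AuditData | ConvertFrom-Json
        [pscustomobject]@{
            Operation   = $data.Operation
            # ExtendedProperties is assumed to be a list of Name/Value pairs.
            RequestType = ($data.ExtendedProperties | Where-Object Name -eq 'RequestType').Value
            # Actor is assumed to be a list of ID/Type pairs.
            ActorIds    = @($data.Actor | ForEach-Object ID)
        }
    }
}

# Hand-written sample record, NOT real audit output.
$sampleRecord = [pscustomobject]@{
    AuditData = @'
{
  "Operation": "UserLoggedIn",
  "Actor": [ { "ID": "5d0530f4-3d05-477a-adaa-1d1086ec5c86", "Type": 0 } ],
  "ExtendedProperties": [ { "Name": "RequestType", "Value": "SAS:EndAuth" } ]
}
'@
}

$parsed = $sampleRecord | ConvertFrom-AuditRecord
$parsed

# With real results:
# $res1 | ConvertFrom-AuditRecord | Format-Table -AutoSize
```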

When it comes to audit log events however, there’s always one more trick we can use. For years, we have known that MCAS/Defender for cloud apps ingests audit events via some “internal” connector, which in turn always seems to expose some additional details, not otherwise found anywhere else. So in this case, I went to check what MCAS can reveal about those two GUIDs. Checking under Assets > Identities quickly reveals the two objects as “known” and “internal” accounts (but not users!), but sadly doesn’t reveal any additional detail. Switching to the Activity log though, we are able to find just what we are looking for:

[Screenshot: Activity logs data from Defender for Cloud apps gives us a clue]

Those are the same events as returned from the Unified audit log, but “enriched” with some additional data, only exposed via MCAS. Unlike all the other methods we tried above, the MCAS audit log entry is giving us the HomeTenantUserObjectId property, which (surprise surprise) matches the GUID of my own user. It also gives us the TenantId, but we already knew it matches my own tenant. And, the UserPrincipalObjectID value matches the GUID returned by the AAD reporting service, so now we have a mapping between it and the “real” GUID. Unsurprisingly, in both cases the same value is returned for HomeTenantUserObjectId, even though the UserPrincipalObjectID values differ.

Note the other two highlighted entries though, we will come back to them later on, as they indicate another failure…

Mystery finally revealed

So now we finally have some evidence that those “extra” entries correspond to existing users within my tenant, but with the wrong GUID. Hm, I wonder what scenario would correspond to my user being represented by another GUID? You guessed it – Guest access! Unfortunately, even MCAS falls short of helping us confirm this, as the UserTenantId reported by it does not match any real tenant ID, as you can easily confirm via a findTenantInformationByTenantId query. Instead, it seems to match the CorrelationId value for the audit log entry, which is useless.
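For reference, the findTenantInformationByTenantId check can be scripted along these lines. This is a sketch: the function name Get-TenantInformationUri is made up and the GUID is a placeholder, while the endpoint itself is the documented Graph one (to my knowledge it needs the CrossTenantInformation.ReadBasic.All permission).

```powershell
# Hypothetical helper: builds the Graph URI for a tenant ID lookup.
function Get-TenantInformationUri {
    param([Parameter(Mandatory)][guid]$TenantId)   # [guid] rejects malformed input at binding time
    "https://graph.microsoft.com/v1.0/tenantRelationships/findTenantInformationByTenantId(tenantId='$TenantId')"
}

# Placeholder GUID, replace with the UserTenantId value you want to verify.
$lookupUri = Get-TenantInformationUri -TenantId '11111111-2222-3333-4444-555555555555'
$lookupUri

# Requires an existing Connect-MgGraph session; an error response suggests the
# value is not a valid tenant ID:
# Invoke-MgGraphRequest -Method GET -Uri $lookupUri
```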

Now that we have a clue, we can actually do some tests in other tenants. As I was asking Tony to check the presence of such entries in his own tenant, he was able to “resolve” one of the “erroneous” GUIDs from my tenant to a user within his:

Vasil Michev (MVP) 5d0530f4-3d05-477a-adaa-1d1086ec5c86 vasil@michev.info vasil_michev.info#EXT#@RedmondAssociates.on…

Eureka! Of course, in real life you might not have the admin of the other tenant at your disposal, but we can leverage another trick. Namely, connect to another tenant as a Guest user via PowerShell (or the Graph explorer). The method might not work in tenants that block access to said tools or restrict the use to internal users only, but in general you can do this:

Connect-MgGraph -TenantId microsoft.onmicrosoft.com
Get-MgUser -UserId d0f55267-a08c-4270-91aa-8e925b31aeb7

DisplayName Id Mail UserPrincipalName
----------- -- ---- -----------------
Vasil Michev d0f55267-a08c-4270-91aa-8e925b31aeb7 vasil@michev.info vasil_michev.info#EXT#@microsoft.onmicrosoft.com

Voila! Both entries are now revealed and confirmed to match my own account, or more specifically its representations across other tenants. In effect, we can conclude that the Azure AD reporting service keeps track of your users’ login activities across resource tenants. Only interactive sign-in data is kept, which answers why the timestamps we saw above are all from Sep 2023 (whereas I interact with the Microsoft tenant on a daily basis).

It is of course good to know which tenants your users interact with and how often, but realistically, getting data via this method would be challenging. While you can query the AAD reporting service for specific users, you still need to know the GUID of the user within the resource tenant (i.e. Microsoft’s), which is not something you can get on your own. In addition, the Azure AD reporting service does not support queries based on UPN values, so knowing the GUID is a hard requirement.

As an alternative, you can get such usage data by querying the Azure AD sign-in logs directly. Unless you are exporting those to an external system, the 30-day retention period is unlikely to be sufficient, as shown in the examples examined here. And you can get the non-interactive sign-in logs as well, which will give you a far more realistic estimate of the user’s activities.

Before concluding the article, I need to point out that the Azure AD sign-in logs on their own also cannot solve the mystery we examined above. They too do not expose the GUIDs in question, so you cannot correlate them directly with the output returned from the reporting service. What they do expose is the user’s Display Name value as configured in the resource tenant, but there is no way for you to know said value beforehand, and filtering on it is also not possible.

[Screenshot: Azure AD Sign-in logs give us almost all the important details, sans the user’s GUID in the resource tenant]

Summary

In summary, after a crash course in troubleshooting, we now understand that the Azure AD reporting service keeps track of SignInActivity data for users accessing tenants other than your own. Only interactive sign-ins are kept in this scenario and the identity of the user is represented by its object GUID from the resource tenant. To get said data, you either need to query the AAD Reporting workload directly, or submit a Graph API query that filters users based on the SignInActivity property. Any other queries will result in the users’ guest account IDs being stripped out of the output, as the directory service workload cannot match them against an existing object within the tenant.

While we can debate the usefulness of said data and whether it poses a privacy issue, at the end of the day it is obvious that Microsoft could have handled things better. Our investigation highlights the lack of detail exposed in sign-in events, as the metadata collected therein was insufficient to match the mystery GUIDs against any existing user. Only the logs collected by MCAS/Defender for cloud apps exposed both of the user’s GUIDs (the one in the home tenant and the one in the resource tenant). Talk about great experience, putting such data behind a paywall! What’s worse, even said log entries failed to give the full picture, as they do not expose the correct resource tenant ID.

In the end, it took the “call a friend” card and a few additional tricks to finally get to the bottom of this. Things would have been much, much easier if there was a single log entry that revealed all the relevant details (GUIDs for the user in both tenants, with the correct home and resource tenant IDs). Then again, this wouldn’t have been a problem if the Graph query worked in a consistent manner, regardless of which parameters we use (i.e. if it didn’t spill those entries to begin with). On the bright side, at least we learned that the lastSuccessfulSignInDateTime property is in the works!

2 thoughts on “Non-existent users show up in SignInActivity data (or how logs continue to disappoint)”

  1. Paul Robichaux says:

    That was fun. Reminds me of teen horror movies where the call is coming from inside the house.
