Another Office 365 compliance issue swept under the rug

A while back, I was made aware of an interesting issue: OneDrive for Business (ODFB) users can disable indexing for their own drives, effectively disabling functionality such as eDiscovery or DLP. This is of course yet another issue that stems from the fact that users are Site collection administrators for their own ODFB drives, and as such have full control over them. So it wasn’t that surprising, I simply hadn’t heard about it before (or forgot). After I verified the issue and involved the all-knowing Mr. Tony Redmond, a response from Microsoft’s Mark Kashman was quick to follow. And now, two months later, the issue seems to have been addressed by hiding the corresponding indexing settings. Or has it?

At this point, all Microsoft did was hide the Search and Offline availability settings from the Site settings page. The controls are actually still there, and you can simply use the direct URL (_layouts/15/srchvis.aspx) to access and toggle them as you please. Moreover, Microsoft did not actually change any values: if a user has already disabled indexing, it remains disabled and thus the original issue persists. And since the actual controls haven’t been touched, other methods of toggling them, such as a CSOM-based script, will work just fine.

Sadly, that’s not all. As entities such as Office 365 Groups, Teams and more recently Private channels all rely on a SharePoint Online site collection at the backend, built on the same model as ODFB drives, the issue affects more than just personal drives. For any of the aforementioned, one can toggle the indexing settings and experience the same behavior. The settings aren’t even hidden from the UI, as Microsoft only did so for users’ ODFB drives. Moreover, other functionalities that depend on the index are affected as well. Notably, DLP will happily ignore any files stored in such locations, regardless of how many matches on sensitive data types a given file contains. Automatic labeling will most likely behave similarly, once it lands.

To illustrate the issue, I’ve created a Content search that covers some sites in my tenant. The search includes the site of the user for which I originally confirmed the issue (HuKu), whose indexing settings remained unchanged after the “fix”. I also added my own ODFB site, with the intent of stopping indexing on it later; another user’s site, which has indexing enabled and serves as a reference; and lastly, a Group site:

[Screenshot: OneDrive for Business compliance issue]

Since only one of these sites currently had indexing disabled, the Content search results included matches from three locations, as follows:

[Screenshot: OneDrive for Business compliance issue]

So far, we’ve confirmed that the “fix” doesn’t change anything for sites that already had indexing disabled, and they remain invisible to the eDiscovery/Content search functionality. All the other sites returned matches as expected. I then proceeded to toggle the indexing settings for my personal drive and the site used by the “Default” group, and to give things a push also rebuilt the index. After a while, I ran the same content search, and the results are displayed below:

[Screenshot: OneDrive for Business compliance issue]

I’m not sure why three separate entries corresponding to the same site were returned, but the important part here is that only results from the single ODFB site that still had indexing enabled are available. In other words, this confirms that the “fix” does nothing but hide the settings page, and people can still disable indexing and prevent the content of the drive from being searched in eDiscovery. Moreover, the same applies to the Office 365 Group site I had included in the search: no results were returned there either. So, the issue goes well beyond personal drives. Toggling the “include unindexed items” setting for the eDiscovery export doesn’t seem to make any difference either. In effect, only the 24 items from the single drive were exported, instead of the several thousand items spread across multiple locations one would expect.

So, where does this leave us? Since the issue has existed for years and has been publicly discussed numerous times, organizations that have to comply with regulations should either put pressure on Microsoft to have an actual fix released ASAP, or take matters into their own hands and audit and fix any sites with indexing (un)intentionally disabled. We will cover this in a moment, but first I’d like to take this opportunity to issue another rant in Microsoft’s general direction.

Let’s put it like this: it has been over 8 years since Office 365 was released. That’s more than enough time to move away from the legacy code and even release a completely new architecture, aligned with the cloud reality and with the needs of enterprise customers in a cloud setting. Yet we continue to have solutions built by stitching different workloads together to just “make it work”. The issue described above is hardly the first problem caused by the fact that users have SC admin permissions to their sites. It probably won’t be the last either. Instead of addressing this once and for all, Microsoft is extending the reach of such issues to Office 365 Groups, Teams, Private channels and whatever other workload they decide to move to the Groups architecture next.

To be clear, the issue is not only with the generous permissions granted to end users. It’s the entire model, which is still rooted in the on-premises architecture. Take auditing for example: it took Microsoft years before they released something on that front for SharePoint Online. And now, years later, operations such as changing Site/Site collection settings are still not being audited or exposed in the Unified audit log. In fact, for the scenario above, the best you can get out of the audit log is events showing that a given user has accessed the corresponding page (_layouts/15/srchvis.aspx). Whether they changed anything there is open to interpretation, as no such data is recorded.

So if you want to address this issue properly, you have to write your own solution. The code below would give you a list of all ODFB sites in your tenant, along with the corresponding indexing setting. To run it, you need the SPO PowerShell module (as it provides an easy way to fetch ODFB URLs), the CSOM binaries and the PnP binaries (for Modern authentication). It’s an ugly ‘proof of concept’ which lacks any error handling, so don’t expect it to do wonders.

#Locate the PnP module folder; it ships the CSOM, PnP Core and ADAL assemblies
$pnpModule = Get-Module SharePointPnPPowerShellOnline -ListAvailable | Select-Object -First 1
$pnpDir = Split-Path -Path $pnpModule.Path -Parent

#Load the assemblies needed for CSOM and Modern authentication
Add-Type -Path (Join-Path $pnpDir "Microsoft.SharePoint.Client.dll")
[System.Reflection.Assembly]::LoadFile((Join-Path $pnpDir "OfficeDevPnP.Core.dll")) | Out-Null
[System.Reflection.Assembly]::LoadFile((Join-Path $pnpDir "Microsoft.IdentityModel.Clients.ActiveDirectory.dll")) | Out-Null

#Get indexing status for all ODFB sites
Connect-SPOService -Url "https://tenant-admin.sharepoint.com"
$ODFBSites = Get-SPOSite -IncludePersonalSite $true -Limit All -Filter "Url -like '-my.sharepoint.com/personal/'"

#Reuse a single PnP authentication manager for all sites
$authManager = New-Object OfficeDevPnP.Core.AuthenticationManager

foreach ($ODFBSite in $ODFBSites) {
    #Open a CSOM context against the user's drive (interactive web login)
    $clientContext = $authManager.GetWebLoginClientContext($ODFBSite.Url)

    #Load the root web and record its NoCrawl value on the site object
    $clientContext.Load($clientContext.Web)
    $clientContext.ExecuteQuery()
    $ODFBSite | Add-Member -MemberType NoteProperty -Name NoCrawl -Value $clientContext.Web.NoCrawl -Force
}
$ODFBSites | Select-Object Url, Owner, NoCrawl

Note that NoCrawl is False for sites that have indexing enabled, so you only need to care about sites where it is True. Once you have the list of sites, you can change the value using CSOM:

#Get the SC properties and re-enable indexing
#Assumes $clientContext is connected to the site you want to fix
$clientContext.Load($clientContext.Site.RootWeb.AllProperties)
$clientContext.ExecuteQuery()
$clientContext.Site.RootWeb.AllProperties.FieldValues.NoCrawl #current value

$clientContext.Site.RootWeb.AllProperties["NoCrawl"] = "false" #"true" hides the site from search
$clientContext.Site.RootWeb.Update()
$clientContext.ExecuteQuery()
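
If the report flags more than a handful of sites, the two snippets can be combined into a loop. The following is only a sketch under the same assumptions as the code above: it expects the $ODFBSites collection (with the NoCrawl property added) and the $authManager object to still exist in the session, and it introduces the $flagged and $ctx variable names purely for illustration. Like the rest of the code here, it has no error handling.

```powershell
# Sketch only: re-enable indexing on every site the report flagged.
# Assumes $ODFBSites and $authManager exist from the report script above.
$flagged = $ODFBSites | Where-Object { $_.NoCrawl -eq $true }

foreach ($site in $flagged) {
    # Separate context per site; the account needs SC admin rights on each one
    $ctx = $authManager.GetWebLoginClientContext($site.Url)
    $ctx.Load($ctx.Site.RootWeb.AllProperties)
    $ctx.ExecuteQuery()

    # "false" means the site is included in search again
    $ctx.Site.RootWeb.AllProperties["NoCrawl"] = "false"
    $ctx.Site.RootWeb.Update()
    $ctx.ExecuteQuery()
    $ctx.Dispose()
}
```

Rerunning the report afterwards is a reasonable way to check whether the change took effect.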

The code above can be adapted to run against Site collections corresponding to Office 365 Groups, Teams and Private channels as needed. You will need the corresponding permissions in order to make a change, so run it as either Global Admin, SPO Admin or Site collection admin for the given entity.
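
As a starting point for that adaptation, the sketch below swaps the ODFB URL filter for a site template filter. It assumes that Group-connected sites (including those behind Teams) use the GROUP#0 template and Private channel sites use TEAMCHANNEL#0; verify those values against your own tenant before relying on them. The $groupSites, $channelSites and $otherSites names are illustrative, and $authManager is expected to exist from the report script.

```powershell
# Sketch only: report the NoCrawl value for Group and Private channel sites.
# The template names below are assumptions; check them in your tenant.
$groupSites   = Get-SPOSite -Template "GROUP#0" -IncludePersonalSite $false -Limit All
$channelSites = Get-SPOSite -Template "TEAMCHANNEL#0" -IncludePersonalSite $false -Limit All
$otherSites   = @($groupSites) + @($channelSites)

foreach ($site in $otherSites) {
    $ctx = $authManager.GetWebLoginClientContext($site.Url)
    $ctx.Load($ctx.Web)
    $ctx.ExecuteQuery()
    $site | Add-Member -MemberType NoteProperty -Name NoCrawl -Value $ctx.Web.NoCrawl -Force
    $ctx.Dispose()
}
$otherSites | Select-Object Url, Template, NoCrawl
```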

I’m sure that there are other, more elegant methods to report and control the indexing status, but the solution above seems to work OK for me. Chances are there are built-in PnP cmdlets for the same, but given my limited SharePoint experience I couldn’t find anything on that front. I did find it interesting that the Get-PnPSearchCrawlLog cmdlet returns “proper” results for a site with the indexing setting disabled, even after rebuilding the index. In any case, the important part is that those results are not being returned in content search/eDiscovery, which is the reason for writing this article. Hopefully Microsoft will come up with a more suitable solution, and soon!
