Report on externally shared files in Microsoft 365 via the Graph API

Back in 2019, I set out to write a “proof of concept” PowerShell script that leverages the Graph API methods to report on externally shared items within OneDrive for Business sites. While the script did what it said, it was far from perfect, due to the numerous limitations of the API at that time. An updated version of the script was released back in 2022 to address some changes Microsoft made to the way permissions are returned, and improve authentication. The core logic of the script remained mostly unchanged.

Regardless, I continue receiving requests from people to adapt the script to additional scenarios, such as covering SharePoint Online sites as well, or being able to run it against a specific set of users/sites. And, since Microsoft added some meaningful improvements to the Graph API in the meantime, it is finally time to release a new version of the script. Rejoice!

Noticeable changes in this release

First and foremost, we can now LIST all sites (collections) within the tenant via the Graph API, including any OneDrive for Business ones. This in turn means that the script no longer needs to query the set of users and then check for the existence of an associated ODFB drive. The script now also features the -Sites parameter, which allows you to run it against the list of specified sites only. This is actually the recommended method for running the script, as it should be less prone to throttling issues.

The next big change is that the script will cover any additional lists found in the site(s), whereas the previous version only focused on the good old “Shared documents” one. Not only does this address one of the gaps, it also allows us to fetch items in a different manner, to be discussed later. Do note the script will not cover system lists, such as the Preservation Hold Library, which the service interestingly enough does allow you to share items from. Bet you didn’t know about this one, yeah? 🙂

In previous versions of the script, we retrieved items by fetching the root drive, then covering any children underneath. The process was repeated iteratively for each “nested” level, down to the “depth” defined via the corresponding parameter. This was effectively used as anti-throttling control, ensuring we only get a subset of the items to process, but has the side effect of not covering files nested a few levels deep if you overlook the default parameter values. The same method is leveraged in this version too, but I’m also providing a variation that fetches lists and list items instead. We will cover this in more detail later.

Lastly, you will notice the improved output, now handled via the ImportExcel module. If said module is not available, you get a CSV file instead. In both cases, the script logic has been adjusted to better align with the way Microsoft is reporting on shared items in the UI or via the downloadable sharing report. We will talk about the corresponding switch parameters in a bit. First, let’s introduce the script’s logic.

How the script works

As with most things Graph, we need to handle authentication first. The script only supports application permissions/client credentials flow, as this is the only way to ensure proper coverage across all sites within a tenant. While delegated permissions can also work, a query using them will only return items that the corresponding user (under whose identity the script is running) can access. If you go that route, you will likely want to add said user as Site collection administrator to each of the sites you want to cover.

With that in mind, the Sites.Read.All permission is required to return all sharing details. If you’re worried about granting such broad permissions, you can leverage the Sites.Selected scoping method, or the newly introduced granular item-level scopes we discussed in this article. In addition, the script can leverage Directory.Read.All permissions to fetch the set of domains registered in the tenant, which will come in handy when determining whether a given item has been shared externally.

Obtaining a valid access token as well as renewing it is handled by a helper function. Another helper function “wraps” Graph API requests to ensure we are handling token renewal, throttling and any potential errors. I want to stress again that this is NOT a proper production-ready solution! There are a lot of scenarios the script’s logic does not account for, so make sure to review the code and amend it as needed, especially when running against large tenants with millions of files.
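To make the token handling a bit more concrete, here is a minimal sketch of the client credentials flow against the Microsoft identity platform v2.0 token endpoint. The function names (New-TokenRequest, Test-TokenExpiring, Get-GraphToken) and the five-minute renewal margin are my own assumptions for illustration, not the names or values used in the actual script.

```powershell
# Hypothetical helper names; the helper functions in the actual script may look different.
function New-TokenRequest {
    param(
        [Parameter(Mandatory)][string]$TenantId,
        [Parameter(Mandatory)][string]$ClientId,
        [Parameter(Mandatory)][string]$ClientSecret
    )
    # Client credentials flow: the app authenticates as itself, no user involved
    @{
        Uri  = "https://login.microsoftonline.com/$TenantId/oauth2/v2.0/token"
        Body = @{
            client_id     = $ClientId
            client_secret = $ClientSecret
            scope         = 'https://graph.microsoft.com/.default'
            grant_type    = 'client_credentials'
        }
    }
}

function Test-TokenExpiring {
    param(
        [Parameter(Mandatory)][datetime]$ExpiresOn,
        [int]$SkewMinutes = 5,          # assumed safety margin
        [datetime]$Now = (Get-Date)
    )
    # Renew slightly early so a long-running request does not start with a stale token
    return ($Now.AddMinutes($SkewMinutes) -ge $ExpiresOn)
}

function Get-GraphToken {
    param([Parameter(Mandatory)][hashtable]$Request)
    # A hashtable body on a POST is sent as application/x-www-form-urlencoded
    $response = Invoke-RestMethod -Method Post -Uri $Request.Uri -Body $Request.Body
    [pscustomobject]@{
        AccessToken = $response.access_token
        ExpiresOn   = (Get-Date).AddSeconds($response.expires_in)
    }
}
```

The idea is to call Test-TokenExpiring before each Graph request and only call Get-GraphToken again when it returns true, which keeps the number of token requests low on long runs.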

After a valid token is obtained, the first thing the script does is to fetch the set of validated domains. As mentioned above, this list will be used to determine whether any of the items is shared externally. Should this call fail, perhaps due to lack of permissions, the external check will not be performed. Alternatively, you can “hardcode” the list of domains by populating it as part of the $domains variable, which you can find on line 252 (or 321 for the “classic” version).
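The domain-based check can be sketched roughly as follows. This is an illustration under my own assumptions, not the script's exact logic: the function name Test-ExternalPrincipal is hypothetical, anonymous links are treated as external, and guest accounts are detected via the #EXT# marker in their UPN. The domain list itself can be obtained from the /domains Graph endpoint, where each entry's id is the domain name.

```powershell
# Hypothetical helper; the real script may decide "external" differently.
function Test-ExternalPrincipal {
    param(
        [Parameter(Mandatory)][string]$Principal,
        [string[]]$Domains
    )
    # Without a domain list the check cannot be performed
    if (-not $Domains) { return $null }
    # Assumption: anonymous links count as external sharing
    if ($Principal -eq 'anonymous') { return $true }
    # Guest user UPNs carry the #EXT# marker
    if ($Principal -match '#EXT#') { return $true }
    # Compare the domain part of the address against the tenant's domains (case-insensitive)
    if ($Principal -match '@(?<domain>[^@]+)$') {
        return ($Domains -notcontains $Matches['domain'])
    }
    return $false
}
```

For example, with $domains set to 'contoso.com','contoso.onmicrosoft.com', a principal of user@fabrikam.com is flagged as external, whereas MeganB@contoso.com is not.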

Next, the script will fetch the set of sites. If you used the –Sites parameter, each of the supplied entries will be examined and if a valid match is found, used to limit the script’s scope to just the matched site collections. If the –Sites parameter was not invoked, the full set of sites found in the tenant will be covered, including any OneDrive for Business ones (if you provide the relevant parameter, that is).
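As a rough illustration of how the different -Sites input formats can be mapped to Graph requests, consider the sketch below. The function name Resolve-SiteRequestUri is hypothetical and the matching logic in the actual script may differ. It relies on two addressing schemes Graph supports for sites: the composite site ID (hostname,GUID,GUID) and the hostname:/path format.

```powershell
# Hypothetical helper; shown only to illustrate the supported input formats.
function Resolve-SiteRequestUri {
    param([Parameter(Mandatory)][string]$Site)

    $base = 'https://graph.microsoft.com/v1.0/sites'
    $guid = '[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}'

    # Composite Graph site ID: hostname,siteCollectionGuid,webGuid
    if ($Site -match ('^[^,/]+,{0},{0}$' -f $guid)) { return "$base/$Site" }

    # Strip the scheme so that full URLs and bare paths are handled the same way
    $trimmed = ($Site -replace '^https?://', '').TrimEnd('/')
    $hostName, $path = $trimmed -split '/', 2

    if (-not $hostName) { return $null }
    # Path-based addressing: /sites/{hostname}:/{server-relative-path}
    if ($path) { return "$base/${hostName}:/$path" }
    return "$base/$hostName"   # root site of the host
}
```

With this approach, both https://tenant.sharepoint.com/sites/newwwwww and tenant.sharepoint.com/sites/newwwwww resolve to the same request, and any entry that fails to return a site can simply be skipped.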

For each of the sites, the set of lists is retrieved and processed next. As hinted above, a new method is used here, which is to fetch all list items (in batches of 5000) and process them in one go. While this is the “correct” way to handle things, it is also prone to performance issues, depending on the number of items found. Your mileage will vary, and if you prefer, you can use the “classic” method instead, which retrieves items based on their “depth” in the folder structure, starting from the root. This is indeed the reason why the script comes in two separate variants.
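Fetching items in batches boils down to following the @odata.nextLink value Graph returns whenever more results are available. Below is a minimal sketch of that loop. Get-AllPages is a hypothetical name, and the request itself is passed in as a script block, so you can plug in whatever wrapper you use for Graph calls.

```powershell
# Hypothetical helper illustrating Graph paging; not the script's actual function.
function Get-AllPages {
    param(
        [Parameter(Mandatory)][string]$Uri,
        # Script block that takes a URI and returns the parsed JSON response
        [Parameter(Mandatory)][scriptblock]$Fetch
    )
    $items = [System.Collections.Generic.List[object]]::new()
    $next = $Uri
    while ($next) {
        $page = & $Fetch $next
        foreach ($entry in $page.value) { $items.Add($entry) }
        # Graph signals additional pages via the @odata.nextLink property
        $next = $page.'@odata.nextLink'
    }
    return $items
}

# Example usage (assumes $authHeader holds a valid Authorization header):
# $uri = 'https://graph.microsoft.com/v1.0/sites/{site-id}/lists/{list-id}/items?$expand=driveItem'
# $all = Get-AllPages -Uri $uri -Fetch { param($u) Invoke-RestMethod -Uri $u -Headers $authHeader }
```

Note that everything is accumulated in memory before processing, which is exactly why this approach can struggle with very large libraries.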

Anyway, after fetching the list of items, we proceed to determine whether any of them is shared, and if so gather some more details, most importantly their permissions. Unfortunately, this remains the slowest part of the script, as each item needs to be queried individually. As a simple anti-throttling control, the script will pause every 100 iterations. While this seems to be enough to cover ~100k items (~10k shared) in my own tenant, your mileage will vary.
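The “pause every N iterations” control is as simple as it sounds. Here is a small sketch, with the hypothetical function name Invoke-WithPause and an assumed pause length of 500 milliseconds (the script's actual Start-Sleep values may differ).

```powershell
# Hypothetical helper; batch size and pause length are assumptions.
function Invoke-WithPause {
    param(
        [Parameter(Mandatory)][object[]]$Items,
        # Script block invoked once per item, e.g. a per-item permissions query
        [Parameter(Mandatory)][scriptblock]$Action,
        [int]$BatchSize = 100,
        [int]$PauseMilliseconds = 500
    )
    $count = 0
    foreach ($item in $Items) {
        & $Action $item
        $count++
        # Back off briefly after every full batch
        if ($count % $BatchSize -eq 0) { Start-Sleep -Milliseconds $PauseMilliseconds }
    }
}
```

In practice the action would be the individual permissions request for each shared item, for example against the /drives/{drive-id}/items/{item-id}/permissions endpoint.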

The last bit of code handles the output. If you have the ImportExcel module installed, the script will generate an Excel file with clickable links and conditional formatting to highlight externally shared entries. If not, you get a CSV file, which you can then process on your own. Output will also be dumped into the $varSPOSharedItems global variable, which you can then transform as you see fit before exporting, or even use it as input for another script that does removal of links, etc.
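The Excel/CSV fallback can be sketched as shown below. The function names are hypothetical, and the formatting options used by the actual script (clickable links, conditional formatting) are left out for brevity. Export-Excel comes from the ImportExcel module, whereas Export-Csv is built in.

```powershell
# Hypothetical helpers illustrating the output fallback; not the script's actual code.
function Get-ExportFormat {
    param([switch]$ExportToExcel)
    # Only use Excel when it was requested and the ImportExcel module is installed
    if ($ExportToExcel -and (Get-Module -ListAvailable -Name ImportExcel)) { return 'Excel' }
    return 'Csv'
}

function Export-SharedItems {
    param(
        [Parameter(Mandatory)][object[]]$Items,
        [Parameter(Mandatory)][string]$BasePath,   # output path without extension
        [switch]$ExportToExcel
    )
    if ((Get-ExportFormat -ExportToExcel:$ExportToExcel) -eq 'Excel') {
        $Items | Export-Excel -Path "$BasePath.xlsx" -AutoSize -FreezeTopRow
        return "$BasePath.xlsx"
    }
    # Fall back to plain CSV
    $Items | Export-Csv -Path "$BasePath.csv" -NoTypeInformation -Encoding UTF8
    return "$BasePath.csv"
}
```

The function returns the path of the file it wrote, so the caller can report where the output ended up regardless of the format.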

How to run the script(s)

Now that you understand what the script(s) do and the permissions required, it’s time to head over to my GitHub repo and download the version you intend to run (“classic” or new one). Once the file is downloaded, make sure to set the authentication variables (lines 242-245 or 311-314 for the “classic” version), or if you prefer, replace the whole connectivity block/function with your preferred solution. To run the script, you can leverage the following set of parameters:

  • Sites – use this parameter to provide a list of sites to run the script against. Valid values include the full site URI, the ID as reported by Graph or a path (see examples below). The parameter is optional, and if you do not provide a value for it, the script will enumerate all sites within the tenant.
  • IncludeODFBsites – switch parameter, indicates whether to enumerate and process OneDrive for Business sites. When using the Sites parameter, I’m assuming that you want all the provided sites processed, so this parameter is ignored.
  • IncludeExpired – switch parameter, indicates whether to include sharing links that have expired in the output.
  • IncludeOwner – switch parameter, indicates whether to include permission entries for Site collection administrator and secondary Site collection administrator (aka “owners”).
  • ExportToExcel – switch parameter, use it to generate a “prettier” output to Excel file, with clickable links and some conditional formatting. Requires the ImportExcel module. If you do not specify this parameter, CSV output is generated.
  • Verbose – use this parameter to show additional details as the script progresses.

Two additional parameters are available for the “classic” version of the script, as follows:

  • ExpandFolders – switch parameter, indicates whether to iteratively expand (nested) folders. If you do not include this parameter, only items found under the “root” of the drive will be included.
  • Depth – integer parameter, use it to specify the depth when processing the “folder” structure. Value of 1 will only include the items in the root of the drive; value of 2 will “expand” any folders found in the root and cover items therein (without covering items in subfolders), etc. Must be used together with the –ExpandFolders switch.
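To illustrate how the two parameters interact, here is a small sketch of depth-limited traversal. Note that it uses recursion for brevity, whereas the script processes levels iteratively. The function name Get-ItemsToDepth, the IsFolder property and the script block used to fetch children are all assumptions made for the example.

```powershell
# Hypothetical helper illustrating the -ExpandFolders/-Depth semantics.
function Get-ItemsToDepth {
    param(
        [Parameter(Mandatory)]$Folder,
        # Script block that returns the children of a given folder
        [Parameter(Mandatory)][scriptblock]$GetChildren,
        [int]$Depth = 1,
        [switch]$ExpandFolders
    )
    foreach ($child in (& $GetChildren $Folder)) {
        $child   # emit the item itself
        # Descend only into folders, and only while there is depth budget left
        if ($ExpandFolders -and $child.IsFolder -and $Depth -gt 1) {
            Get-ItemsToDepth -Folder $child -GetChildren $GetChildren -Depth ($Depth - 1) -ExpandFolders
        }
    }
}
```

With a depth of 1 only the items in the root are returned, a depth of 2 adds the content of folders found in the root, and so on. Anything nested deeper than the specified value is silently skipped, so choose the value with care.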

Below are some examples on how to run the script(s):

#Run the script without any parameter to process all SPO sites
.\Graph_SPO_shared_files.ps1 

#If you want to also process OneDrive for Business sites, use the parameter
.\Graph_SPO_shared_files.ps1 -IncludeODFBsites


#Use the Sites parameter to process specific sites only
.\Graph_SPO_shared_files1.ps1 -Sites "tenant.sharepoint.com,12345678-1234-1234-1234-cf16a0a8a888,12345678-1234-1234-1234-c3a1f49ff1d4","https://tenant.sharepoint.com/sites/newwwwww","tenant.sharepoint.com/sites/newwwwww"

#You can also feed the list of sites via CSV/object
.\Graph_SPO_shared_files1.ps1 -Sites (Import-CSV blabla.csv).Site -IncludeExpired -IncludeOwner

#To run the "classic" version with a folder depth of 3, use
.\Graph_SPO_shared_files.ps1 -IncludeODFBsites -ExpandFolders -Depth 3 -ExportToExcel

[Screenshot: SPOSharedFiles3]

A few notes about the output

By default, the script will generate a CSV file as output and store it in the working directory. In addition, output will also be stored in the global variable $varSPOSharedItems, in order to allow you further processing before exporting. If you have the ImportExcel module installed, you can use the -ExportToExcel switch in order to get a “prettier” version of the output. It will include clickable links to the items, in order to make it easier to review them if needed, as well as some basic conditional formatting. Of course, much more can be done with the module, so feel free to adjust the relevant code section to your needs.

The screenshots below illustrate what the output will look like for the two versions of the script. One thing I forgot to mention above is that the order in which items are displayed will also be different depending on the version, even if you are including the full “depth”. The reason for this is that list items are returned in their creation order within the list/drive (as each new list item gets an incremental Id value), whereas when using the root/children method, items are returned alphanumerically on each level.

[Screenshot: SPOSharedFiles]

[Screenshot: SPOSharedFiles1]

The following columns can be found in the generated Excel file:

  • Site – clickable URL pointing to the SPO/ODFB site collection URL.
  • SiteURL – (hidden) same as above, simply used as source for the pretty links.
  • Name – the (file) name of the item.
  • ItemType – whether this is a File or a Folder item.
  • Shared – whether the item is Shared (i.e. someone other than the Owner has access to it, internal users included).
  • ExternallyShared – whether the item is shared with External users, for example via an anonymous link. For this property to be calculated correctly, the script must know the set of domains used within the organization. Either make sure it runs with Directory.Read.All permissions or manually provide a list of domains (line 252/321 for “classic”).
  • Permissions – comma-separated string blob representing the set of permissions stamped on the item. Each entry will be in the “role:principal” format, with additional details surfaced for sharing links. For example: edit:anonymous,read:anonymous[BlockDownloads] (Expired on: 05/30/2024 21:00:00), write:MeganB@M365x84802758.OnMicrosoft.com, owner:IsaiahL@M365x84802758.OnMicrosoft.com
  • ItemPath – the “folder” path to the item, represented by its webURL value. See note below.
  • ItemID – the Graph API identifier of the item. You can use it to run a quick query in order to retrieve additional details or perform actions such as removing permissions or deleting the item.
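If you want to process the Permissions column further, the “role:principal” entries can be split back into objects. The sketch below is based solely on the example string shown above. The function name ConvertFrom-PermissionString is hypothetical, and the parsing assumes that principals contain no commas or whitespace, which might not hold for every entry the script generates.

```powershell
# Hypothetical helper; parsing rules are inferred from the sample output only.
function ConvertFrom-PermissionString {
    param([Parameter(Mandatory)][string]$Permissions)

    foreach ($entry in ($Permissions -split ',')) {
        $entry = $entry.Trim()
        if (-not $entry) { continue }
        # Split on the first colon only; later colons may belong to a timestamp
        $role, $rest = $entry -split ':', 2
        # Anything from the first "[" or whitespace onwards is treated as link details
        if ($rest -match '^(?<principal>[^\[\s]+)(?<details>.*)$') {
            [pscustomobject]@{
                Role      = $role
                Principal = $Matches['principal']
                Details   = $Matches['details'].Trim()
            }
        }
    }
}
```

Running it against the example above returns four objects, with the second one having a Role of read, a Principal of anonymous and the BlockDownloads/expiration information in the Details property.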

While I’ve tried to keep the output as close as possible between the versions, a few other differences can be spotted, apart from the sorting. Most importantly, the output of the “classic” script will potentially omit some items, depending on the -Depth value you specify. For example, when running it against an almost pristine “demo” tenant with depth value of 3, it returned a total of 528 shared items, whereas the “full” number of items was 532. Your mileage will vary.

Another thing to note is that the ItemPath column values will differ between the two versions, as the underlying webURL value differs depending on which endpoint you are getting it from. The rest should match between the two, as the script logic has been implemented with parity in mind. For instance, when fetching drives via the “classic” version of the script, I’ve made sure to expand the list property, as it allows us to filter out all hidden lists. Similarly, a filter based on the list template is implemented to exclude things like the Preservation Hold Library.

$SiteLists = $SiteLists.value | ? {$_.list.hidden -eq $false -and ($_.list.template -eq "documentLibrary" -or $_.list.template -eq "mySiteDocumentLibrary")}

Both script versions also account for the recently introduced item-level permissions. Such entries are readily available under the /permissions endpoint and use the familiar format. As only the ID of the app is exposed, and not any human-readable identifier, the output will list such entries as “write:de8bc8b5-d9f9-48b1-a8ad-b748da725064”. If that bothers you, feel free to update the code to “resolve” the entry to the app name or whatever.

Lastly, the Excel version of the output file will also feature the Summary tab, giving you a breakdown of the number of shared items per site (collection), as well as the number of externally shared items. The numbers will respect the parameters you run the script with, i.e. expired and owner entries will be stripped out unless you specify the relevant switches.
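A per-site breakdown like the one on the Summary tab can also be produced from the $varSPOSharedItems variable or the CSV file. The sketch below assumes the Shared and ExternallyShared columns hold “Yes”/“No” values, which is an assumption on my part, so adjust the comparison to whatever your output actually contains. The function name Get-SharingSummary is hypothetical.

```powershell
# Hypothetical helper; assumes 'Yes'/'No' values in the Shared/ExternallyShared columns.
function Get-SharingSummary {
    param([Parameter(Mandatory)][object[]]$Items)

    $Items | Group-Object -Property Site | ForEach-Object {
        [pscustomobject]@{
            Site             = $_.Name
            # @() guarantees a Count even when zero or one item matches
            SharedItems      = @($_.Group | Where-Object { $_.Shared -eq 'Yes' }).Count
            ExternallyShared = @($_.Group | Where-Object { $_.ExternallyShared -eq 'Yes' }).Count
        }
    }
}
```

For example, Get-SharingSummary -Items $varSPOSharedItems returns one object per site, which you can pipe to Format-Table or export as a separate file.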

Additional notes and summary

Before closing the article, I want to express my disappointment in the current state of the Graph API when it comes to the scenario at hand. There have been almost no improvements over the past few years, and we still have no method to filter only files with broken permission inheritance/files that have been shared. Similarly, there is still no way to answer the simple “give me a list of all items user X has access to” question. And then there is throttling…

Oh boy, there’s no way of putting this gently… it’s bad. The generic guidance gives you the overall expectations, i.e. 4 or fewer requests against the /permissions endpoint per second, in the worst case scenario. What it does not tell you is that even if you code your solution to account for Microsoft’s throttling guidance, your experience will still be all over the place. When testing the script, not once did I see a 429 response. Yet I had numerous instances when the script failed with a generic error such as the ones listed below:

  • A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
  • Service Unavailable.
  • GatewayTimeout.

Rerunning the script a minute later worked like a charm, even in tenants where the runtime exceeded an hour. And without making any changes to the code itself, not even adjusting the Start-Sleep timeouts I use as simple anti-throttling control. So, be prepared for such behavior, and consider processing sites a few at a time, leveraging the -Sites parameter.
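If you want to make your own copy a bit more resilient against such transient failures, a generic retry wrapper with exponential backoff is one option. The sketch below is not part of the script. The function name Invoke-WithRetry and the default values are assumptions, and it retries on any terminating error instead of inspecting the status code or honoring the Retry-After header, which a proper implementation should do.

```powershell
# Hypothetical helper; retries on ANY terminating error, which is a simplification.
function Invoke-WithRetry {
    param(
        [Parameter(Mandatory)][scriptblock]$Action,
        [int]$MaxAttempts = 4,
        [int]$InitialDelaySeconds = 2
    )
    $delay = $InitialDelaySeconds
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            return (& $Action)
        }
        catch {
            # Give up and surface the original error once the attempts are exhausted
            if ($attempt -eq $MaxAttempts) { throw }
            Write-Verbose "Attempt $attempt failed: $($_.Exception.Message). Retrying in $delay s."
            Start-Sleep -Seconds $delay
            $delay *= 2   # exponential backoff
        }
    }
}

# Example usage (assumes $uri and $authHeader are defined):
# $permissions = Invoke-WithRetry -Action { Invoke-RestMethod -Uri $uri -Headers $authHeader }
```

Since Invoke-RestMethod raises a terminating error on responses such as Service Unavailable or GatewayTimeout, wrapping the call this way gives it a few more chances before the whole run fails.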

Another thing to note is that you should not expect full parity with the built-in sharing report you can generate via the UI. For one, said report lists each permission entry on a separate line, so you have multiple entries per item. It also includes items from hidden lists, and even the PHL. While you can certainly share items from the latter, they cannot be accessed externally.

Yet another important detail is that the inheritedFrom property is always empty, if at all present. This doesn’t seem to be a problem though, as the Graph API always seems to show the resultant set of permissions, including ones inherited from the parent. For example, if you share a folder, every item within it will have the “shared” facet, even if the item is not directly shared. In contrast, if you look at the item via the UI (or the built-in report), the parent folder is the only entry showing as “shared”.

Interestingly, I’ve run into several examples where items listed as shared in the built-in report do not show a “shared” facet or any non-default permission entry when examined via the Graph API. An example of one such item is shown in the picture below. Note the missing link, missing direct access entry and the suspicious absence of the grantedToV2 property?!

[Screenshot: SPOSharedFiles2]

Upload links are also not returned by the Graph (those are links with type value of createOnly). TL;DR, do not expect parity with the built-in reports, or even with the UI, as the Graph API methods still have some deficiencies.

In summary, in this article I presented you with not one, but two updated versions of the “report on externally shared files” script. Both versions support SPO and ODFB sites and process individual sites when invoked via the -Sites parameter. The “classic” version uses a file-system oriented approach and enumerates drives and drive items, iteratively processing “nested” items starting from the root. The new version enumerates lists and list items instead, while expanding their driveItem value to fetch the relevant details. It should be a bit faster, but more prone to throttling issues, as it processes the entire content of the library in one go.

Which is not to say the “classic” version is immune to throttling, and other “random” issues. I’d recommend you run the scripts for specific site(s) only, unless your tenant is rather small. To give you some idea, processing ~10k shared items takes a bit less than 90 minutes, so roughly 2 items per second. In theory you should be able to hit 2x that speed before throttling becomes an issue… but in practice Graph still leaves a lot to be desired when addressing this specific scenario.

Anyway, as usual, let me know if you run into any issues with the script, especially ones related to handling the 429 response or token renewal. If you plan to run this in production, make sure to improve the overall error handling. I still view this as a sample to learn from, and not a full-blown solution, so use at your own peril. Enjoy!

5 thoughts on “Report on externally shared files in Microsoft 365 via the Graph API”

  1. Thilo says:

    Hi Vasil,
    thank you very much for this script. This solves a problem that I’ve had on my table for a long time.
    Do you have an idea, why it does not show a “Full-Access”-permission I’ve set on the “classic” SharePoint permission-page?
    I ended the inheritance there and manually activated “full access” for one user and “read” for another. The “read” user appears in the csv, the “full access” user does not.

    1. Vasil Michev says:

      Did you run the script with the -IncludeOwner switch? By default, the output strips such entries, as they can get noisy. If you want them included, use the -IncludeOwner switch.
