Before we get started, be warned – this is a rant post. While there are some interesting examples and potentially even some learnings you can find below, the post will predominantly focus on my frustration with (some of) the SharePoint Online endpoints on the Graph API. If you have more sense than I do and are avoiding these, just ignore the post.
Long story short, I’ve been working on a PowerShell script to generate a “storage” report for SharePoint and OneDrive for Business, down to item-level details and with versions included. Think of it as the analog of the Storage Metrics tool you can find in the UI, but centralized. And because I am apparently a masochist, the tool uses the Graph API endpoints.
Many of the scripts/tools I’ve released previously also leverage the Graph API, and I’ve often shared additional details on how the API behaved and what workarounds you need to implement for better results. No API is perfect, and neither are any of my scripts for that matter. But while previously I’ve had to deal with a handful of oddities, this time around half of the code is basically fixes and workarounds. Let’s go over some examples, shall we?
Let’s start with getting the set of sites for our tenant and their details. Sounds like an easy task, now that we have the LIST /sites method, no? Sure, it only supports application permissions, but that’s actually the preferred method when creating a tool like that. It gets the task done, and even includes the set of OneDrive for Business sites, so we can get everything in one go. However, there are certain discrepancies in the data we get, it seems.
In the screenshot below, two calls are made. The first one is the LIST /sites call, the results of which we save into a variable and then filter for a given siteId on the client side. The second call queries the Graph API’s /sites endpoint for said siteId directly. Both calls use application permissions and the same access token. Yet there are some obvious dissimilarities in the values returned.
Now, it is not uncommon for some properties to only be returned when getting individual objects. In fact, many other Graph API endpoints and PowerShell cmdlets across various modules behave in this manner. For example, in the screenshot above we can see that the lastModifiedDateTime property is only returned in the GET call. What we don’t usually see is things working the other way around: in this case, the isPersonalSite property is returned on LIST but not on GET.
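If you want to spot these differences without eyeballing screenshots, a small diff helper does the job. The following is a minimal Python sketch, not code from the PowerShell script discussed here; it assumes you already have one site object from the LIST /sites response and one from GET /sites/{siteId}, both parsed into dictionaries, and the function name diff_site_objects is hypothetical.

```python
from typing import Any


def diff_site_objects(listed: dict[str, Any], fetched: dict[str, Any]) -> dict[str, Any]:
    """Compare a site object from LIST /sites with the one from GET /sites/{siteId}."""
    # Ignore OData annotations such as "@odata.context", which GET adds.
    keys_listed = {k for k in listed if not k.startswith("@odata")}
    keys_fetched = {k for k in fetched if not k.startswith("@odata")}
    return {
        # Properties only LIST returns (e.g. isPersonalSite in the case above).
        "only_in_list": sorted(keys_listed - keys_fetched),
        # Properties only GET returns (e.g. lastModifiedDateTime).
        "only_in_get": sorted(keys_fetched - keys_listed),
        # Properties returned by both calls, but with different values.
        "mismatched": {
            k: (listed[k], fetched[k])
            for k in sorted(keys_listed & keys_fetched)
            if listed[k] != fetched[k]
        },
    }
```

Running it against each site in the LIST output (and its matching GET result) turns the comparison into something you can log or filter on.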
We can also spot a mismatch between the values of the name property. Are we to assume that the actual value is only returned on the LIST call? But if that’s the case, why is the property returned by GET at all, instead of being omitted, as seems to happen with other properties? Things get even more interesting if we add delegated permissions into the mix. While you cannot use the LIST method with delegated permissions, you can GET the details of sites just fine, and here is the result:
Fun, right? Oh, but wait, this time it’s not just the name property that diverges; there is more. Did you notice the displayName value? Go back and compare it to the values we got via application permissions. So not only does the result seem to depend on the method used, it also depends on the permission model. At this point, we’re better off randomly generating those values.
If I had to take a wild guess, it looks like for the LIST method, the name and displayName properties are populated with the same value, which in turn seems to match the value of displayName as returned from the GET method. The value of the name property returned from GET seems to match the webUrl path value, which also explains the null value we get for the root site collection shown above. The actual display name of the site, however, is only returned properly via the delegated permissions GET query.
I will not even delve into the fact that the webUrl property has a trailing slash when returned via the LIST method and no such slash when returned from GET. Or the fact that createdDateTime differs by a few milliseconds in most cases…
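If you do need to match LIST and GET results, these two quirks are at least easy to normalize away. A minimal Python sketch, assuming timestamps come in the UTC "Z" form Graph typically returns (for example 2023-05-01T10:20:30.123Z); the helper names and the one-second tolerance are hypothetical choices, not anything the API defines.

```python
from datetime import datetime, timedelta, timezone


def normalize_web_url(url: str) -> str:
    """Drop the trailing slash LIST /sites adds to webUrl, so LIST and GET values compare equal."""
    return url.rstrip("/")


def parse_graph_datetime(value: str) -> datetime:
    """Parse a UTC timestamp such as '2023-05-01T10:20:30.123Z' (assumes the 'Z' suffix form)."""
    main, _, fraction = value.rstrip("Z").partition(".")
    parsed = datetime.strptime(main, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    if fraction:
        # Pad or truncate the fractional part to microseconds, whatever its length.
        parsed = parsed.replace(microsecond=int(fraction[:6].ljust(6, "0")))
    return parsed


def timestamps_match(a: str, b: str, tolerance: timedelta = timedelta(seconds=1)) -> bool:
    """Treat two createdDateTime values as equal when they differ by at most `tolerance`."""
    return abs(parse_graph_datetime(a) - parse_graph_datetime(b)) <= tolerance
```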
Moving on, let’s talk folder size. If you fetch the driveItem for a folder item, you will get the size property, easy task. Even better, you will quickly be able to confirm that the size of the folder item is reported along with the size of any versions the items within the folder might have. Which is great for the purposes of our current task… if only it were something you could rely on. There are some exceptions to this behavior, such as Notebook items. No big deal, you can easily filter those. And then you run into more exceptions, such as folders storing video files migrated by Stream.
In the example above, we see the Videonew folder, containing a single video, albeit with three versions. Checking the folder size via the UI, or the Storage Metrics tool for that matter, gives us the size with all versions included (like it does for any other folder), whereas the Graph API call returns a number representing the size of the latest version only. For this reason, it is better to “recalculate” the size of folders after retrieving all items and their versions.
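Recalculating folder sizes boils down to a bottom-up sum over the folder tree. The Python sketch below is illustrative only: it uses a simplified, hypothetical Item model rather than raw driveItem JSON, and it assumes each file’s version list already includes the current version, so that summing it gives the file’s total footprint, as described in the next paragraph.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Item:
    """Hypothetical, simplified view of a driveItem after crawling a drive."""
    id: str
    parent_id: Optional[str]  # parentReference.id; None for the drive root
    is_folder: bool = False
    version_sizes: list[int] = field(default_factory=list)  # bytes per version, current included


def recalculate_folder_sizes(items: list[Item]) -> dict[str, int]:
    """Size of each folder = sum of all versions of every file beneath it.

    The folder's own size property is deliberately ignored.
    """
    children: dict[str, list[Item]] = defaultdict(list)
    for item in items:
        if item.parent_id is not None:
            children[item.parent_id].append(item)

    sizes: dict[str, int] = {}

    def total(item: Item) -> int:
        if not item.is_folder:
            return sum(item.version_sizes)
        if item.id not in sizes:  # memoize so each folder is summed only once
            sizes[item.id] = sum(total(child) for child in children[item.id])
        return sizes[item.id]

    for item in items:
        if item.is_folder:
            total(item)
    return sizes
```

The recursion is fine for typical folder depths; an extremely deep hierarchy would call for an iterative post-order walk instead.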
To be clear, the API does return all versions of the single item stored in the folder, and summing said versions’ size will give you the proper value. The point is that you should not rely on the size property for folder items, as there are scenarios in which it will not reflect the size of all versions. Notebooks and media files are usually involved, but there were also plenty of cases where the size was incorrectly reported for folders hosting a single Word document (see below). In fact, in my tenant I had to correct the size for no fewer than 300 folders. Your mileage may vary.
Another thing worth noting is that versions are treated differently between lists and drives. Yes, I know the ListItemVersion resource is technically a different thing than the DriveItemVersion one, but that’s not the point. What I mean is the following. Both the /drives/{id}/items/{id}/versions and /lists/{id}/items/{id}/versions endpoints support pagination, and both have a default page size of 200, which you can adjust via the $top operator. However, if you go the $expand=versions route, only in the case of the former will you get the full response of 200 versions and a versions@odata.nextLink annotation. With the /lists/{id}/items/{id}?$expand=versions query, by contrast, you seem to be capped at a total of 200 versions returned, with no pagination.
As to why this is important, take a look at the screenshot below. Combine this with the aforementioned size property issue for folders, and the conclusion is that we need to get the info on each version. In an ideal world, we’d have the size property also exposed on the ListItemVersion object, allowing us to fetch all details with a single call thanks to the $expand operator. Alas, the /lists/{id}/items/{id}?$expand=versions query returns a maximum of 200 versions and does not return size.
In effect, if you want to fetch the full set of versions, you should not use the $expand operator against a list item. In all fairness, the Graph API documentation does warn us about some known issues with $expand, albeit only in the context of directory objects. Nevertheless, it is yet another thing to be mindful of when working with versions. And to top it all off, none of the methods mentioned above support the $count operator, so you inevitably end up fetching all versions even if all you want is a simple count to report on.
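Putting the last few paragraphs together, here is a rough Python sketch (not the PowerShell code behind this post) of how versions could be fetched and counted: follow @odata.nextLink on the dedicated drive endpoint, follow versions@odata.nextLink when a driveItem was fetched with $expand=versions, and treat an expanded list-item collection that reaches 200 entries as possibly truncated. get_json is a hypothetical helper that performs an authenticated GET and returns the parsed JSON; the 200 cap and the page size of 100 come from the observations in this post rather than from documented limits, and nextLink pages are assumed to use the usual value/@odata.nextLink shape.

```python
from typing import Any, Callable, Iterator, Optional

# Hypothetical helper: performs an authenticated GET and returns the parsed JSON body.
GetJson = Callable[[str], dict[str, Any]]

GRAPH = "https://graph.microsoft.com/v1.0"
EXPAND_VERSIONS_CAP = 200  # observed (not documented) cap for list items with $expand=versions


def versions_url(drive_id: str, item_id: str, page_size: int = 100) -> str:
    """Dedicated versions endpoint for a driveItem, with a reduced page size."""
    return f"{GRAPH}/drives/{drive_id}/items/{item_id}/versions?$top={page_size}"


def iter_pages(get_json: GetJson, url: Optional[str]) -> Iterator[Any]:
    """Yield every entry of a Graph collection, following @odata.nextLink."""
    while url:
        page = get_json(url)
        yield from page.get("value", [])
        url = page.get("@odata.nextLink")


def all_drive_item_versions(get_json: GetJson, item: dict[str, Any]) -> list[Any]:
    """Collect every version of a driveItem fetched with ?$expand=versions."""
    versions = list(item.get("versions", []))  # first page comes inline with the item
    # For driveItems, further pages are signaled by versions@odata.nextLink.
    versions.extend(iter_pages(get_json, item.get("versions@odata.nextLink")))
    return versions


def expanded_versions_possibly_truncated(list_item: dict[str, Any]) -> bool:
    """True if a list item fetched with $expand=versions may be missing versions."""
    # List items get no nextLink, so reaching the cap is the only hint.
    return len(list_item.get("versions", [])) >= EXPAND_VERSIONS_CAP


def count_versions(get_json: GetJson, url: str) -> int:
    """No $count support, so counting means walking every page."""
    return sum(1 for _ in iter_pages(get_json, url))
```

For a single file, count_versions(get_json, versions_url(drive_id, item_id)) would walk every page just to produce one number, which is exactly the overhead the missing $count support forces on you.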
By far the most annoying aspect of working with versions, however, is the slowness and overall unreliability of the API. While Gateway Timeout errors were not that uncommon even with code that only fetched item details, expanding the script to cover versions degraded its reliability quite noticeably. I ended up having to reduce the page size to 100, as that seemed to be the only reliable way to eliminate, or at least reduce, such errors.
The number of Service Unavailable responses received is also quite depressing. Hands down the most frustrating part, however, was running into 400 Bad Request responses even though the request itself was perfectly fine. By that I mean that retrying the exact same request, without modifying a thing, yields a proper response. Infuriating! And quite the bad design on Microsoft’s side, as you would not expect 400 errors to be retriable.
It is also important to note that the issues detailed above were all encountered at a very relaxed request rate, with lots of micro delays added to avoid throttling. In some cases only a single site was polled, with just a few hundred items total. In fact, not once did I receive a response featuring the Retry-After header, which we can take as indirect evidence that the rate at which the script queries the Graph API is within acceptable limits. Probably.
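The script’s actual retry logic is a topic for a separate article, so here is only a generic Python sketch of the idea: retry transient status codes with exponential backoff and jitter, honor Retry-After when it is present, and, given the behavior described above, also retry 400 responses. The send callable, the status code set and the delay values are all assumptions to tune, not Microsoft-recommended settings.

```python
import random
import time
from typing import Callable, Mapping, Tuple

# (status code, response headers, body) from whatever HTTP client is used.
Response = Tuple[int, Mapping[str, str], bytes]

# Transient codes to retry. 400 is here only because of the behavior described
# above; normally a 400 means the request itself is malformed.
RETRIABLE = frozenset({400, 429, 500, 502, 503, 504})


def send_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 6,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send` until it returns a non-retriable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status not in RETRIABLE or attempt == max_attempts:
            return status, headers, body
        retry_after = headers.get("Retry-After")  # assumes this exact header casing
        if retry_after is not None and retry_after.isdigit():
            delay = float(retry_after)  # server-requested wait in seconds
        else:
            # Exponential backoff with jitter: up to 1s, 2s, 4s, ... capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1)) * random.uniform(0.5, 1.0)
        sleep(delay)
    raise RuntimeError("unreachable: the final attempt always returns")
```

Retrying 400s blindly also retries genuinely malformed requests, which is why the number of attempts is capped and the last response is returned as-is for the caller to log.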
As mentioned above, to account for such unpredictable behavior, I had to add a lot of additional code and change the logic to retry almost every error. We will discuss the details of that in my next article, which will be about the script itself. If I decide to release it, that is, as even for a proof of concept it is nowhere near as reliable as I would have hoped.
At the end of the day, I am utterly unimpressed with the state of the API when it comes to working with item versions. Being forced to go slow is something I can live with, but the reliability issues are not something that Microsoft should have kept ignoring for years. And yes, none of these issues are new, as you can find multiple GitHub or Stack Overflow posts describing similar experiences with CSOM and SharePoint Online. And in some cases, you can even find acknowledgement that the issue was on Microsoft’s side. Only to have the same thing happen over and over again…
I cannot say I’m overly impressed with the state of the Graph API as a whole, either. Years into its journey, there is hardly a hint of a uniform experience across the various endpoints. Each endpoint comes with its own implementation, quirks and issues: different pagination, different behavior of the query operators, different error handling, different throttling… you name it. At the same time, support for admin operations is simply nonexistent for most workloads. Even on the “client” side, parity with older APIs is still more wishful thinking, yet we are constantly asked to switch to the Graph, regardless of any shortcomings. Someone needs to beta test things after all, and who can do that better than your paying customers? 🙂