Introducing a new series
There is no doubt that we live in interesting times, as the world of IT changes at a faster pace than ever. Cloud, DevOps, continuous integration/deployment – it’s all so exciting. When it works. There is no denying that it does work most of the time, at least in Microsoft land. Unfortunately, it’s not that uncommon for a code change that brings undesired results to hit production environments, and when that happens in a service of the scale of Office 365, things can get really ugly. One such example was the recent issue that exposed customers’ data in the Office 365 Admin Center Reports: https://www.petri.com/data-breach-office-365-admin-center
Sadly, that’s just one of many examples, and certainly not unprecedented. In fact, so many “easy to spot” issues have made it to different parts of the service, the Office suite, and the desktop, mobile, and server OSes, that at times it makes you wonder whether Microsoft engineers perform any actual QA testing anymore. We might use “testing in production” as a joke, but when something like the aforementioned incident happens, it can have very serious repercussions. But that’s not the point of this article (soon to be series)!
The point is, it is important that we as customers (colleagues, experts, MVPs, insert_noun_here) keep Microsoft (and other companies) in check, and keep them honest. As a big proponent of “openness” in communication, at times I feel that some of these issues are not handled properly by Microsoft, so I plan to do my part and highlight any newly occurring incidents that are clearly a mistake, that should not have made it to production, and that could’ve been avoided by proper testing. An “incident log” if you will, or a record of shameful events 🙂
Example #1 – Office applications signed with incorrect certificate
So, for the first article in the series, let’s talk about the shameful incident that pushed incorrectly signed executable files to Office users with the July 27, 2017 updates. This out-of-band release incorporated some very important fixes for security issues with Outlook and, ironically, ended up causing security-related issues of its own. Namely, the executable files delivered as part of this update, such as Outlook.exe or WinWord.exe, were all signed with Microsoft’s internal, TEST certificate, which chained up to the untrusted “Microsoft Testing Root Certificate Authority 2010” root, as shown in the images below:
The issue was immediately caught and reported across the various communities, as it prevented Office applications from running in AppLocker-protected environments, caused AV software (though not Windows Defender, of course) and other software that verifies code signatures to report those applications as untrusted, and so on. The next day, an update was released that addressed this issue:
Leaving aside any speculation on how such an obvious mistake can make it through all the validation rings (which supposedly exist?), what’s even more mind-boggling in this situation, to me at least, is that Microsoft did NOT release an update for the other channels. Thus, for most enterprise environments, very few of which will be running on the Current channel, the issue is still present, and working around it requires either disabling features such as AppLocker or adding this Microsoft-internal root certificate to your Trusted Roots store. Far from ideal, if you ask me.
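To make the failure mode concrete, here is a minimal sketch of the kind of check that tripped up AppLocker and AV products: a signature verifier walks the signer’s certificate chain and rejects any binary whose chain terminates in a root that is not in the machine’s trust store. This is an illustrative simplification, not Microsoft’s actual implementation; the intermediate certificate names below are hypothetical, and only the two root names come from the incident itself.

```python
# Illustrative sketch of chain-of-trust validation, the principle behind
# AppLocker publisher rules and AV signature checks. Not a real verifier:
# real ones parse Authenticode signatures, check revocation, etc.

# Hypothetical trust store: the production root ships in Windows'
# Trusted Root Certification Authorities store; the test root does not.
TRUSTED_ROOTS = {
    "Microsoft Root Certificate Authority 2010",
}

def chain_is_trusted(chain: list[str], trusted_roots: set[str]) -> bool:
    """Return True if the root (last) certificate in the chain is trusted.

    `chain` is an ordered list of certificate subject names, from the
    leaf (signing) certificate up to the root. Fails closed on an
    empty chain, i.e. an unsigned binary.
    """
    if not chain:
        return False
    return chain[-1] in trusted_roots

# A correctly signed Outlook.exe chains to the production root
# (intermediate name below is made up for the example)...
good_chain = [
    "Microsoft Corporation",
    "Microsoft Code Signing PCA 2010",
    "Microsoft Root Certificate Authority 2010",
]

# ...while the July 27 binaries chained to the internal test root instead.
bad_chain = [
    "Microsoft Corporation",
    "Microsoft Test PCA 2010",
    "Microsoft Testing Root Certificate Authority 2010",
]

print(chain_is_trusted(good_chain, TRUSTED_ROOTS))  # True
print(chain_is_trusted(bad_chain, TRUSTED_ROOTS))   # False
```

Note that the signatures on the affected binaries were cryptographically valid; the chain simply ended at a root no customer machine trusts, which is why the “fix” of adding the test root to Trusted Roots works but effectively trusts everything Microsoft ever test-signs.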
Now, one can argue that the Deferred channels should only receive security updates, and in this case the update is clearly marked as “non-security”, as seen in the above screenshot. The question here is: should an obvious mistake on Microsoft’s side, and one that has serious implications for productivity (we can also argue about some security implications), be allowed to persist for weeks, or even months? I’ll leave the answer to you…
Instead of a conclusion
So, there we have it: the first example in what will most likely turn into a series of blog posts covering unfortunate incidents that could’ve been avoided in an ideal world. It’s understandable that Microsoft representatives don’t usually want to talk about such issues, as they can result in some bad exposure. I’d also agree that we are all human, we all make mistakes, and so pointing fingers doesn’t do much good. It is my firm belief, however, that the majority of Microsoft’s customers can be understanding and forgiving – after all, how many of us can even imagine the complexity of running things at such scale? So I’d urge a more open approach to handling such issues. Plus, being open about things is always preferable to the alternative of leaving the impression that you are trying to sweep things under the rug.
It is also my belief that such issues should be properly acknowledged and acted upon, so that we as customers are assured that a lesson has been learned and improvements are planned (even in cases where no word reaches the outside world, which I’m sure also happens). Thus, I reserve my right to annoy people at Microsoft the next time I run into an issue we’ve already reported and that was supposedly acted upon. And that’s the whole idea behind this article (series) – keep Microsoft honest, and make sure they follow their own procedures and best practices, for both our benefit and theirs!