Firefox Chief Technology Officer Eric Rescorla has written a detailed blog post explaining exactly how the browser's add-ons came to break all at once last week, how it was fixed, and how the company will avoid another 'armagadd-on' in the future.
Mozilla had already explained, while it scrambled to push out fixes over the weekend, that the mass disablement of add-ons was caused by an expired signing certificate.
Now, we have a lot more detail about how the certificate was able to expire, and why it affected people at different times.
Rescorla explains that Mozilla noticed the problem around 6pm PT on Friday evening, presumably just as the tech team were preparing to clock off for the weekend. At that point, not all users were affected, because "add-ons are checked about every 24 hours, with the time of the check being different for each user." Once a user's copy of Firefox ran that check, it found that the relevant signing certificate had expired and disabled every add-on signed by it, which was most of them.
Rescorla goes into great detail about the fixes that were considered and eventually deployed, but the key question many Firefox users will want answered is: why did it take so long?
Firstly, the CTO clarifies that the team shipped a fix "at 2:44 AM, or after less than 9 hours, and then it took another 6-12 hours before most of our users had it. This is actually quite good from a standing start."
He goes on to explain why fixing something like this isn't as straightforward as it might seem, pointing among other things to the company's own security procedures, which he describes as "good practice" but "somewhat inconvenient if you want to issue a new certificate on an emergency basis."
Even now, Rescorla says, not all users have received a fix, including people running older builds. As mentioned in our previous coverage of this issue, some people intentionally stick with outdated versions, often because a particular add-on stopped being updated after that version, or because they're using older operating systems.
Mozilla says it can't offer those users a fix, and instead recommends they update to a newer, more secure version of the browser.
Finally, the post details some lessons Mozilla will be taking from the whole debacle, most significantly better tracking of potentially time-sensitive issues and a way to push urgent updates even when the update system itself isn't working.
Rescorla says the company will publish a formal post-mortem of the issue and its handling next week, and he also pushes back on user complaints about the slow response:
"As someone who sat in the meeting where it happened, I can say that people were working incredibly hard in a tough situation and that very little time was wasted."
The full post is available on Mozilla's blog. We suspect other browser teams will be reading it with interest.