Improving the performance of the add-ons manager with asynchronous file I/O

The add-ons manager has a dirty secret. It uses an awful lot of synchronous file I/O. This is the kind of I/O that blocks the main thread and can cause Firefox to be janky. I’m told that that is a technical term. Asynchronous file I/O is much nicer, it means you can let the rest of the app continue to function while you wait for the I/O operation to complete. I rewrote much of the current code from scratch for Firefox 4.0 and even back then we were trying to switch to asynchronous file I/O wherever possible. But still I used mostly synchronous file I/O.

Here is the problem. For many moons we have allowed other applications to install add-ons into Firefox by dropping them into the filesystem or registry somewhere. We also have to do things like updating and installing non-restartless add-ons during startup when their files aren’t in use. And we have to know the full set of non-restartless add-ons that we are going to activate quite early in startup so the startup function for the add-ons manager has to do all those installs and a scan of the extension folders before returning back to the code startup up the browser, and that means being synchronous.

The other problem is that for the things that we could conceivably use async I/O, like installs and updates of restartless add-ons during runtime we need to use the same code for loading and parsing manifests, extracting zip files and others that we need to be synchronous during startup. So we can either write a second version that is asynchronous so we can have nice performance at runtime or use the synchronous version so we only have one version to test and maintain. Keeping things synchronous was where things fell in the end.

That’s always bugged me though. Runtime is the most important time to use asynchronous I/O. We shouldn’t be janking the browser when installing a large add-on particularly on mobile and so we have taken some steps since Firefox 4 to make parts of the code asynchronous. But there is still a bunch there.

The plan

The thing is that there isn’t actually a reason we can’t use the asynchronous I/O functions. All we have to do is make sure that startup is blocked until everything we need is complete. It’s pretty easy to spin an event loop from JavaScript to wait for asynchronous operations to complete so why not do that in the startup function and then start making things asynchronous?

Performances is pretty important for the add-ons manager startup code, the longer we spend in startup the more it hurts us. Would this switch slow things down? I assumed that there would be some losses due to other things happening during an event loop tick that otherwise wouldn’t have but that the file I/O operations should take around the same time. And here is the clever bit. Because it is asynchronous I could fire off operations to run in parallel. Why check the modification time of every file in a directory one file at a time when you can just request the times for every file and wait until they all complete?

There are really a tonne of things that could affect whether this would be faster or slower and no amount of theorising was convincing me either way and last night this had finally been bugging me for long enough that I grabbed a bottle of wine, fired up the music and threw together a prototype.

The results

It took me a few hours to switch most of the main methods to use Task.jsm, switch much of the likely hot code to use OS.File and to run in parallel where possible and generally cover all the main parts that run on every startup and when add-ons have changed.

The challenge was testing. Default talos runs don’t include any add-ons (or maybe one or two) and I needed a few different profiles to see how things behaved in different situations. It was possible that startups with no add-ons would be affected quite differently to startups with many add-ons. So I had to figure out how to add extensions to the default talos profiles for my try runs and fired off try runs for the cases where there were no add-ons, 200 unpacked add-ons with a bunch of files and 200 packed add-ons. I then ran all those a second time with deleting extensions.json between each run to force the database to be loaded and rebuilt. So six different talos runs for the code without my changes and then another six with my changes and I triggered ten runs per test and went to bed.

The first thing I did this morning was check the performance results. The first ready was with 200 packed add-ons in the profile, should be a good check of the file scanning. How did it do? Amazing! Incredible! A greater than 50% performance improvement across the board! That’s astonishing! No really that’s pretty astonishing. It would have to mean the add-ons manager takes up at least 50% of the browser startup time and I’m pretty sure it doesn’t. Oh right I’m accidentally comparing to the test run with 200 packed add-ons and a database reset with my async code. Well I’d expect that to be slower.

Ok, let’s get it right. How did it really do? Abysmally! Like incredibly badly. Across the board in every test run startup is significantly slower with the asynchronous I/O than without. With no add-ons in the profile the new code incurs a 20% performance hit. In the case with 200 unpacked add-ons? An almost 1000% hit!

What happened?

Ok so that wasn’t the best result but at least it will stop bugging me now. I figure there are two things going on here. The first is that OS.File might look like you can fire off I/O operations in parallel but in fact you can’t. Every call you make goes into a queue and the background worker thread doesn’t start on one operation until the previous has completed. So while the I/O operations themselves might take about the same time you have the added overhead of passing messages between the background thread and promises. I probably should have checked that before I started! Oh, and promises. Task.jsm and OS.File make heavy use of promises and I have to say I’m sold on using them for async code. But. Everytime you wait for a promise you have to wait at least one tick of the event loop longer than you would with a simple callback. That’s great if you want responsive UI but during startup every event loop tick costs time since other code might be running that you don’t care about.

I still wonder if we could get more threads for OS.File whether it would speed things up but that’s beyond where I want to play with things for now so I guess this is where this fun experiment ends. Although now I have a bunch of code converted I wonder if I can create some replacements for OS.File and Task.jsm that behave synchronously during startup and asynchronously at runtime, then we get the best of both worlds … where did that bottle of wine go?

Zooming along, hopefully as fast as before

I’ve just landed a fix to a bug that has irritated me ever since page zoom started getting remembered for sites. It fixes a real problem you find if you both use zoom a fair bit, and load pages in background tabs. When you finally decide to look at that tab there is this little pause (or long pause if the page is large) and sometimes a visual jump as it re-zooms the content. It also changes where the page is scrolled to which is very irritating if you have just clicked a link to a specific line in some source code for example.

The fix was relatively simple, the problem is it causes a little extra time to be spent loading background tabs. About 3ms from the numbers I have. Normally this is small enough that you wouldn’t notice in comparison to how long a page takes to load. One concern though is how this impacts session restore where you have a lot of background tabs all trying to load at once. The tests I’ve run say pretty much what I’d expect, Firefox opens, all your previous tabs are displayed and start loading with no change due to this fix, the reason being that the additional time is spent later in the load cycle of the tabs.

Of course tests never quite mirror reality so I’d like all you nightly testers to keep an eye out to see if you notice a sudden change in the performance of the browser when you open it and session restore has a lot of tabs to restore. Obviously before filing a bug do the good thing and try it a few times both with nightlies with the fix (tomorrow’s should be the first) and nightlies without. If you do see definite issues then file a bug and CC me. Assuming there are no issues then I’ll be trying to get approval to land this for Firefox 3.1 in the near future.

How extensions can slow down Firefox (my dirty little secret)

Tab Sidebar is probably my favourite extension that I’ve created. It is certainly the most polished, thanks mostly to other people pushing me to make it so. For those that haven’t used it it creates a thumbnail preview of all of your tabs in the sidebar. The thumbnails automatically update whenever the page changes, even things like popup menus generally show up. This automatic updating comes at a cost though.

In order to detect changes to the page content the extension mostly uses DOM mutation events. These are in theory sent out whenever a change to the page’s structure happens, which covers adding/removing content, style changes, text entry and all manner of things. What I wasn’t aware of when I first used this technique was that Gecko performs certain optimisations when there are no DOM mutation listeners registered for a document (the normal case). Simply the presence of a mutation listener impacts the speed of any DOM manipulation by the page. How much? Well nothing like a graph to illustrate things.

Performance impact of mutation listeners

Performance impact of mutation listeners

These are numbers gathered from the Dromaeo test suite, averaging 5 runs with and without Tab Sidebar installed. A little perf loss is inevitable as the extension must perform some checks on the mutation and occasionally repaint the thumbnail, but nowhere near the sort of regression that actually occurs. I’ve combined some of the test results for simplicity but it is pretty clear that while the regular JS tests are essentially unaffected, the DOM tests can have wildly different results. Don’t ask me why DOM Queries get faster with the extension installed, but DOM modifications are a nice round 4 times slower.

That isn’t to say that using mutation listeners is a complete sin to be avoided at all costs. In some cases they are the only safe option. I discovered this problem some time ago and never before found a better way to detect the changes for Tab Sidebar. Last I checked Firebug uses them in similar ways. The point I think is that there are many features in the platform that can have unexpected side effects.

Of course I guess I wouldn’t be making this post if I hadn’t finally come up with a solution. While I’m no longer actively working on my extensions, I’m afraid I couldn’t resist the opportunity of the new MozAfterPaint event that landed on trunk recently. Quite simply whenever an area of the page is repainted on the screen this event is dispatched with details of what the area painted was. This is absolutely perfect for Tab Sidebar, which isn’t surprising since the use case it was written for is essentially what Tab Sidebar is doing anyway. After a few hours of ripping out most of the event handling code I now have a working prototype that redraws solely based on this event, only painting the areas that might have changed.

Performance impact of the MozAfterPaint listener

Performance impact of the MozAfterPaint listener

Boy does it work well. As you can see from the graph essentially all of the performance loss is gone. For some reason the faster DOM queries are still there (who am I to complain if my extension makes Firefox run faster!). And what’s more the previews are now updated not just for DOM changes, on OSX at least it even sees repaints of plugins. It is quite bizarre to be watching a movie on youtube and see the little thumbnail at the side showing the movie as well.

I guess now I have whetted your appetite it would be unfair not to make the test version available. Use at your own risk of course, the sheer amount of code change means there are likely some problems lurking, but you can install Tab Sidebar 2.5a1 and see how it goes. If you set the preference extensions.tabsidebar.selecteddelay to -1 then it won’t even delay before repainting and just repaint the thumbnail as soon as the main page is painted, this actually seems to work out best for me, particularly on sites with fast animations.