By way of my favourite Bulgarian / Canadian / American / Web Ninja Stoyan Stefanov, and Yahoo!’s Exceptional Performance Team, I’ve been studying the fine work found in their best practices guide for speeding up websites. As a recluse who prefers hiding behind servers rather than dancing around your web browser’s canvas, I was intrigued by their server side recommendations – however sparse they may be. In particular, flushing generated head content early to speed up overall page delivery and rendering time was a technique new to me.
Flushing Chunks
HTTP 1.1 introduced a new mechanism for delivering hypertext that offers greater flexibility in sending a response body. Previous incarnations of HTTP only allowed for the demarcation of a complete response body, by either specifying the content length or closing the server connection. The new mechanism, chunked transfer encoding, is better suited to dynamically generated content: it allows for the demarcation of individual blocks of a complete message, which can therefore be delivered one piece at a time without knowledge of the complete body size in advance. This provides us with an opportunity to control the delivery of an HTML document piece by piece.
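To make the framing concrete, here is a rough sketch of how a chunked body is laid out on the wire: each chunk is its size in hexadecimal, a CRLF, the data, and another CRLF, and a zero-length chunk ends the body. In practice the web server does this framing for you; encode_chunk() and encode_chunked_body() are hypothetical helper names used only for illustration.

```php
<?php
// Sketch of HTTP/1.1 chunked framing. encode_chunk() and
// encode_chunked_body() are hypothetical helpers, not part of PHP
// or of any web server.

/** Frame one piece of body data as a single chunk. */
function encode_chunk(string $data): string
{
    // Size in bytes as hexadecimal, CRLF, the data itself, CRLF.
    return dechex(strlen($data)) . "\r\n" . $data . "\r\n";
}

/** Frame a list of pieces as a complete chunked body. */
function encode_chunked_body(array $pieces): string
{
    $body = '';
    foreach ($pieces as $piece) {
        if ($piece === '') {
            continue; // a zero-length chunk would end the body early
        }
        $body .= encode_chunk($piece);
    }
    // A zero-length chunk followed by a blank line terminates the body.
    return $body . "0\r\n\r\n";
}
```

Note that neither function needs to know the total body size: each piece is framed as soon as it exists, which is what lets a head be delivered before the body has been generated.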
Generating content on the fly introduces bottlenecks for content delivery that do not exist with static content. Connecting to and querying a database, complex computations, parsing of user input, etc. all slow the server’s ability to deliver content. While the client’s browser is waiting on the server to finish generating (the next chunk of) content, it’s twiddling its thumbs unless we give it something to do.
Should we be able to push out the more easily generated content before the heavy lifting begins, we can occupy the client while it waits for the rest of the body. If that point of departure is the HTML head, the browser can busy itself with interpreting the head while the body is being prepared. The HTML head typically contains various includes, such as CSS and JavaScript, so the browser can spend that time pulling in those assets.
PHP’s flush() function provides us with a tool to manipulate HTTP body delivery. flush() “tries to push all the output [generated] so far to the user’s browser”. Depending on a web server’s capabilities and configuration, flushing in PHP before body generation will result in the web server packaging up a chunk for delivery and pushing it out to the client.
Scenario
In order to play with this idea, a dummy is required for knocking around. The example scenario here is inspired: I want to display pretty colour palettes, but not just any – COLOURlovers top palettes pulled from their API. Since we are looking at the advantages of pushing out the HTML head early, the example requires a few properties:
- The HTML head should include a few external files (CSS, JavaScript) with some weight so we can look at how the browser behaves pulling these assets in.
- The HTML body should have some (variable but controllable) processing heft so we can look at the relationship of server side document building bottlenecks and overall page delivery.
This is what we have, and this is what it looks like under the hood:
Top 20 Colour Palettes
-
palette as $palette) { ?>
- title, ENT_COMPAT, 'UTF-8') ?>
-
-
colors->hex as $colour) { ?>
- #
Besides the Javascript used to benchmark page load time (more on that below), the example:
- Pulls in an external style sheet.
- Pulls in the un-minified jQuery library (which has some good heft).
- Pumps out a list of the colour palettes – from a cached source to avoid benchmarking irregularities associated with API access.
- Performs a benchmark once loading is done, and puts jQuery to some good use.
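A page with these properties might be sketched roughly as follows. This is not the original listing: render_head() and render_palettes() are hypothetical names, style.css and jquery.js stand in for whatever assets the page really pulls in, and the XML shape (palette, title, colors, hex) is assumed from the loop fragments above.

```php
<?php
// Hypothetical sketch: function names, asset file names and the exact
// markup are assumptions for illustration only.

/** Everything up to and including </head>, with two external assets. */
function render_head(): string
{
    return "<html>\n<head>\n"
        . "<title>Top 20 Colour Palettes</title>\n"
        . "<link rel=\"stylesheet\" type=\"text/css\" href=\"style.css\">\n"
        . "<script type=\"text/javascript\" src=\"jquery.js\"></script>\n"
        . "</head>\n";
}

/** The body: a nested list of palettes and their colours. */
function render_palettes(SimpleXMLElement $palettes): string
{
    $html = "<body>\n<h1>Top 20 Colour Palettes</h1>\n<ul>\n";
    foreach ($palettes->palette as $palette) {
        // Escape anything that came from the API before printing it.
        $title = htmlspecialchars((string) $palette->title, ENT_COMPAT, 'UTF-8');
        $html .= "<li>{$title}\n<ul>\n";
        foreach ($palette->colors->hex as $colour) {
            $hex = htmlspecialchars((string) $colour, ENT_COMPAT, 'UTF-8');
            $html .= "<li style=\"background-color: #{$hex}\">#{$hex}</li>\n";
        }
        $html .= "</ul>\n</li>\n";
    }
    return $html . "</ul>\n</body>\n</html>\n";
}
```

Splitting the page into a head function and a body function is a deliberate choice here: it leaves an obvious seam between the cheap part of the page and the expensive part.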
For my testing, and working, environment I’m using Apache 2.2 with mod_php. Nothing fancy about my web server install and configuration. Mileage may vary with other web servers, and web server configurations.
The Benchmarks
To get an understanding of overall page delivery, there are two pieces of data that I collected:
- Response time from the server. Specifically, how long did it take the client to start receiving the requested page. This does not mean that the complete HTML document has been received by the client.
- Time to complete page load. How long did it take for the page and all of its assets to be loaded.
To get the response time from initial request to initial receive, I used Firebug 1.2 for Firefox 3. Firebug’s network monitoring tool apparently does not record the complete request fulfilment time, but only from the point of request to the initial point of reception.
Once the page has been initially received, we record the client time (x = new Date();). The onload event fires at the end of the document loading process, allowing us to calculate the time to load the rest of the page including assets.
To figure out how much time it took for the complete page and assets to be loaded we can sum the response time from the server and our onload time (Response + Onload).
I did this benchmark several times taking averages, and using various configurations. Here is what I found:
Not To Flush, Or To Flush
Running the above scenario with no modifications (i.e. no flush()ing) gives us our baseline world view:
Adding a sleep() call of 1 and then 2 seconds right after the HTML head allows us to artificially inflate the processing time for creating the body.
By doing this, we can compare response and onload time in more processing heavy scenarios:
Benchmarks here show that onload time is roughly fixed regardless of the processing time to create and initially deliver the requested page. Flipping the response and onload times in the bar graph shows this more clearly:
On to the meat. Now let’s flush right before body generation like this:
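A minimal sketch of the idea, assuming the head and body are available as two separate strings. send_page() and its parameters are hypothetical names, and the sleep() call is only a stand-in for the artificially inflated body generation described earlier.

```php
<?php
// Minimal sketch; send_page() and its parameters are hypothetical names.

/**
 * Emit the head, flush it, then do the (simulated) slow work for the body.
 *
 * @param string $head         Markup up to and including </head>
 * @param string $body         Markup for the rest of the document
 * @param int    $delaySeconds Artificial body-generation cost in seconds
 */
function send_page(string $head, string $body, int $delaySeconds = 0): void
{
    echo $head;

    // Ask PHP to hand everything written so far to the web server.
    // Whether it leaves the server as its own chunk depends on the
    // server and on any buffering layers in between.
    flush();

    // Stand-in for expensive body generation (database queries, etc.).
    if ($delaySeconds > 0) {
        sleep($delaySeconds);
    }

    echo $body;
}
```

The only difference from the baseline is the single flush() call between the head and the slow part; everything else about the page stays the same.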
And the benchmarks look like this:
Right out of the gate, we see little change in our base page’s total delivery time. However, with the flush() we are able to control and fix the initial response time from the server. As we grow the time required for the server to generate the body, we see some considerable savings. Specifically, a fixed saving of approximately 1 second.
By flushing early we are able to run a portion of server page processing in parallel with client processing, where previously client processing only began once server processing had finished. This is a considerable efficiency gain in overall client and server page processing.
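One way to reason about that saving is a deliberately simplified model: without a flush the two costs add up, and with a flush they overlap, so the longer of the two dominates. The function names below are hypothetical, and the model ignores network latency and the cost of generating the head.

```php
<?php
// Simplified timing model. The function names, and any numbers fed to
// them, are illustrative assumptions rather than measured benchmarks.

/** Without flushing: the browser starts on the head only after the body is built. */
function total_without_flush(float $bodyGeneration, float $assetLoading): float
{
    return $bodyGeneration + $assetLoading;
}

/** With an early flush: body generation and asset loading overlap. */
function total_with_flush(float $bodyGeneration, float $assetLoading): float
{
    return max($bodyGeneration, $assetLoading);
}

/** Time saved by flushing early under this model. */
function saving(float $bodyGeneration, float $assetLoading): float
{
    return total_without_flush($bodyGeneration, $assetLoading)
        - total_with_flush($bodyGeneration, $assetLoading);
}
```

Under this model the saving is the smaller of the two costs. Assuming, hypothetically, that the assets take about 1 second to load, saving(1.0, 1.0) and saving(2.0, 1.0) both come to 1 second, while saving(0.0, 1.0) is zero. That is the same shape as the results above: little change at the baseline, and a fixed saving once body generation takes at least as long as loading the assets.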
Caveats
While flushing content early is generally beneficial, mileage may vary.
- Tipping Point - Determining where to flush will depend on where your page transitions from simple data output to expensive processing and the client side “value” of the content considered for flushing early. If your head is expensive to generate, or contains no meat for the browser to work through, the benefits are lost.
- Content Compression - Compressing content before delivery is another important performance strategy. However, server side compression may not mix with flushing content. mod_gzip for Apache 1.x does not support chunking, requiring chunked content to be dechunked before it can be compressed. mod_deflate for Apache 2+ does not suffer from such limitations.
- Pooling - Like mod_gzip, any layer between the generated content and the client that pools data will undermine the value of early flushing. PHP output buffering and page caching (including reverse proxies) are examples of practices which may interfere with flushing.
- Nested Templates - A common approach to data templating is to wrap a body template in a site template, for example Zend_Layout. While this approach has several advantages for maintaining consistent layout, it will undermine most if not all of the value around flushing early.
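On the pooling caveat in particular: PHP’s flush() does not touch userland output buffers (those opened with ob_start(), or implied by the output_buffering setting), so any open buffer has to be emptied first. A sketch of one way to do that, where flush_early() is a hypothetical helper name:

```php
<?php
// Sketch only; flush_early() is a hypothetical helper name.

/** Push pending output past PHP's own output buffers, then on to the web server. */
function flush_early(): void
{
    // Empty and close every open output buffer, innermost first.
    while (ob_get_level() > 0) {
        if (!ob_end_flush()) {
            break; // this buffer cannot be closed, so stop rather than loop forever
        }
    }
    flush();
}
```

Closing the buffers is a trade-off: any code that relied on them afterwards (an output handler such as ob_gzhandler, or a layout wrapper that captures the body) will no longer see the output, so this only suits pages where nothing downstream needs the buffer.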