Client-side caching in content management systems | Matslats

By matslats, 22 March, 2008

Background

Users in developing countries are often extremely short of bandwidth. They don't want their browsers requesting pages that should already be cached. HTTP has excellent support for pages to declare expiry and last-modified dates, and for the server and client to decide whether to send a file across the network or use one already cached. HTTP headers, though, are even less visible than the HTML head, and web designers never see them. Consequently, with the advent of ever higher-level web site management tools and improved networks, configuring these headers appropriately has become a non-priority. Furthermore, as the web page becomes decoupled from any single item of content and becomes more of a container for a dynamic selection of content, deciding when a page expires, and what constitutes a modification, has become more complex.

This paper suggests how CMS developers might tackle these issues, so that users in developing countries need not wait for pages which should be cached. There are many other ways to improve the experience of this sector of users, and there is a comprehensive study on web design guidelines by Aptivate. This paper is not about server-side caching, which content management systems employ to reduce the burden on their processors by saving pages or elements of pages rather than rebuilding them on every request.

Key technical points

There are two HTTP headers which the server could add to the page. These are:

Expires:
Last-Modified:

Here is the conversation the server and browser could have before requesting a page:

Client: Has the cached page expired? If yes, or if there is no Expires header...
Client: Asks the server to send the page only If-Modified-Since the Last-Modified [time]
Server: Sends page or sends headers back to say it was not modified since [time]

Either of these headers, then, can reduce network traffic: Expires avoids the request altogether, and Last-Modified avoids resending a page that has not changed. Using both is better. With neither, caching may or may not happen on intermediate proxies, for better or for worse.
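The conversation above can be sketched from the server's side. This is a minimal illustration, not taken from any particular CMS: the function name `respond` and the plain dict of request headers are assumptions, and only the Python standard library is used.

```python
from email.utils import format_datetime, parsedate_to_datetime


def respond(request_headers, last_modified, body):
    """Return (status, headers, body) for a page last changed at last_modified (UTC)."""
    # HTTP dates have one-second resolution, so drop microseconds before comparing.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    since = request_headers.get("If-Modified-Since")
    if since:
        try:
            cached_at = parsedate_to_datetime(since)
        except (TypeError, ValueError):
            cached_at = None  # unparseable date: fall through and send the page
        # Only compare when the parsed date carries a timezone.
        if cached_at is not None and cached_at.tzinfo is not None:
            if last_modified <= cached_at:
                return 304, headers, b""  # Not Modified: browser reuses its copy

    return 200, headers, body
```

A browser that sends back the Last-Modified value it was given receives a 304 with an empty body, so only the headers cross the network.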

The problem

For the CMS designer, the solution is easy: insert the headers. The problem is trickier, though. When content is dynamic, how do you decide what counts as the last modification, and when a page should expire? Passing those choices to the editor, or even to the content consumer, will complicate things. But making wrong decisions opaquely will baffle and annoy. For example, a user might post a comment to a page and not see it appear, because the main content of the page is unmodified. A site with a vibrant feed on every page might never be served from cache, even though the user doesn't want up-to-the-minute information. So who should decide when a page expires?

Some solutions

The one recommendation which applies to all HTTP responses is that there should be a 'Last-Modified:' time relating to the content of the page, not to the moment the page was generated and served. There need be no more discussion about this point. The second, harder issue is who sets the 'Expires:' time, because pages that are still 'current' are displayed instantly, without reference to the server. Setting a header is a very minor feature, and not all web sites will want to implement it. For most CMSs it is possible to write modules which manipulate all aspects of the system. Control of the expiry date might be taken at any one of three levels, and each level would probably require a different module.
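One way to tie 'Last-Modified:' to content rather than to the moment of serving is to take the newest modification time among all the items shown on the page. The sketch below assumes the CMS can supply those times; the function name and the sample dates are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def page_last_modified(item_times):
    """Newest modification time among everything rendered on the page."""
    return max(item_times)


# Hypothetical page: an article edited in January with a comment posted in March.
article_edited = datetime(2008, 1, 10, 9, 0, tzinfo=timezone.utc)
comment_posted = datetime(2008, 3, 20, 17, 30, tzinfo=timezone.utc)

header = format_datetime(
    page_last_modified([article_edited, comment_posted]), usegmt=True
)
print("Last-Modified:", header)  # the comment's time, not the article's
```

Because the comment counts as page content here, a newly posted comment moves the date forward and the browser fetches the page again.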

The administrative level

The decision is regarded as high level, and is taken by one person across the whole site. The site designers may put static and dynamic content on different pages to help simplify the caching question. In the simplest case, the headers would refer to the main content item on a page, but a full analysis of all the content could also be done. Systems usually have 'content types', and each content type might have rules for expiry and last-modified dates. For example, a 'blogroll' content type might be good for one day after the last blog was posted, while the individual blog items might last for two years after they were created or served, unless the blog items carry comments. The danger of this approach is that it is very broad, and doesn't relate closely enough to the actual content, especially on a complex site.
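Site-wide rules per content type could be as small as a lookup table. The following is a sketch under assumptions: the type names, lifetimes, and function name are illustrative, taken from the blogroll example above rather than from any real CMS.

```python
from datetime import timedelta
from email.utils import format_datetime

# Hypothetical site-wide rules: how long a page of each content type stays
# fresh, counted from the last modification of its content.
EXPIRY_RULES = {
    "blogroll": timedelta(days=1),
    "blog_item": timedelta(days=730),  # roughly two years
}


def expires_header(content_type, last_modified, has_comments=False):
    """Return an Expires value, or None when no expiry date should be sent."""
    lifetime = EXPIRY_RULES.get(content_type)
    if lifetime is None or has_comments:
        return None  # no rule for this type, or comments may arrive at any time
    return format_datetime(last_modified + lifetime, usegmt=True)
```

Returning None for unknown types keeps the default conservative: a page gets an expiry date only when the administrator has written a rule for it.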

The editor level

The editor can give individual attention to content items. Some content types may offer options to the editor; others may not need any. For example, wiki pages are constantly modified and so cannot have an expiry date, but a 'static' content item could be set to expire three months after it was edited, or three months after it was served. The term 'page expires' should need only a little explanation for an editor, and perhaps a drop-down box.
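The editor's drop-down box might map onto an expiry date as follows. This is only a sketch: the option labels, the 90-day approximation of three months, and the function name are all assumptions.

```python
from datetime import timedelta
from email.utils import format_datetime

# Hypothetical drop-down choices an editor could pick for one content item.
# Each choice names the starting point and the lifetime (90 days ~ 3 months).
CHOICES = {
    "never expires early": None,  # e.g. wiki pages: send no Expires header
    "3 months after editing": ("edited", timedelta(days=90)),
    "3 months after serving": ("served", timedelta(days=90)),
}


def item_expires(choice, edited_at, served_at):
    """Return the Expires value for one item, or None for no expiry date."""
    rule = CHOICES[choice]
    if rule is None:
        return None
    base, lifetime = rule
    start = edited_at if base == "edited" else served_at
    return format_datetime(start + lifetime, usegmt=True)
```

Counting from the edit gives a fixed date shared by every visitor, while counting from serving gives each visitor a full three months from their own visit.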

The user level

This is an unusual solution, because most users don't pay attention to technical questions, but it is worth considering. A session or user setting could allow the user to choose what expiry dates are put on pages. Here is a possible user story:

The user clicks on 'speed up this site'.
The user reads: if you use this site often, you can tell your computer to use old versions of pages instead of fetching them from the internet every time. Remembered pages arrive immediately, but your computer will not know if they have changed.
The user chooses 'remember each page for one month'.
The pages are now built for this user / session with an expiry date one month after serving.
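The last step of this user story could look something like the sketch below. The session key `remember_pages_for`, the option labels, and the 30-day month are assumptions made for illustration.

```python
from datetime import timedelta
from email.utils import format_datetime

# Hypothetical options offered behind the 'speed up this site' link.
REMEMBER_FOR = {
    "one day": timedelta(days=1),
    "one week": timedelta(weeks=1),
    "one month": timedelta(days=30),
}


def headers_for_session(session, served_at):
    """Return an Expires header only if this user has opted in."""
    choice = session.get("remember_pages_for")
    if choice not in REMEMBER_FOR:
        return {}  # user made no choice: leave the site's default behaviour
    expires = served_at + REMEMBER_FOR[choice]
    return {"Expires": format_datetime(expires, usegmt=True)}
```

A user who never clicks the link gets no extra header, so the choice stays entirely opt-in.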

Conclusion
