WordPress + Memcached

One of the most bizarre critiques of WordPress that I often hear is “it doesn’t come with caching” – which makes no sense because Cache is one of the best features of WordPress out of the box. That’s kind of like saying: “my iPod sucks because it doesn’t have any songs in it” – when you first buy it. Your iPod can’t predict the future and come pre-loaded with songs you love, and your WordPress environment can’t come already-installed without knowing a minimal number of things. You have to pick a username / password, you have to point at a database, and if you want to cache, you have to pick how you want to cache (you don’t HAVE to cache – but really, you HAVE to cache).

If you just have a blog that gets “light” traffic, and that blog only sits on one server: install W3 Total Cache (or WP Super Cache) and you can skip the rest of this post.

For W3TC, make sure it is not in preview mode and follow its instructions / guidance for how to get started making your site faster. W3TC can be used for a multi-server setup as well, but if you are running a website in a load-balanced environment, you are probably more likely to use the tools I am about to show you.

Memcached

Memcached is a distributed in-memory caching system that stores key-value pairs in RAM. That will make sense to a lot of people, if it doesn’t to you – check this out:

// hey - connect me to the server where I store stuff!
connection = connect_to_server();

// store this value in that RAM of yours
// "key" holds "value"
connection.set( key, value );

// give me that value I stored with "key"
connection.get( key );

It was originally developed by Brad Fitzpatrick for LiveJournal. The concept is simple: instead of running 18 web servers / each reading from their own limited cache, each server reads / writes from a common distributed set of caches that they all can share. As long as each server is aware of the same set of “Memcached boxes,” what one server stored in the cache can be read by every other server in the cluster of web servers. The advantage to doing this is immediately apparent: 18 servers reading from the same 30GB of collective cache is more efficient than each server reading from its own smaller local cache.
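To make the "common set of caches" idea concrete, here is a minimal sketch of how a client might decide which Memcached box owns a given key. The function name pick_server() and the server addresses are made up for illustration, and real clients use their own hashing strategies, so treat this as the idea rather than the actual algorithm.

```php
<?php
// Hypothetical pool: every web server is configured with the same list, in the same order.
$servers = array( '10.0.0.1:11211', '10.0.0.2:11211', '10.0.0.3:11211' );

/**
 * Map a key to one server in the pool by hashing the key.
 * Simplified on purpose: real clients use their own hashing schemes.
 */
function pick_server( $key, array $servers ) {
    // crc32() gives the same integer for the same string on every machine
    $index = abs( crc32( $key ) ) % count( $servers );
    return $servers[ $index ];
}

// Two different web servers asking about the same key land on the same box.
$from_web_01 = pick_server( 'artist-info-246809809', $servers );
$from_web_18 = pick_server( 'artist-info-246809809', $servers );

var_dump( $from_web_01 === $from_web_18 ); // bool(true)
```

Because the mapping depends only on the key and the server list, no web server needs to ask another where something was stored.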

Memcached (pronounced: Mem-cash-dee), or Memcache-daemon, is a process that listens by default on port 11211. Like httpd (H-T-T-P-daemon), it runs in the background, often started automatically when the server boots. A lot of huge websites use Memcached, including at least Facebook, YouTube, and Twitter.

Memcached has external dependencies, so it is best to install it using a package manager like MacPorts (on OS X) or yum (Linux). Package managers give you the ability to type commands like port install memcached and port load memcached to do almost all of the initial setup for Memcached.

Memcached does not have to run on external servers. If your production site uses external servers, and you want to develop locally with Memcached as a local cache, that is just fine.

We’ll come back and discuss more tools for interacting with Memcached once we know a little more about it.

Since WordPress is written in PHP and makes a bunch of expensive SQL queries to its companion MySQL database, we need to figure out how to start taking advantage of Memcache in PHP to store the results of expensive queries, functions, and external HTTP requests in cache.

Memcache PHP Extension

Now that you (maybe) have Memcached installed, you need a way for PHP to talk to it, because it can’t out of the box. You have at least 2 choices for installing PHP extensions:

  1. Package manager
  2. PECL

If you have PECL / PEAR installed, you can run commands like pecl install memcache. Using MacPorts, it will be something more like port install php5-memcache. So what does this get you?

Mainly, access to the Memcache class (after you restart Apache) provided by the PHP extension, plus a procedural API for adding, updating, deleting, and retrieving items from your Memcached server(s). Some examples:

<?php
/**
 * Object-oriented approach
 *
 */
$memcache = new Memcache;
// connect locally on the default port
$memcache->connect( '127.0.0.1', 11211 );

// 'some_key' = the key to store this item under in the cache
// value = a string in this case, also supports objects and arrays
// false = do NOT use zlib compression to store the value
// 30 = the number of seconds to cache the item
$memcache->set( 'some_key', 'this is the value I am storing', false, 30 );

// replace / update the value of an item
$memcache->replace( 'some_key', 'but this is the value I am updating', false, 90 );

// retrieve an item
$memcache->get( 'some_key' );

/**
 * Procedural approach (functions not classes)
 *
 */
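$memcache = memcache_connect( '127.0.0.1', 11211 );
// add only stores the item if the key does not exist yet
memcache_add( $memcache, 'some_key', 'this is the value I am storing', false, 30 );
// set stores the item whether or not the key already exists
memcache_set( $memcache, 'some_key', 'this is the value I am storing', false, 30 );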

// replace / update the value of an item
memcache_replace( $memcache, 'some_key', 'but this is the value I am updating', false, 90 );

// retrieve an item
memcache_get( $memcache, 'some_key' );

These functions work just fine, but when using WordPress, we aren’t going to call them directly – these functions will be “abstracted.”

WP Object Cache

WordPress has a cache abstraction layer built-in. What this means is that WordPress has an API for interacting with what is called the WP Object Cache – a PHP class: WP_Object_Cache. The WP Object Cache class is “pluggable” – meaning, if *you* define a class called WP_Object_Cache in your codebase and include it at runtime, that’s the class WordPress will use, otherwise it defaults to WP’s. WordPress also has a procedural API for interacting with the cache object (likely what you’ll use in your code), here’s a peek into future code you may write:

<?php

//
// Function (or "template tag") to abstract the fetching of data.
// Data comes from cache, or call is made to db then cached.
// wp_cache_set( key, value, cache_group, expires (n seconds) )
//

function get_expensive_result() {
    global $wpdb;

    $key = 'expensive_query';
    $group = 'special-group';
    $response = wp_cache_get( $key, $group );

    if ( empty( $response ) ) {
        // add the result of an expensive query
        // to the cache for 3 hours
        $response = $wpdb->get_results(.....);
        if ( !empty( $response ) )
            wp_cache_set( $key, $response, $group, 60 * 60 * 3 );
    }
    return $response;
}

$data = get_expensive_result();

WP Object Cache is agnostic as to how it is implemented. Meaning, it does either a) whatever the default behavior is in WP, which is to “cache” non-persistently (items live in memory for the current page load only) or b) whatever your WP Object Cache (your “pluggable” override) class says to do.
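To picture what “non-persistent” means, here is a rough sketch of an object cache that is nothing more than an array in memory. The class name Tiny_Object_Cache is hypothetical and this is not WordPress's actual class, just the shape of the idea.

```php
<?php
/**
 * Hypothetical stand-in for a non-persistent object cache.
 * Not WordPress's real implementation: an array that dies with the request.
 */
class Tiny_Object_Cache {
    // group => ( key => value ); lives only as long as this PHP request
    private $cache = array();

    public function set( $key, $value, $group = 'default' ) {
        $this->cache[ $group ][ $key ] = $value;
        return true;
    }

    public function get( $key, $group = 'default' ) {
        if ( isset( $this->cache[ $group ] ) && array_key_exists( $key, $this->cache[ $group ] ) ) {
            return $this->cache[ $group ][ $key ];
        }
        return false; // a miss
    }

    public function delete( $key, $group = 'default' ) {
        unset( $this->cache[ $group ][ $key ] );
        return true;
    }
}

$cache = new Tiny_Object_Cache();
$cache->set( 'expensive_query', array( 'row 1', 'row 2' ), 'special-group' );

var_dump( $cache->get( 'expensive_query', 'special-group' ) ); // a hit, but only during this request
var_dump( $cache->get( 'expensive_query', 'other-group' ) );   // a miss: groups are kept separate
```

A persistent backend keeps the same set / get / delete shape but sends the values somewhere that outlives the request, which is why the calling code does not have to change.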

Here’s the catch: the full-page caching piece won’t do anything until you add this to wp-config.php:

define( 'WP_CACHE', true );

Here’s the other catch: all that does is make WordPress try to load wp-content/advanced-cache.php. The other drop-in, wp-content/object-cache.php, is loaded whenever that file exists, with or without the constant. We haven’t installed either file yet. Those files are typically made available by a persistent cache plugin (we want our cached items to be available across page loads until they expire, duh), but let’s pause and come back to this subject a little later.

Transients API

The Transients API in WordPress is an abstraction of an abstraction. Memcached should always be viewed as a “transient cache,” meaning that sometimes you will request data, and it won’t be there and will need to be re-cached. The Transients API has a simple procedural API to act on items. When a persistent object cache is installed, it is actually the exact same thing as wp_cache_* and even allows you to pass in an expiration; it just doesn’t allow you to specify a group (‘transient’ is the group). It is really an exercise in semantics. An advantage: if you aren’t using a WP Object Cache backend (a persistent cache plugin), your “transients” will get stored in the wp_options table of your site’s WP database.

<?php
// Transients procedural API
set_transient( 'woo', 'woo' );
// there is no separate "update" function: setting again overwrites
set_transient( 'woo', 'hoo' );
get_transient( 'woo' );
delete_transient( 'woo' );

// multisite
get_site_transient( 'multi-woo' );

// behind the scenes (with a persistent object cache), this is happening
wp_cache_set( 'woo', 'woo', 'transient' );
wp_cache_set( 'woo', 'hoo', 'transient' );
wp_cache_get( 'woo', 'transient' );
wp_cache_delete( 'woo', 'transient' );
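Here is a rough sketch of the fork in the road described above: one path for sites with a persistent object cache, one for sites without. The function sketch_set_transient() and the two arrays standing in for the object cache and the wp_options table are made up for illustration. The _transient_ and _transient_timeout_ option name prefixes are the ones WordPress uses for the database fallback.

```php
<?php
/**
 * Hypothetical, simplified version of the decision a "set" has to make.
 * $object_cache and $options are plain arrays playing the two storage backends.
 */
function sketch_set_transient( array &$object_cache, array &$options, $name, $value, $expiration, $using_object_cache ) {
    if ( $using_object_cache ) {
        // persistent backend present: the value goes to the cache under the 'transient' group
        $object_cache['transient'][ $name ] = $value;
        return;
    }

    // no persistent backend: fall back to rows in the options table
    if ( $expiration ) {
        $options[ '_transient_timeout_' . $name ] = time() + $expiration;
    }
    $options[ '_transient_' . $name ] = $value;
}

$object_cache = array();
$options      = array();

sketch_set_transient( $object_cache, $options, 'woo', 'hoo', 60, true );
sketch_set_transient( $object_cache, $options, 'woo', 'hoo', 60, false );

print_r( array_keys( $object_cache['transient'] ) ); // the key as stored in the cache group
print_r( array_keys( $options ) );                   // the option rows created by the fallback
```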

I typically use Transients when I want to store something for an extremely short period of time and then manually delete the item from the cache. Here’s an example:

<?php
function this_takes_90_secs_to_run_in_background() {
    if ( !get_transient( 'locked' ) ) {
        // this function could get called 200 times
        // if we don't set a busy indicator
        // so it doesn't get called on every page request
        // (the 5 minute expiration is a safety net in case
        // the request dies before the delete below runs)
        set_transient( 'locked', 1, 60 * 5 );

        // do stuff....

        // ok I'm done!
        delete_transient( 'locked' );
    }
}

add_action( 'init', 'this_takes_90_secs_to_run_in_background' );

Could this exact same thing have been accomplished by using wp_cache_* functions? You betcha, because they were (behind the scenes)! Like I said, I use Transients in specific instances – it’s really an issue of semantics.

So our next step is to make WordPress ready to start caching.

Memcached plugin / WP Object Cache backend

Wouldn’t you know it? There’s a plugin that implements WP Object Cache / Memcache from one of the lead developers of WordPress, Ryan Boren. It’s not really a “plugin” – it’s a drop-in file. Once you download the plugin, move object-cache.php into the wp-content directory, and you have pluggably overridden WP Object Cache. If you are brave and actually look at the code in the lone file, you will notice a bunch of functions (a procedural API) for interacting with the WP Object Cache object, which wraps the Memcache class.

You will also notice that at its core, it is just calling the methods of the Memcache class, but it has abstracted these steps for you. It works with Multisite and has intelligently cooked up a way to cache by blog_id, group, and key. You get caching by blog_id for free; there is nothing you need to do for this to work.

Caching by group is really “namespacing” your keys. Let’s say you want to call 10 different things in your cache “query” – your choices for providing context are a) using a longer key like “bad-query” or b) just using “query” and adding the group “bad” into the fold:

<?php
wp_cache_set( 'bad-query', 'something' );
wp_cache_set( 'query', 'something', 'bad' );

This is where cache and transients completely overlap. Since you can’t specify a group with transients, “bad-query” would be your only option for the key to avoid name collision. Like I said, it is all semantics.

So we know we can call wp_cache_delete( $key, $group ) to delete an item from the cache, right? So that must be why groups are helpful as well, to remove a bunch of things from the cache at once? Nope. You can’t remove a group of items from the cache out of the box. Luckily, this infuriated me enough that I wrote a plugin that will do this for you. More in a bit, we still need to obtain an advanced-cache.php file.

Batcache

The advanced-cache.php piece of the puzzle is implemented by Batcache. Batcache is a full-page caching mechanism and was written by Andy Skelton from Automattic. WordPress.com used to host Gizmodo’s live blogs, because Gizmodo would go down on its own architecture during Apple keynotes after ridiculous spikes in traffic. The idea is this: if a page on your site is requested x number of times in an elapsed period of time, cache the entire HTML output of the page and serve it straight from Memcached for a defined amount of time. Batcache is a bona fide plugin, but the advanced-cache.php portion is a drop-in that should be moved to the wp-content directory.
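The “x number of times in an elapsed period” rule can be sketched like this. The function name, the array of timestamps, and the default thresholds are all assumptions made for illustration. This is not Batcache's code, which keeps its counters in Memcached rather than in a PHP array.

```php
<?php
/**
 * Hypothetical sketch of a "cache after N hits inside a window" rule.
 * $hits maps a URL to the timestamps of its recent requests.
 */
function should_cache_page( array &$hits, $url, $now, $times = 2, $seconds = 120 ) {
    // keep only the hits for this URL that still fall inside the window
    $recent = array();
    if ( isset( $hits[ $url ] ) ) {
        foreach ( $hits[ $url ] as $timestamp ) {
            if ( $now - $timestamp < $seconds ) {
                $recent[] = $timestamp;
            }
        }
    }

    // record the current request
    $recent[]     = $now;
    $hits[ $url ] = $recent;

    return count( $recent ) >= $times;
}

$hits = array();
var_dump( should_cache_page( $hits, '/keynote-live-blog/', 1000 ) ); // bool(false): first hit
var_dump( should_cache_page( $hits, '/keynote-live-blog/', 1030 ) ); // bool(true): second hit inside the window
var_dump( should_cache_page( $hits, '/quiet-page/', 1030 ) );        // bool(false): this page is not busy
```

Pages that only get an occasional visit never cross the threshold, so the cache fills up with the pages that are actually taking the traffic.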

Batcache will serve these fully-cached pages to anonymous users and send their browser a Cache-Control: max-age={MAX_AGE_OF_PAGE}, must-revalidate HTTP header. There are some dangers you should be aware of when serving highly-cached pages from a dynamic website.

Batcache has a class member array called $unique which you should populate with values that would ordinarily make your page display different content for different users: regionalization, perhaps browser alias, the user’s country of origin, etc. If you are getting fancy and adding classes to the <body> tag based on browser, and you don’t add those classes to the $unique array in Batcache, you may end up serving HTML pages with an “ie ie7” class in the class attribute of the <body> tag to most of your users, regardless of what browser they are actually using. Fail.
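Why does adding values to $unique help? A sketch, assuming the cache key for a page is a hash of everything that identifies that version of the page. The function page_cache_key() and the array layout are hypothetical and simplified, not lifted from Batcache.

```php
<?php
/**
 * Hypothetical sketch: fold "things that change the markup" into the page's cache key.
 */
function page_cache_key( $host, $path, array $unique ) {
    $keys = array(
        'host'   => $host,
        'path'   => $path,
        'unique' => $unique,
    );
    // same inputs always give the same key; any differing value gives a different key
    return md5( serialize( $keys ) );
}

$ie7_key    = page_cache_key( 'www.example.com', '/', array( 'browser' => 'ie ie7' ) );
$chrome_key = page_cache_key( 'www.example.com', '/', array( 'browser' => 'chrome' ) );

var_dump( $ie7_key !== $chrome_key ); // bool(true): each browser class gets its own cached copy
```

Leave the browser out of the key and both visitors share one cached copy, which is exactly how the “ie ie7” page ends up in front of everybody.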

Batcache serves these pages to “anonymous” users only. How does it know they are anonymous? It looks in the $_COOKIE super-global for some WordPress-specific cookies.

// Never batcache when cookies indicate a cache-exempt visitor.
if ( is_array( $_COOKIE ) && ! empty( $_COOKIE ) ) {
    foreach ( array_keys( $_COOKIE ) as $batcache->cookie ) {
        if (
            $batcache->cookie != 'wordpress_test_cookie' &&
            (
                substr( $batcache->cookie, 0, 2 ) == 'wp' ||
                substr( $batcache->cookie, 0, 9 ) == 'wordpress' ||
                substr( $batcache->cookie, 0, 14 ) == 'comment_author'
            )
        ) {
            return;
        }
    }
}

That’s all well and good unless you have implemented a custom authentication system like we have at eMusic.
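If your logged-in users are identified by a cookie WordPress knows nothing about, the check needs to learn about it. Here is a sketch of the same test pulled into a standalone function that accepts extra prefixes. The function name is_cache_exempt() and the my_auth cookie prefix are made up for illustration; they are not part of Batcache or of our setup.

```php
<?php
/**
 * Sketch of the cookie test as a reusable function.
 * Returns true when any cookie name starts with one of the given prefixes.
 */
function is_cache_exempt( array $cookies, array $prefixes ) {
    foreach ( array_keys( $cookies ) as $name ) {
        // the test cookie is set for everyone, so it proves nothing
        if ( 'wordpress_test_cookie' === $name ) {
            continue;
        }
        foreach ( $prefixes as $prefix ) {
            if ( strpos( (string) $name, $prefix ) === 0 ) {
                return true;
            }
        }
    }
    return false;
}

// the three WordPress prefixes from the snippet above, plus a hypothetical custom one
$prefixes = array( 'wp', 'wordpress', 'comment_author', 'my_auth' );

var_dump( is_cache_exempt( array( 'wordpress_test_cookie' => '1' ), $prefixes ) ); // bool(false)
var_dump( is_cache_exempt( array( 'my_auth_token' => 'abc' ), $prefixes ) );       // bool(true)
var_dump( is_cache_exempt( array( '_ga' => 'GA1.2' ), $prefixes ) );               // bool(false)
```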

The advanced-cache.php file that ships with Batcache is really a starting point. I know it may seem daunting or too complicated to dig into someone else’s plugin code, but if you are using Batcache, your site begs another level of skill and coding.

Using Memcached at eMusic

At eMusic, we use 4 dedicated Memcached servers in production – combined equaling ~28GB of RAM. When you have THAT much memory to interact with, some interesting things can happen. Here are a few:

  • Your keys with no expiration will seemingly never expire until the cache starts evicting LRU (Least Recently Used) keys. Lesson learned here… always indicate expiration!
  • If you didn’t divide your keys up into a bunch of smaller groups, trying to flush one portion of the cache will end up flushing tons of data that doesn’t need to be refreshed and might send a blast of traffic to your database cluster or web service tier. Lesson learned here… use MANY cache groups.
  • If you aren’t updating or deleting cache keys in your code at all, you may find that you end up with stale data often. Especially if you work with an editorial team / writers. They’ll regularly come to you with a “hey, my data is not updated on the live site!”
  • Don’t assume WordPress is getting it right when it comes to caching its own data; dig in and find out how it really works. You may (will) find some objectionable things.

Johnny Cache

Remember how I said that you can’t “flush” the cache by cache group? Turns out that’s a big problem for us. Why? If we roll code or add / change a feature, we sometimes want to clear a certain cache to reflect a certain change.

Cache groups are a WordPress concept, not a Memcached concept. WordPress adds the group names to keys by convention, so if you know how to parse and create keys like the Memcached WP Object Cache backend, you can sort through the keys on your Memcached servers and group them together yourself to inspect them in the admin. If you have a list of keys for a group, you can loop through them and call wp_cache_delete( $key, $group ) on each.

Here’s some Memcache extension code to retrieve keys:

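<?php
$memcache = new Memcache();
// $server is the host of the one Memcached box to inspect
$memcache->connect( $server, 11211 );
$list = array();
$allSlabs = $memcache->getExtendedStats( 'slabs' );
foreach ( $allSlabs as $host => $slabs ) {
    // a server that did not respond comes back as false
    if ( !is_array( $slabs ) ) continue;
    foreach ( $slabs as $slabId => $slabMeta ) {
        // skip summary entries like active_slabs, real slab ids are numeric
        if ( !is_numeric( $slabId ) ) continue;
        // note: cachedump only returns a limited number of keys per slab
        $cdump = $memcache->getExtendedStats( 'cachedump', (int) $slabId );
        foreach ( $cdump as $dumpHost => $arrVal ) {
            if ( !is_array( $arrVal ) ) continue;
            foreach ( $arrVal as $k => $v ) {
                $list[] = $k;
            }
        }
    }
}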

The Memcached backend creates keys like so:

// blog_id:group:key
1:catalog:artist-info-246809809

To parse the list of keys and sort them into groups, try this:

<?php
$keymaps = array();
foreach ( $list as $item ) {
    $parts = explode( ':', $item );
    if ( is_numeric( $parts[0] ) ) {
        $blog_id = array_shift( $parts );
        $group = array_shift( $parts );
    } else {
        $group = array_shift( $parts );
        $blog_id = 0;
    }

    if ( count( $parts ) > 1 ) {
        $key = join( ':', $parts );
    } else {
        $key = $parts[0];
    }
    $group_key = $blog_id . $group;
    if ( isset( $keymaps[$group_key] ) ) {
        $keymaps[$group_key][2][] = $key;
    } else {
        $keymaps[$group_key] = array( $blog_id, $group, array( $key ) );
    }
}

ksort( $keymaps );
foreach ( $keymaps as $group => $values ) {
    list( $blog_id, $group, $keys ) = $values;
    foreach ( $keys as $key ) {
        .........
    }
}

Johnny Cache is the front-end I wrote for the WP Admin to do this. It allows you to select one Memcached server at a time. Once selected, the server’s keys (not values) are retrieved, then parsed and ordered by group and blog. The plugin allows you to do the following:

  • Flush keys by cache group
  • Remove single items from the cache
  • View the contents of single items in the cache
  • Flush the cache for a specific user by User ID – helpful if you made a change by hand in the database for a specific user

Johnny Cache is still a work in progress, but flushing cache by group was important enough that I wrote my own code to handle it.

Minify-cation

Working in a load-balanced environment is a lot different than working on one server. This is never clearer than when working with tools that expect you to dynamically create static files on the server and then point a URL directly at them. Almost all of the time, this doesn’t work, because the file only exists on the one server that generated it. Also, if generating the static files is expensive, it is a task best performed once, with the result cached and shared with every other server.

I wrote a plugin called Minify that magically grabs all of your page’s scripts and styles, hashes the src names to create an identifier, and then combines the styles or scripts into one file and runs them through some minification classes.

I get around having to serve flat files by creating a rewrite which points to a “make” file. The make file either reads the result of this smashing-together-of-sources, or does the smashing and then saves / serves the result. The advantage here is that every request for our JavaScript and CSS comes straight out of Memcached.

Gotcha alert! To cache-bust a CDN like Akamai, you need to change the file name every time you roll new code to make sure your users aren’t getting served cached files. Query strings suck for that, so I made an increment part of the rewrite ( get_site_option( 'minify:incr' ) ). Our paths end up looking like:

http://www.emusic.com/wp-content/cache/minify-bdda2ca041434058e578f7b84eb7482b-23875598.css

// here's how it is translated
http://{HOST}/wp-content/cache/minify-{HASH}-{INCR}.{EXTENSION}

Works for both JavaScript and CSS. It’s magic.
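The naming scheme can be sketched in a few lines. The function name minify_filename(), the use of md5(), and joining the sources with commas are assumptions for illustration, as are the example source paths. The only parts taken from the URL pattern above are the minify-{HASH}-{INCR}.{EXTENSION} shape and the increment value.

```php
<?php
/**
 * Hypothetical sketch of the naming scheme: hash the list of sources, append an increment.
 */
function minify_filename( array $srcs, $incr, $extension ) {
    // the same sources in the same order always hash to the same identifier
    $hash = md5( implode( ',', $srcs ) );
    return sprintf( 'minify-%s-%d.%s', $hash, $incr, $extension );
}

$styles = array(
    '/wp-content/themes/example/style.css',
    '/wp-content/plugins/example/widget.css',
);

$before = minify_filename( $styles, 23875598, 'css' );
$after  = minify_filename( $styles, 23875599, 'css' ); // increment bumped when new code rolls

echo $before, "\n", $after, "\n";
var_dump( $before !== $after ); // bool(true): the CDN sees a brand new URL
```

The hash only changes when the set of files on the page changes, while the increment changes on every deploy, so either event produces a URL the CDN has never seen.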

Sitemaps

Same concept. If I want to serve google-news-sitemap.xml, I make a rewrite that points to google-news-sitemap.php. If it is in the cache, serve it, otherwise build it, store it, then serve it.

Memcached on the command line

One final note on interacting with Memcached servers: you can telnet to a server and type commands from its text protocol directly.

$ telnet localhost 11211
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
get foo
VALUE foo 0 2
hi
END
stats
STAT pid 8861

More here. Also take a look at Memcache-top.
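If the telnet output looks cryptic: the line after a get reads VALUE, then the key, then a flags number, then the length of the data in bytes, followed by the data itself and END. Here is a small sketch that parses a response string of that shape. parse_get_response() is a made-up helper for illustration, and it works on a string you already have rather than talking to a server.

```php
<?php
/**
 * Parse the text a Memcached server sends back for a "get".
 * Response shape: VALUE <key> <flags> <bytes>\r\n<data>\r\n ... END\r\n
 */
function parse_get_response( $raw ) {
    $found  = array();
    $offset = 0;
    // \G anchors each match at the current offset, so we walk the response item by item
    while ( preg_match( '/\GVALUE (\S+) (\d+) (\d+)(?: \d+)?\r\n/', $raw, $m, 0, $offset ) ) {
        $offset += strlen( $m[0] );
        $bytes   = (int) $m[3];
        $found[ $m[1] ] = array(
            'flags' => (int) $m[2],
            'value' => substr( $raw, $offset, $bytes ),
        );
        $offset += $bytes + 2; // skip the data block and its trailing \r\n
    }
    return $found;
}

// the same exchange as the telnet session above
$items = parse_get_response( "VALUE foo 0 2\r\nhi\r\nEND\r\n" );
echo $items['foo']['value'], "\n"; // hi
```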

Conclusion

There is no “definitive” way to use cache, so experiment and think through your caching strategy. A finely-tuned cache will drastically improve your website’s resilience and performance.
