Performance is a Problem

The (failing) New York Times has many apps and services. Some live on physical servers, others reside in The Cloud™. I manage an app that runs on multiple containers orchestrated by Kubernetes in (A)mazon (W)eb (S)ervices. Many of our apps are migrating to (G)oogle (C)loud (P)latform. Containers are great, and Kubernetes is (mostly) great. “Scaling” usually happens horizontally: you add more replicas of your app to match your traffic/workload. Kubernetes does the scaling and rolling updates for you.

Most apps at the Times are adopting the 12 Factor App methodology – if you haven’t been exposed, give this a read. tl;dr containers, resources, AWS/GCP, run identical versions of your app everywhere. For the app I manage, the network of internationalized sites including Español, I use a container for nginx and another for php-fpm – you can grab it here. I use RDS for the database, and Elasticache for the Memcached service. All seems good. We’re set for world domination.

Enter: your application’s codebase.

If your app doesn’t talk to the internet or a database, it will probably run lightning fast. This is not a guarantee, of course, but network latency will not be an issue. Your problem would be mathematical. Once you start asking the internet for data, or connect to a resource to query data, you need a game plan.

As you may have guessed, the network of sites I alluded to above is powered by WordPress™. WordPress, on its best day, is a large amount of PHP 5.2 code that hints at caching and queries a database whose schema I would generously call Less Than Ideal. WP does not promise you fast queries, but it does cache data in a non-persistent fashion out of the box. What does this mean? It sets a global variable called $wp_object_cache and stores your cache lookups for the length of the request. For the most part, WP will attempt to read from the cache before making expensive database queries. Because it would be nice to make this cache persistent (work across multiple requests/servers), WP will look for an object cache drop-in: wp-content/object-cache.php, and if it is there, will use your implementation instead of its own.
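
To make the non-persistent part concrete, here is a minimal sketch of the idea, not WordPress’s actual implementation. The class name, the function name, and the fake database loader are all made up for illustration; the only point is that the second lookup in the same request never reaches the "database."

```php
<?php
// Hypothetical stand-in for the idea behind $wp_object_cache: an array that
// only lives for the length of the current request.
final class RequestCache
{
    private array $items = [];

    public function get(string $key, string $group = 'default')
    {
        return $this->items[$group][$key] ?? false;
    }

    public function set(string $key, $value, string $group = 'default'): void
    {
        $this->items[$group][$key] = $value;
    }
}

// Read from the cache first; only fall back to the loader on a miss.
function get_post_cached(int $id, RequestCache $cache, callable $loadPost): array
{
    $post = $cache->get((string) $id, 'posts');
    if (false === $post) {
        $post = $loadPost($id);
        $cache->set((string) $id, $post, 'posts');
    }
    return $post;
}

$queries = 0;
// Pretend database lookup; the counter shows how often it actually runs.
$loadPost = function (int $id) use (&$queries): array {
    $queries++;
    return ['ID' => $id, 'post_title' => "Post $id"];
};

$cache = new RequestCache();
get_post_cached(7, $cache, $loadPost); // miss: hits the "database"
get_post_cached(7, $cache, $loadPost); // hit: served from the array
echo $queries, "\n";
```

A persistent drop-in keeps the same read-then-fallback shape, but the array is replaced by something like Memcached so the entry survives into the next request.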

Before we go any further, let me say this: it is basically mandatory that you use persistent caching.

Keep in mind: you haven’t written any code yet – you have only installed WP. Many of us do this and think we are magically done. We are not. If you are a simple blog that gets very little traffic: you are probably ok to move on and let your hosting company do all of the work.

If you are a professional web developer/software engineer/devops ninja, you need to be aware of things – probably before you start designing your app.

Network requests are slow

When it comes to dealing with the internals of WordPress with regard to database queries, there is not a lot you can do to speed things up. The queries are already written in a lowest common denominator way against a schema which is unforgiving. WP laces caching where it can, and simple object retrieval works as expected. WordPress does not cache queries; instead, it limits complex queries to returning only IDs (primary keys), so that single-item lookups can be triggered and populate the cache. Here’s a crude example:

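// the expensive query returns only primary keys
$ids = $wpdb->get_col(...INSERT SLOW QUERY...);
// load any posts that are not cached yet, plus their terms and meta
_prime_post_caches( $ids, true, true );
// look up each item individually; these reads now come from the cache
$posts = array_map('get_post', $ids);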

get_post() will get the item from the cache and hydrate it, or go to the database, populate the cache, and hydrate the result. This is the default behavior of WP_Query internals: expensive query, followed by individual lookups from the cache after it has been primed. Showing 20 posts on the homepage? Uncached, that’s at least 2 database queries. Each post has a featured image? More queries. A different author? More queries. You can see where this is going. An uncached request is capable of querying everything in sight. Like I said, it is basically mandatory that you use an object cache. You will still have to make the query for items, but each individual lookup could then be read from the cache.

You are not completely out of the woods when you read from the cache. How many individual calls to the cache are you making? Before an item can be populated in $wp_object_cache non-persistently, it must be requested from Memcached. Memcached has a mechanism for requesting items in batch, but WordPress does not use it. Cache drop-ins could implement it, but the WP internals do not have a batch mechanism implemented to collect data and make one request in batch for all items.
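
Here is a rough sketch of what a batch lookup could look like. Everything in it is hypothetical: the BatchCache interface, the in-memory implementation, and get_posts_batched() do not exist in WordPress. A real drop-in could back getMany() with Memcached::getMulti(); the in-memory version just counts round trips so the difference is visible.

```php
<?php
// Hypothetical interface for a cache that can answer many keys at once.
interface BatchCache
{
    /** Returns only the keys that were found. */
    public function getMany(array $keys): array;

    public function set(string $key, $value): void;
}

// In-memory stand-in so the sketch runs without a Memcached server.
final class InMemoryBatchCache implements BatchCache
{
    public int $roundTrips = 0;
    private array $items = [];

    public function getMany(array $keys): array
    {
        $this->roundTrips++; // one "network call" no matter how many keys
        return array_intersect_key($this->items, array_flip($keys));
    }

    public function set(string $key, $value): void
    {
        $this->items[$key] = $value;
    }
}

function get_posts_batched(array $ids, BatchCache $cache, callable $loadMissing): array
{
    $keys  = array_map(fn (int $id): string => "post:$id", $ids);
    $found = $cache->getMany($keys);

    // Collect every miss, load them with one query, then backfill the cache.
    $missing = array_values(array_filter(
        $ids,
        fn (int $id): bool => !array_key_exists("post:$id", $found)
    ));
    $loaded = $missing ? $loadMissing($missing) : [];
    foreach ($loaded as $id => $post) {
        $cache->set("post:$id", $post);
        $found["post:$id"] = $post;
    }

    // Return the posts in the order they were requested.
    $posts = [];
    foreach ($ids as $id) {
        if (isset($found["post:$id"])) {
            $posts[$id] = $found["post:$id"];
        }
    }
    return $posts;
}

$cache = new InMemoryBatchCache();
$cache->set('post:1', ['ID' => 1]);

$posts = get_posts_batched([1, 2, 3], $cache, function (array $ids): array {
    $rows = [];
    foreach ($ids as $id) {
        $rows[$id] = ['ID' => $id]; // pretend this came from one SELECT ... IN (...)
    }
    return $rows;
});

echo count($posts), ' posts, ', $cache->roundTrips, " cache round trip\n";
```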

We know that we could speed this all up by also caching the query, but then invalidation becomes a problem. YOU would have to implement a query cache, YOU would have to write the unit tests, YOU would have to write the invalidation logic. Chances are that YOUR code will have hard-to-track-down bugs, and introduce more complexity than is reasonable.

WP_Query might be doing the best it can, for what it is and what codebase it sits in. It does do one thing that is essential: it primes the cache. Whether the cache is persistent or not, get_post() uses the Cache API internally (via WP_Post::get_instance()) to reduce the work of future lookups.

Prime the Cache

Assuming you are writing your own code to query the database, there are 3 rules of thumb:

1) make as few queries as is possible
2) tune your queries so that they are as fast as possible
3) cache the results

Once again, cache and invalidation can be hard, so you will need unit tests for your custom code (more on this in a bit).
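
One common way to keep invalidation manageable is to salt every query cache key with a version number and bump that number on every write, so stale entries are simply never read again. The sketch below assumes that approach; QueryCache, remember(), and invalidate() are hypothetical names, and the array stands in for a persistent cache.

```php
<?php
// Hypothetical query cache that invalidates by changing its key prefix.
final class QueryCache
{
    private array $store = []; // stands in for a persistent cache
    private int $version = 1;  // bumped on every write

    public function remember(string $sql, callable $runQuery): array
    {
        $key = 'query:' . $this->version . ':' . md5($sql);
        if (!array_key_exists($key, $this->store)) {
            $this->store[$key] = $runQuery($sql);
        }
        return $this->store[$key];
    }

    public function invalidate(): void
    {
        $this->version++; // entries under the old version are never read again
    }
}

$runs = 0;
$runQuery = function (string $sql) use (&$runs): array {
    $runs++;
    return [1, 2, 3]; // pretend these are post IDs
};

$cache = new QueryCache();
$sql   = "SELECT ID FROM wp_posts WHERE post_status = 'publish' LIMIT 3";

$cache->remember($sql, $runQuery); // miss: runs the query
$cache->remember($sql, $runQuery); // hit
$cache->invalidate();              // e.g. after a post is saved
$cache->remember($sql, $runQuery); // miss again, fresh result
echo $runs, "\n";
```

The trade-off is that one write throws away every cached query at once, which is blunt but much easier to test than tracking which queries a given row affects.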

This is all well and good for the database, but what about getting data from a web service, another website, or a “microservice,” as it were? We already know that network latency can pile up by making too many requests to the database throughout the lifecycle of a request, but querying the database is usually “pretty fast” – at least not as slow as making requests over HTTP.

HTTP requests are only as fast as the network latency to reach a server plus the time that server takes to respond with something useful. HTTP requests are somewhat of a black box and can time out. HTTP requests should not be made serially – meaning, you shouldn’t make a request, wait, make a request, wait, make a request…

HTTP requests must be batched

If you need data from 3 different web services that don’t rely on each other’s responses, you need to request them asynchronously and simultaneously. WordPress doesn’t officially offer this functionality unless you reach down into Requests and write your own userland wrapper and cache. I highly suggest installing Guzzle via Composer and including it in your project where needed.

Here is some example Guzzle code that deals with Concurrency:

use GuzzleHttp\Client;
use GuzzleHttp\Promise;

$client = new Client(['base_uri' => 'http://httpbin.org/']);

// Initiate each request but do not block
$promises = [
  'image' => $client->getAsync('/image'),
  'png'   => $client->getAsync('/image/png'),
  'jpeg'  => $client->getAsync('/image/jpeg'),
  'webp'  => $client->getAsync('/image/webp')
];

// Wait on all of the requests to complete. Throws a ConnectException
// if any of the requests fail. (Newer releases of guzzlehttp/promises
// expose these helpers as Promise\Utils::unwrap() and Promise\Utils::settle().)
$results = Promise\unwrap($promises);

// Or: wait for the requests to complete, even if some of them fail
$results = Promise\settle($promises)->wait();

// settle() returns an array per key with 'state' and 'value', so each
// response is read from the 'value' entry under the key provided above.
echo $results['image']['value']->getHeader('Content-Length')[0];
echo $results['png']['value']->getHeader('Content-Length')[0];

Before you start doing anything with HTTP, I would suggest taking a hard look at how your app intends to interact with data, and how that data fetching can be completely separated from your render logic. WordPress has a bad habit of mixing user functions that create network requests with PHP output to do what some would call “templating.” Ideally, all of your network requests will have completed by the time you start using data in your templates.

In addition to batching, almost every HTTP response is worthy of a cache entry – preferably one that is long-lived. Most responses are for data that never changes, or does so rarely. Data that might need to be refreshed can be given a short expiration. I would think in minutes and hours where possible. Our default cache policy at eMusic for catalog data was 6 hours. The data only refreshed every 24. If your cache size grows beyond its limit, entries will be passively invalidated. You can play with this to perhaps size your cache in a way that even entries with no expiration will routinely get rotated.

A simple cache for an HTTP request might look like:

$response = wp_cache_get( 'meximelt', 'taco_bell' );
if ( false === $response ) {
  $endpoint = 'http://tacobell.com/api/meximelt?fire-sauce';
  $response = http_function_that_is_not_wp_remote_get( $endpoint );
  wp_cache_set( 'meximelt', $response, 'taco_bell' );
}
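
The example above never expires. To show what a 6-hour policy means in practice, here is a small self-contained sketch. ExpiringCache is a hypothetical class, and the clock is passed in explicitly so the behavior is easy to see; in WordPress you would pass an expiration to wp_cache_set() and let the cache backend do the bookkeeping.

```php
<?php
// Hypothetical cache that stores an expiry timestamp next to each value.
final class ExpiringCache
{
    private array $items = [];

    public function set(string $key, $value, int $ttl, int $now): void
    {
        $this->items[$key] = ['value' => $value, 'expires' => $now + $ttl];
    }

    public function get(string $key, int $now)
    {
        $item = $this->items[$key] ?? null;
        if (null === $item || $now >= $item['expires']) {
            unset($this->items[$key]); // drop missing or stale entries
            return false;
        }
        return $item['value'];
    }
}

const SIX_HOURS = 6 * 60 * 60;

$cache = new ExpiringCache();
$start = 1000000; // arbitrary "current time" in seconds

$cache->set('catalog:album:42', ['title' => 'Example'], SIX_HOURS, $start);

$afterOneHour  = $cache->get('catalog:album:42', $start + 3600);
$afterSixHours = $cache->get('catalog:album:42', $start + SIX_HOURS);

var_dump(false !== $afterOneHour);  // still fresh
var_dump(false !== $afterSixHours); // expired, time to refetch
```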

How your app generates its cache entries is probably its bottleneck

The key take-away from the post up until now is: how your app generates its cache entries is probably its bottleneck. If all of your network requests are cached, and especially if you are using an edge cache like Varnish, you are doing pretty well. It is in those moments that your app needs to regenerate a page, or make a bunch of requests over the network, that you will experience the most pain. WordPress is not fully prepared to help you in these moments.

I would like to walk through a few common scenarios for those who build enterprise apps and are not beholden to WordPress to provide all of the development tools.

WordPress is not prepared to scale as a microservice

WordPress is supposed to be used as a one-size-fits-all approach to creating a common app with common endpoints. WordPress assumes it is the router, the database abstraction, the thin cache layer, the HTTP client, the rendering engine, and hey why not, the API engine. WordPress does all of these things while also taking responsibility for kicking off “cron jobs” (let’s use these words loosely) and phoning home to check for updates. Because WordPress tries to do everything, its code tries to be everywhere all at once. This is why there are no “modules” in WordPress. There is one module: all of it.

A side effect of this monolith is that none of the tools accomplish everything you might expect. A drawback: because WordPress knows it might have to do many tasks in the lifecycle of one request, it eagerly loads most of its codebase before performing any worthwhile functions. Just “loading WordPress” can take more than 1 second on a server without opcache enabled properly. The cost of loading WordPress at all is high. Another reason it has to load so much code is that the plugin ecosystem has been trained to expect every possible API to be in scope, regardless of what is ever used. This is the opposite of service discovery and modularity.

The reason I bring this up: microservices are just abstractions to hide the implementation details of fetching data from resources. The WordPress REST API, at its base, is just grabbing data from the database for you, at the cost of loading all of WordPress and adding network latency. This is why it is not a good solution for providing data to yourself over HTTP, but a necessary solution for providing your data to another site over HTTP. Read-only data is highly-cacheable, and eventual consistency is probably close enough for jazz in the cache layer.

Wait, what cache layer? Ah, right. There is no HTTP cache in WordPress. And there certainly is no baked-in cache for served responses. wp-content/advanced-cache.php + WP_CACHE allows you another opportunity to bypass WordPress and serve full responses from the cache, but I haven’t tried this with Batcache. If you are even considering being a provider of data to another site and want to remain highly-available, I would look at something like Varnish or Fastly (which is basically Varnish).

A primer for introducing Doctrine

There are several reasons that I need to interact with custom tables on the network of international sites I manage at the Times. Media is a global concept – one set of tables is accessed by every site in the network. Each site has its own set of metadata, to store translations. I want to be able to read the data and have it cached automatically. I want to use an ORM-like API that invalidates my cache for me. I want an API to generate SELECT statements, and I want queries to automatically prime the cache, so that the second call here never touches the database:

$db->query('EXPENSIVE QUERY that will cache itself (also contains item #1)!');
// already in the cache
$db->find(1);

All of this exists in Doctrine. Doctrine is an ORM. Doctrine is powerful. I started out writing my own custom code to generate queries and cache them. I caused bugs. I wrote some unit tests. I found more bugs. After enough headaches, I decided to use an open source tool, widely accepted in the PHP universe as a standard. Rather than having to write unit tests for the APIs themselves, I could focus on using the APIs and write tests for our actual implementation.

  • WordPress is missing an API for SELECT queries

    Your first thought might be: “phooey! we got WP_Query!” Indeed we do. But what about tables that are not named wp_posts… that’s what I thought. There are a myriad of valid reasons that a developer needs to create and interact with a custom database table, tables, or an entirely separate schema. We need to give ourselves permission to break out of the mental prison that might be preventing us from using better tools to accomplish tasks that are not supported by WP out of the box. We are going to talk about Doctrine, but first we are going to talk about Symfony.

  • Symfony

    Symfony is the platonic ideal for a modular PHP framework. This is, of course, debatable, but it is probably the best thing we have in the PHP universe. Symfony, through its module ecosystem, attempts to solve specific problems in discrete ways. You do not have to use all of Symfony, because its parts are isolated modules. You can even opt to just use a lighter framework, called Silex, that gives you a bundle of powerful tools with Rails/Express-like app routing. It isn’t a CMS, but it does make prototyping apps built in PHP easy, and it makes you realize that WP doesn’t attempt to solve technical challenges in the same way.

  • Composer

    You can use Composer whenever you want, with any framework or codebase, without Composer being aware of all parts of your codebase, and without Composer knowing which framework you are using. composer.json is package.json for PHP. Most importantly, Composer will autoload all of your classes for you. require_once 'WP_The_New_Class_I_added.php4' is an eyesore and unnecessary. If you have noticed, WP loads all of its library code in wp-settings.php, whether or not you are ever going to use it. This is bad, so bad. Remember: without a tuned opcache, just loading all of these files can take upwards of 1 second. Bad, so so bad. Loading classes on-demand is the way to go, especially when much of your code only initializes to *maybe* run on a special admin screen or the like.

    Here is how complicated autoloading with Composer is – in wp-config.php, please add:

    require_once 'vendor/autoload.php';
    

    I need a nap.

  • Pimple

    Pimple is the Dependency Injection Container (sometimes labeled without irony as a DIC), also written by Fabien Potencier, the creator of Symfony. A DI container allows us to lazy-load resources at runtime, while also keeping their implementation details opaque. The benefit is that we can more simply mock these resources, or swap out the implementation of them, without changing the details of how they are referenced. It also allows us to group functionality in a sane way, and more cleanly expose a suite of services to our overall app container. Finally, it makes it possible to have a central location for importing libraries and declaring services based on them.

    A container is an instance of Pimple, populated by service providers:

    namespace NYT;
    
    use Pimple\Container;
    
    class App extends Container {}
    
    // and then later...
    
    $app = new App();
    $app->register( new AppProvider() );
    $app->register( new Cache\Provider() );
    $app->register( new Database\Provider() );
    $app->register( new Symfony\Provider() );
    $app->register( new AWS\Provider() );
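
    If “lazy-load resources at runtime” sounds abstract, this is roughly what it means. The sketch below is not Pimple’s actual implementation; TinyContainer and the service names are hypothetical, and it only mimics the behavior that matters here: a closure is stored, runs on first access, and the result is shared afterwards.

    ```php
    <?php
    // Illustrative only: a tiny container with lazy, shared services.
    final class TinyContainer implements ArrayAccess
    {
        private $factories = [];
        private $instances = [];

        public function offsetExists($id): bool
        {
            return isset($this->factories[$id]);
        }

        #[\ReturnTypeWillChange]
        public function offsetGet($id)
        {
            if (!isset($this->factories[$id])) {
                throw new InvalidArgumentException("Unknown service: $id");
            }
            if (!array_key_exists($id, $this->instances)) {
                // First access: run the closure and remember the result.
                $this->instances[$id] = ($this->factories[$id])($this);
            }
            return $this->instances[$id];
        }

        public function offsetSet($id, $factory): void
        {
            $this->factories[$id] = $factory;
        }

        public function offsetUnset($id): void
        {
            unset($this->factories[$id], $this->instances[$id]);
        }
    }

    $created = 0;
    $app = new TinyContainer();

    $app['db.host'] = function () {
        return 'localhost'; // hypothetical value
    };
    $app['connection'] = function ($app) use (&$created) {
        $created++;
        return (object) ['host' => $app['db.host']];
    };

    $before = $created;           // nothing has been built yet
    $first  = $app['connection']; // closure runs here
    $second = $app['connection']; // reuses the same object
    var_dump($before, $created, $first === $second);
    ```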
    

    Not to scare you, and this is perhaps beyond the scope of this post, but here is what our Database provider looks like:

    namespace NYT\Database;
    
    use Pimple\Container;
    use Pimple\ServiceProviderInterface;
    use Doctrine\ORM\EntityManager;
    use Doctrine\ORM\Configuration;
    use Doctrine\Common\Persistence\Mapping\Driver\StaticPHPDriver;
    use Doctrine\DBAL\Connection;
    use NYT\Database\{Driver,SQLLogger};
    
    class Provider implements ServiceProviderInterface {
      public function register( Container $app ) {
    
        $app['doctrine.metadata.driver'] = function () {
          return new StaticPHPDriver( [
            __SRC__ . '/lib/php/Entity/',
            __SRC__ . '/wp-content/plugins/nyt-wp-bylines/php/Entity/',
            __SRC__ . '/wp-content/plugins/nyt-wp-media/php/Entity/',
          ] );
        };
    
        $app['doctrine.config'] = function ( $app ) {
          $config = new Configuration();
    
          $config->setProxyDir( __SRC__ . '/lib/php/Proxy' );
          $config->setProxyNamespace( 'NYT\Proxy' );
          $config->setAutoGenerateProxyClasses( false );
    
          $config->setMetadataDriverImpl( $app['doctrine.metadata.driver'] );
          $config->setMetadataCacheImpl( $app['cache.array'] );
    
          $cacheDriver = $app['cache.chain'];
          $config->setQueryCacheImpl( $cacheDriver );
          $config->setResultCacheImpl( $cacheDriver );
          $config->setHydrationCacheImpl( $cacheDriver );
    
          // this has to be on, only thing that caches Entities properly
          $config->setSecondLevelCacheEnabled( true );
          $config->getSecondLevelCacheConfiguration()->setCacheFactory( $app['cache.factory'] );
    
          if ( 'dev' === $app['env'] ) {
            $config->setSQLLogger( new SQLLogger() );
          }
          return $config;
        };
    
        $app['doctrine.nyt.driver'] = function () {
          // Driver calls DriverConnection, which sets the internal _conn prop
          // to the mysqli instance from wpdb
          return new Driver();
        };
    
        $app['db.host'] = function () {
          return getenv( 'DB_HOST' );
        };
    
        $app['db.user'] = function () {
          return getenv( 'DB_USER' );
        };
    
        $app['db.password'] = function () {
          return getenv( 'DB_PASSWORD' );
        };
    
        $app['db.name'] = function () {
          return getenv( 'DB_NAME' );
        };
    
        $app['doctrine.connection'] = function ( $app ) {
          return new Connection(
            // these credentials don't actually do anything
            [
              'host' => $app['db.host'],
              'user' => $app['db.user'],
              'password' => $app['db.password'],
            ],
            $app['doctrine.nyt.driver'],
            $app['doctrine.config']
          );
        };
    
        $app['db'] = function ( $app ) {
          $conn = $app['doctrine.connection'];
          return EntityManager::create( $conn, $app['doctrine.config'], $conn->getEventManager() );
        };
      }
    }
    

    I am showing you this because it will be relevant in the next 2 sections.

  • wp-content/db.php

    WordPress allows you to override the default database implementation with your own. Rather than overriding wpdb (this is a class, yet eschews all naming conventions), we are going to instantiate it like normal, but leak the actual database resource in our override file so that we can share the connection with our Doctrine code. This gives us the benefit of using the same connection on both sides. In true WP fashion, $dbh is a protected member, and should not be publicly readable, but a loophole in the visibility scheme for the class allows us to anyway – for backwards compatibility, members that were previously initialized with var (whadup PHP4) are readable through the magic methods that decorate the class.
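
    For anyone who has not run into this pattern, here is a minimal sketch of how a magic getter makes a protected member readable from outside. LegacyConnection is a hypothetical class written for this example, not wpdb itself.

    ```php
    <?php
    // Hypothetical class: the member is protected, but __get() hands it out.
    class LegacyConnection
    {
        protected $dbh;

        public function __construct($dbh)
        {
            $this->dbh = $dbh;
        }

        // Called only when code outside the class reads an inaccessible property.
        public function __get($name)
        {
            return $this->$name;
        }
    }

    $db = new LegacyConnection('pretend this is a mysqli instance');
    $handle = $db->dbh; // allowed, because __get() intercepts the read
    echo $handle, "\n";
    ```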

    Here is what we do in the DB drop-in:

    $app = NYT\getApp();
    
    $wpdb = new wpdb(
      $app['db.user'],
      $app['db.password'],
      $app['db.name'],
      $app['db.host']
    );
    
    $app['dbh'] = $wpdb->dbh;
    

    The container allows us to keep our credentials opaque, and allows us to switch out *where* our credentials originate without having to change the code here. The Provider class exposes “services” for our app to use that don’t get initialized until we actually use them. Wouldn’t this be nice throughout 100% of WordPress? The answer is yes, but as long as PHP 5.2 support is a thing, Pimple cannot be used, as it requires closures (the = function () {} bit).

  • WP_Object_Cache

    Doctrine uses a cache as well, so it would be great if we could not only share the database connection, but also share the Memcached instance that is powering the WP_Object_Cache and the internal Doctrine cache.

    Here’s our cache Provider:

    namespace NYT\Cache;
    
    use Pimple\Container;
    use Pimple\ServiceProviderInterface;
    use Doctrine\ORM\Cache\DefaultCacheFactory;
    use Doctrine\ORM\Cache\RegionsConfiguration;
    use Doctrine\Common\Cache\{ArrayCache,ChainCache,MemcachedCache};
    use \Memcached;
    
    class Provider implements ServiceProviderInterface {
      public function register( Container $app ) {
        $app['cache.array'] = function () {
          return new ArrayCache();
        };
    
        $app['memcached.servers'] = function () {
          return [
            [ getenv( 'MEMCACHED_HOST' ), '11211' ],
          ];
        };
    
        $app['memcached'] = function ( $app ) {
          $memcached = new Memcached();
          foreach ( $app['memcached.servers'] as $server ) {
            list( $node, $port ) = $server;
            $memcached->addServer(
              // host
              $node,
              // port
              $port,
              // bucket weight
              1
            );
          }
          return $memcached;
        };
    
        $app['cache.memcached'] = function ( $app ) {
          $cache = new MemcachedCache();
          $cache->setMemcached( $app['memcached'] );
          return $cache;
        };
    
        $app['cache.chain'] = function ( $app ) {
          return new ChainCache( [
            $app['cache.array'],
            $app['cache.memcached']
          ] );
        };
    
        $app['cache.regions.config'] = function () {
          return new RegionsConfiguration();
        };
    
        $app['cache.factory'] = function ( $app ) {
          return new DefaultCacheFactory(
            $app['cache.regions.config'],
            $app['cache.chain']
          );
        };
      }
    }
    
    

    WP_Object_Cache is an idea more than its actual implementation details, and with so many engines for it in the wild, the easiest way to ensure we don’t blow up the universe is to have our cache class implement an interface. This will ensure that our method signatures match the reference implementation, and that our class contains all of the expected methods.

    namespace NYT\Cache;
    
    interface WPCacheInterface {
      public function key( $key, string $group = 'default' );
    
      public function get( $id, string $group = 'default' );
      public function set( $id, $data, string $group = 'default', int $expire = 0 );
      public function add( $id, $data, string $group = 'default', int $expire = 0 );
      public function replace( $id, $data, string $group = 'default', int $expire = 0 );
      public function delete( $id, string $group = 'default' );
    
      public function incr( $id, int $n = 1, string $group = 'default' );
      public function decr( $id, int $n = 1, string $group = 'default' );
    
      public function flush();
      public function close();
    
      public function switch_to_blog( int $blog_id );
    
      public function add_global_groups( $groups );
      public function add_non_persistent_groups( $groups );
    }
    

    Here’s our custom object cache that uses the Doctrine cache internals instead of always hitting Memcached directly:

    namespace NYT\Cache;
    
    use NYT\App;
    
    class ObjectCache implements WPCacheInterface {
      protected $memcached;
      protected $cache;
      protected $arrayCache;
    
      protected $globalPrefix;
      protected $sitePrefix;
      protected $tablePrefix;
    
      protected $global_groups = [];
      protected $no_mc_groups = [];
    
      public function __construct( App $app ) {
        $this->memcached = $app['memcached'];
        $this->cache = $app['cache.chain'];
        $this->arrayCache = $app['cache.array'];
    
        $this->tablePrefix = $app['table_prefix'];
        $this->globalPrefix = is_multisite() ? '' : $this->tablePrefix;
        $this->sitePrefix = ( is_multisite() ? $app['blog_id'] : $this->tablePrefix ) . ':';
      }
    
      public function key( $key, string $group = 'default' ): string
      {
        if ( false !== array_search( $group, $this->global_groups ) ) {
          $prefix = $this->globalPrefix;
        } else {
          $prefix = $this->sitePrefix;
        }
        return preg_replace( '/\s+/', '', WP_CACHE_KEY_SALT . "$prefix$group:$key" );
      }
    
      /**
       * @param int|string $id
       * @param string     $group
       * @return mixed
       */
      public function get( $id, string $group = 'default' ) {
        $key = $this->key( $id, $group );
        $value = false;
    
        if ( $this->arrayCache->contains( $key ) && in_array( $group, $this->no_mc_groups ) ) {
          $value = $this->arrayCache->fetch( $key );
        } elseif ( in_array( $group, $this->no_mc_groups ) ) {
          $this->arrayCache->save( $key, $value );
        } else {
          $value = $this->cache->fetch( $key );
        }
    
        return $value;
      }
    
      public function set( $id, $data, string $group = 'default', int $expire = 0 ): bool
      {
        $key = $this->key( $id, $group );
        return $this->cache->save( $key, $data, $expire );
      }
    
      public function add( $id, $data, string $group = 'default', int $expire = 0 ): bool
      {
        $key = $this->key( $id, $group );
        if ( in_array( $group, $this->no_mc_groups ) ) {
          $this->arrayCache->save( $key, $data );
          return true;
        } elseif ( $this->arrayCache->contains( $key ) && false !== $this->arrayCache->fetch( $key ) ) {
          return false;
        }
    
        return $this->cache->save( $key, $data, $expire );
      }
    
      public function replace( $id, $data, string $group = 'default', int $expire = 0 ): bool
      {
        $key = $this->key( $id, $group );
        $result = $this->memcached->replace( $key, $data, $expire );
    
        if ( false !== $result ) {
          $this->arrayCache->save( $key, $data );
        }
    
        return $result;
      }
    
      public function delete( $id, string $group = 'default' ): bool
      {
        $key = $this->key( $id, $group );
        return $this->cache->delete( $key );
      }
    
      public function incr( $id, int $n = 1, string $group = 'default' ): bool
      {
        $key = $this->key( $id, $group );
        $incr = $this->memcached->increment( $key, $n );
        return $this->cache->save( $key, $incr );
      }
    
      public function decr( $id, int $n = 1, string $group = 'default' ): bool
      {
        $key = $this->key( $id, $group );
        $decr = $this->memcached->decrement( $key, $n );
        return $this->cache->save( $key, $decr );
      }
    
      public function flush(): bool
      {
        if ( is_multisite() ) {
          return true;
        }
    
        return $this->cache->flush();
      }
    
      public function close() {
        $this->memcached->quit();
      }
    
      public function switch_to_blog( int $blog_id ) {
        $this->sitePrefix = ( is_multisite() ? $blog_id : $this->tablePrefix ) . ':';
      }
    
      public function add_global_groups( $groups ) {
        if ( ! is_array( $groups ) ) {
          $groups = [ $groups ];
        }
        $this->global_groups = array_merge( $this->global_groups, $groups );
        $this->global_groups = array_unique( $this->global_groups );
      }
    
      public function add_non_persistent_groups( $groups ) {
        if ( ! is_array( $groups ) ) {
          $groups = [ $groups ];
        }
        $this->no_mc_groups = array_merge( $this->no_mc_groups, $groups );
        $this->no_mc_groups = array_unique( $this->no_mc_groups );
      }
    }
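
    The key() method is the part most likely to surprise people on multisite, so here is the same logic pulled out into a standalone function with sample values. build_cache_key(), the salt, and the group names below are made up for illustration.

    ```php
    <?php
    // Same shape as ObjectCache::key(), with every input passed in explicitly.
    function build_cache_key(
        string $salt,
        string $key,
        string $group,
        array $globalGroups,
        string $globalPrefix,
        string $sitePrefix
    ): string {
        // Global groups share one prefix; everything else is scoped to the site.
        $prefix = in_array($group, $globalGroups, true) ? $globalPrefix : $sitePrefix;
        return preg_replace('/\s+/', '', $salt . "$prefix$group:$key");
    }

    // Multisite, blog 3: the global prefix is empty, the site prefix is "3:".
    $siteKey   = build_cache_key('salt_', '42', 'posts', ['users'], '', '3:');
    $globalKey = build_cache_key('salt_', '42', 'users', ['users'], '', '3:');

    echo $siteKey, "\n";   // salt_3:posts:42
    echo $globalKey, "\n"; // salt_users:42
    ```

    Entries in a global group resolve to the same key on every site in the network, while everything else gets a per-site key.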
    

    object-cache.php is the PHP4-style function set that we all know and love:

    use NYT\Cache\ObjectCache;
    use function NYT\getApp;
    
    function _wp_object_cache() {
      static $cache = null;
      if ( ! $cache ) {
        $app = getApp();
        $cache = new ObjectCache( $app );
      }
      return $cache;
    }
    
    function wp_cache_add( $key, $data, $group = '', $expire = 0 ) {
      return _wp_object_cache()->add( $key, $data, $group, $expire );
    }
    
    function wp_cache_incr( $key, $n = 1, $group = '' ) {
      return _wp_object_cache()->incr( $key, $n, $group );
    }
    
    function wp_cache_decr( $key, $n = 1, $group = '' ) {
      return _wp_object_cache()->decr( $key, $n, $group );
    }
    
    function wp_cache_close() {
      return _wp_object_cache()->close();
    }
    
    function wp_cache_delete( $key, $group = '' ) {
      return _wp_object_cache()->delete( $key, $group );
    }
    
    function wp_cache_flush() {
      return _wp_object_cache()->flush();
    }
    
    function wp_cache_get( $key, $group = '' ) {
      return _wp_object_cache()->get( $key, $group );
    }
    
    function wp_cache_init() {
      global $wp_object_cache;
      $wp_object_cache = _wp_object_cache();
    }
    
    function wp_cache_replace( $key, $data, $group = '', $expire = 0 ) {
      return _wp_object_cache()->replace( $key, $data, $group, $expire );
    }
    
    function wp_cache_set( $key, $data, $group = '', $expire = 0 ) {
      if ( defined( 'WP_INSTALLING' ) ) {
        return _wp_object_cache()->delete( $key, $group );
      }
      return _wp_object_cache()->set( $key, $data, $group, $expire );
    }
    
    function wp_cache_switch_to_blog( $blog_id ) {
      return _wp_object_cache()->switch_to_blog( $blog_id );
    }
    
    function wp_cache_add_global_groups( $groups ) {
      _wp_object_cache()->add_global_groups( $groups );
    }
    
    function wp_cache_add_non_persistent_groups( $groups ) {
      _wp_object_cache()->add_non_persistent_groups( $groups );
    }
    
    
    All of the above is boilerplate. Yes, it sucks, but once it’s there, you don’t really need to touch it again. I have included it in the post in case anyone wants to try some of this stuff out on their own stack. So, finally:

  • Show Me Doctrine!

    This is an Entity – we will verbosely describe an Asset, but as you can probably guess, we get a kickass API for free by doing this:

    namespace NYT\Media\Entity;
    
    use Doctrine\ORM\Mapping\ClassMetadata as DoctrineMetadata;
    use Symfony\Component\Validator\Mapping\ClassMetadata as ValidatorMetadata;
    use Symfony\Component\Validator\Constraints as Assert;
    
    class Asset {
      /**
       * @var int
       */
      protected $id;
      /**
       * @var int
       */
      protected $assetId;
      /**
       * @var string
       */
      protected $type;
      /**
       * @var string
       */
      protected $slug;
      /**
       * @var string
       */
      protected $modified;
    
      public function getId() {
        return $this->id;
      }
    
      public function getAssetId() {
        return $this->assetId;
      }
    
      public function setAssetId( $assetId ) {
        $this->assetId = $assetId;
      }
    
      public function getType() {
        return $this->type;
      }
    
      public function setType( $type ) {
        $this->type = $type;
      }
    
      public function getSlug() {
        return $this->slug;
      }
    
      public function setSlug( $slug ) {
        $this->slug = $slug;
      }
    
      public function getModified() {
        return $this->modified;
      }
    
      public function setModified( $modified ) {
        $this->modified = $modified;
      }
    
      public static function loadMetadata( DoctrineMetadata $metadata ) {
        $metadata->enableCache( [
          'usage' => $metadata::CACHE_USAGE_NONSTRICT_READ_WRITE,
          'region' => static::class
        ] );
    
        $metadata->setIdGeneratorType( $metadata::GENERATOR_TYPE_IDENTITY );
    
        $metadata->setPrimaryTable( [
          'name' => 'nyt_media'
        ] );
    
        $metadata->mapField( [
          'id' => true,
          'fieldName' => 'id',
          'type' => 'integer',
        ] );
    
        $metadata->mapField( [
          'fieldName' => 'assetId',
          'type' => 'integer',
          'columnName' => 'asset_id',
        ] );
    
        $metadata->mapField( [
          'fieldName' => 'type',
          'type' => 'string',
        ] );
    
        $metadata->mapField( [
          'fieldName' => 'slug',
          'type' => 'string',
        ] );
    
        $metadata->mapField( [
          'fieldName' => 'modified',
          'type' => 'string',
        ] );
      }
    
      public static function loadValidatorMetadata( ValidatorMetadata $metadata ) {
        $metadata->addGetterConstraints( 'assetId', [
          new Assert\NotBlank(),
        ] );
      }
    }
    

Now that we have been exposed to some new nomenclature, and a bunch of frightening code, let’s look at what we now get. First we need a Media Provider:

namespace NYT\Media;

use NYT\{App,LogFactory};
use Pimple\Container;
use Pimple\ServiceProviderInterface;

class Provider implements ServiceProviderInterface {
  public function register( Container $app ) {
    .....

    $app['media.repo.asset'] = function ( $app ) {
      return $app['db']->getRepository( Entity\Asset::class );
    };

    $app['media.repo.post_media'] = function ( $app ) {
      return $app['db']->getRepository( Entity\PostMedia::class );
    };
  }
}

Now we want to use this API to do some powerful stuff:

// lazy-load repository class for media assets
$repo = $app['media.repo.asset'];

// find one item by primary key, will prime cache
$asset = $repo->find( $id );

We can prime the cache whenever we want by requesting assets filtered by params we pass to ->findBy( $params ). I hope you’ve already figured it out, but we didn’t have to write any of this magic code; it is exposed automatically. Any call to ->findBy() will return from the cache or generate a SQL query, cache the resulting IDs by a hash, and normalize the cache by creating an entry for each found item. Subsequent queries have access to the same normalized cache, so queries can be optimized, prime each other, and invalidate entries when mutations occur.

// pull a bunch of ids from somewhere and get a bunch of assets at once
// will prime cache
$repo->findBy( [ 'id' => $ids ] );

// After priming the cache, sanity check items
$assets = array_reduce( $ids, function ( $carry, $id ) use ( $repo ) {
  // all items will be read from the cache
  $asset = $repo->find( $id );
  if ( $asset ) {
    $carry[] = $asset;
  }
  return $carry;
}, [] );

$repo->findBy( [ 'slug' => $oneSlug ] );
$repo->findBy( [ 'slug' => $manySlugs ] );
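To make the priming behavior concrete, here is a minimal in-memory sketch of the idea. It is not Doctrine's implementation: `SketchRepo`, its row fixture, and the `queries` counter are hypothetical names that exist only for illustration, and a plain array stands in for both the database and the cache.

```php
<?php
// Hypothetical stand-in for a repository with a normalized cache.
// This is NOT Doctrine's code; it only illustrates the idea.
class SketchRepo {
    private $rows;            // rows keyed by id, standing in for the table
    private $items = [];      // per-item cache entries keyed by id
    private $queryCache = []; // hash of criteria => list of matching ids
    public $queries = 0;      // number of simulated database queries

    public function __construct( array $rows ) {
        $this->rows = $rows;
    }

    public function find( $id ) {
        if ( isset( $this->items[ $id ] ) ) {
            return $this->items[ $id ]; // cache hit, no query
        }
        $this->queries++;
        if ( ! isset( $this->rows[ $id ] ) ) {
            return null;
        }
        return $this->items[ $id ] = $this->rows[ $id ];
    }

    public function findBy( array $criteria ) {
        $hash = md5( serialize( $criteria ) );
        if ( ! isset( $this->queryCache[ $hash ] ) ) {
            $this->queries++;
            $ids = [];
            foreach ( $this->rows as $id => $row ) {
                if ( $this->matches( $row, $criteria ) ) {
                    $ids[] = $id;
                    $this->items[ $id ] = $row; // normalize: one entry per item
                }
            }
            $this->queryCache[ $hash ] = $ids;
        }
        $found = [];
        foreach ( $this->queryCache[ $hash ] as $id ) {
            $found[] = $this->items[ $id ];
        }
        return $found;
    }

    private function matches( array $row, array $criteria ) {
        foreach ( $criteria as $field => $value ) {
            if ( ! in_array( $row[ $field ], (array) $value, true ) ) {
                return false;
            }
        }
        return true;
    }
}

$repo = new SketchRepo( [
    1 => [ 'id' => 1, 'slug' => 'a' ],
    2 => [ 'id' => 2, 'slug' => 'b' ],
    3 => [ 'id' => 3, 'slug' => 'c' ],
] );

$repo->findBy( [ 'id' => [ 1, 2 ] ] ); // one simulated query, primes items 1 and 2
$first  = $repo->find( 1 );            // served from the item cache
$second = $repo->find( 2 );            // served from the item cache
```

The point of the sketch is the shape of the cache, not the storage: the query result is stored as a list of IDs, and each item lives in exactly one place that every later lookup can share.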

Here’s an example of a WP_Query-type request with 10 times more horsepower underneath:

// get a page of assets filtered by params
$assets = $repo->findBy(
  $params,
  [ 'id' => $opts['order'] ],
  $opts['perPage'],
  ( $opts['page'] - 1 ) * $opts['perPage']
);
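The last argument is plain offset arithmetic. A small sketch, where `page_offset()` is a hypothetical helper and not part of Doctrine:

```php
<?php
// Hypothetical helper: convert a 1-based page number into a row offset.
function page_offset( $page, $perPage ) {
    // Clamp to page 1 so a bad page number never yields a negative offset.
    return ( max( 1, (int) $page ) - 1 ) * (int) $perPage;
}

$firstPage = page_offset( 1, 20 ); // start at the first row
$thirdPage = page_offset( 3, 20 ); // skip the first two pages of 20
```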

With Doctrine, you never need to write SQL, and you certainly don’t need to write a Cache layer.

Lesson Learned

Let me explain a complex scenario where the cache priming in Doctrine turned a mess of a problem into an easy solution. The media assets used on our network of internationalized sites do not live in WordPress. WordPress has reference to the foreign IDs in the database in a custom table: nyt_media. We also store a few other pieces of identifying data, but only to support the List Table experience in the admin (we write custom List Tables, another weird nightmare of an API). The media items are referenced in posts via a [media] shortcode. Shortcodes have their detractors, but the Shortcode API, IMO, is a great way to store contextually-placed object references in freeform content.

The handler for the media shortcode does A LOT. For each shortcode, we need to request an asset from a web service over HTTP or read it from the cache. We need to read a row in the database to get the id to pass to the web service. We need to read site-specific metadata to get the translations for certain fields that will override the fields returned from the web service.

To review, for each item [media id="8675309"]:
1. Go to the database, get the nyt_media row by primary key (8675309)
2. Request the asset from a web service, using the asset_id returned by the row
3. Get the metadata from nyt_2_mediameta by nyt_media_id (8675309)
4. Don’t actually make network requests in the shortcode handler itself
5. Read all of these values from primed caches

Shortcodes get parsed when the 'the_content' filter runs. To short circuit that, we hook into 'template_redirect', which happens before rendering. We match all of the shortcodes in the content for every post in the request – this is more than just the main loop, this could also be Related Content modules and the like. We build a list of all of the unique IDs that will need data.
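As a rough sketch of that collection step: the helper below is hypothetical (`collect_media_ids()` is not our production code), and it uses a simplified regex so it can run without WordPress loaded. Inside WordPress you would more likely build on `get_shortcode_regex()`.

```php
<?php
// Hypothetical helper: collect unique ids from [media id="..."] shortcodes.
// A plain regex is used so the sketch runs without WordPress loaded.
function collect_media_ids( array $contents ) {
    $ids = [];
    foreach ( $contents as $content ) {
        if ( preg_match_all( '/\[media\s+id="(\d+)"[^\]]*\]/', $content, $matches ) ) {
            foreach ( $matches[1] as $id ) {
                $ids[ (int) $id ] = true; // array keys de-duplicate for us
            }
        }
    }
    return array_keys( $ids );
}

// Content from the main loop plus a related-content module.
$ids = collect_media_ids( [
    'Intro [media id="8675309"] outro',
    'Related [media id="42"] and again [media id="8675309"]',
] );
```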

Once we have all of the IDs, we look in the cache for existing entries. If we parsed 85 IDs, and 50 are already in the cache, we will only request the missing 35 items. It is absolutely crucial for the HTTP portion that we generate a batch query and that no single requests leak out. We are not going to make requests serially (one at a time); our server would explode and time out. Because we are doing such amazing work so far, and we are so smart about HTTP, we should be fine, right?
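The 85 / 50 / 35 split can be sketched like this. `partition_ids()` is a hypothetical helper, and a plain array stands in for the real cache backend:

```php
<?php
// Hypothetical helper: split parsed ids into cache hits and misses.
// $cache is a plain array standing in for the real cache backend.
function partition_ids( array $ids, array $cache ) {
    $hits   = array_values( array_intersect( $ids, array_keys( $cache ) ) );
    $misses = array_values( array_diff( $ids, $hits ) );
    return [ $hits, $misses ];
}

$ids   = range( 1, 85 );                                    // ids parsed from content
$cache = array_fill_keys( range( 1, 50 ), 'cached asset' ); // already cached

list( $hits, $misses ) = partition_ids( $ids, $cache );
// Only $misses would go into the single batched HTTP request.
```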

Let us open our hymnals to my tweet-storm from Friday:

I assumed all primary key lookups would be fast and free. Not so.

Let’s look at these lines:

// ids might be a huge array
$assets = array_reduce( $ids, function ( $carry, $id ) use ( $repo ) {
  // none of these items will be read from the cache,
  // because we did not prime the cache yet
  $asset = $repo->find( $id );
  if ( $asset ) {
    $carry[] = $asset;
  }
  return $carry;
}, [] );

// later on in the request
// we request the bylines (authors) for all of the above posts

A ton of posts, each post has several media assets. We also store bylines in a separate table. Each post was individually querying for its bylines. The cache was not primed.

When the cache is primed, this request has a chance of being fast-ish, but we are still hitting Memcached a million times (I haven’t gotten around to the batch-requesting cache lookups).

The Solution

Thank you, Doctrine. Once I realized all of the cache misses stampeding my way, I added calls to prime both caches.

// media assets (shortcodes)
// many many single lookups reduced to one
$repo->findBy( [ 'id' => $ids ] );

Remember, WP_Query always makes an expensive query. Doctrine has a query cache, so this call might result in zero database queries. When it does make a query, it will prime the cache in a normalized fashion that all queries can share. Winning.

For all of those bylines, generating queries in Doctrine remains simple. Once I had a list of master IDs:

// eager load bylines for all related posts (there were many of these)
$app['bylines.repo.post_byline']->findBy(
  [
    'postId' => array_unique( $ids ),
    'blogId' => $app['blog_id'],
  ]
);

Conclusion

Performance is hard. Along the way last week, I realized that the default Docker image for PHP-FPM does not enable the opcache by default for PHP 7. Without it being on, wp-settings.php can take more than 1 second to load. I turned on the opcache. I will prime my caches from now on. I will not take primary key lookups for granted. I will consistently profile my app. I will use power tools when necessary. I will continue to look outside of the box and at the PHP universe as a whole.

Final note: I don’t even work on PHP that much these days. I have been full throttle Node / React / Relay since the end of last year. All of these same concepts apply to the Node universe. Your network will strangle you. Your code cannot overcome serial network requests. Your cache must be smart. We are humans, and this is harder than it looks.

Rethinking Blogs at The New York Times

See Also: The Technology Behind the NYTimes.com Redesign

The Blogs at the Times have always run on WordPress. The New York Times, as an ecosystem, does not run on one platform or one technology. It runs on several. There are over 150 developers at the Times split across numerous teams: Web Products, Search, Blogs, iOS, Android, Mobile Web, Crosswords, Ads, BI, CMS, Video, APIs, Interactive News, and the list goes on. While PHP is frequently used, Elastic Search and Node make an appearance, and the Newspaper CMS, “Scoop,” is written in Java. Interactive likes Ruby/Rails.

The “redesign,” which launched last week, was really a re-platform: where Times development needs to head, and a rethinking of our development processes and tools. The customer-facing redesign was 2 main pieces:

  • a new Article “app” that runs inside of our new platform
  • the “reskinning” of our homepage and section fronts

What is launching today is the re-platform of Blogs from a WordPress-only service to Blogs via WordPress as an app inside of our new platform.

The Redesign

Most people who use the internet have visited an NYTimes article page –

the old design:
http://www.nytimes.com/2013/12/29/arts/music/lordes-royals-is-class-conscious.html

Lorde

the new:
http://www.nytimes.com/2014/01/15/arts/music/jay-z-offers-a-view-of-his-legacy-at-barclays-center.html?ref=music

Jay-Z at Barclay's

What is not immediately obvious to the reader is how all of this works behind the scenes.

How Things Used to Work

For many years at the Times, article pages were generated into static HTML files when published. This was good and bad. Good because: static files are lightning fast to serve. Bad because: those files point at static assets (CSS, JavaScript files) that can only change when the pages are re-generated and re-published. One way around this was to load a CSS file that had a bunch of @import statements (eek), with a similar loading scheme for JS (even worse).

Blogs used to load like any custom WordPress project:

  • configured as a Multisite install (amassing ~200 blogs over time)
  • lots of custom plugins and widgets
  • custom themes + a few child themes

A lot of front-end developers also write PHP and vice versa. At the Times, in many instances, the team working on the Blogs “theme” was not the same team working on the CSS/JS. So, we would have different Subversion repos for global CSS, blogs CSS; different repos for global JS, blogs JS; and a different repo for WordPress proper. When I first started working at the Times, I had to create a symlink farm of 7 different repos that would represent all of the JS and CSS that blogs were using. Good times.

On top of that, all blogs would inherit NYTimes “global” styles and scripts. A theme would end up inheriting global styles for the whole project, global styles for all blogs, and then sometimes, a specific stylesheet for the individual blog. For CSS, this would sometimes result in 40-50 (sometimes 80!) stylesheets loading. Not good.

WordPress would load jQuery, Prototype, and Scriptaculous with every request (I’m pretty sure some flavor of jQuery UI was in there too). As a result, every module within the page would just assume that our flavor of jQuery global variable NYTD.jQuery was available anywhere, and would assume that Prototype.js code could be called at will. (Spoiler alert: that was a bad idea.)

Blogs at the Times do not use native WP comments. There is an entire service at the Times called CRNR (Comments, Ratings, and Reviews) that has its own user management, taxonomy management, and community moderation tools. Modules like “CRNR” would provide us with code to “drop onto the page.” Sometimes this code included its own copy of jQuery, different version and all.

Widgets on blogs could be tightly coupled with the WordPress codebase, or they could be some code that was pasted into a freeform textarea from some other team. The Interactive News team at the Times would sometimes supply us code to “drop into the C-Column” – translation: add a widget to the sidebar. These “interactives” would sometimes include their own copy of jQuery (what version…? who knows!).

How Things Work Now

The new platform has 2 main technologies at its center: the homegrown Madison Framework (PHP as MVC), and Grunt, the popular task runner that runs on Node. Our NYT codebase is a collection of several Git repos that get built into apps via Grunt and deployed by RPMs/Puppet. For any app that wants to live inside of the new shell (inherit the masthead, “ribbon,” navigation automatically), they must register their existence. After they do, they can “inherit” from other projects. I’ll explain.

Foundation

Foundation is the base application. Foundation contains the Madison PHP framework, the Magnum CSS/Responsive framework, and our base JavaScript framework. Our CSS is no longer a billion disparate files – it is LESS manifests, with plenty of custom mixins, that compile into a few CSS files. At the heart of our JS approach is RequireJS, Hammer, SockJS and Backbone (authored by Times alum Jeremy Ashkenas).

Madison is an MVC framework that utilizes the newest and shiniest OO features of PHP and is built around 2 main software design patterns: the Service Locator pattern (via Pimple), and Dependency Injection. The main “front” of any request to the new stack goes through Foundation, as it contains the main controller files for the framework. Apps register their main route via Apache rewrite rules; Madison knows which app to launch by convention, based on the code that was deployed via the Grunt build.

Shared

Shared is a collection of reusable modules. Write a module once, and then allow apps to include it at will. Shared is where Madison’s “base” modules exist. Modules are just PHP template fragments which can include other PHP templates. Think of a “Page” module like so:

Page
- load Top module
- load Content module
- load Bottom module

Top (included in Page)
- load Styles module
- load Scripts module
- load Meta module

...

In your app code, if you try to embed a module by name, and it isn’t in your app’s codebase, the framework will automatically look for it in Shared. This is similar to how parent and child themes work in WordPress. This means: if you want to use ALL of the default modules, only overriding a few, you need to only specify the overriding modules in your app. Let’s say the main content of the page is a module called “PageContent/Thing” – you would include the following in your app to override what is displayed:

// page layout
$layout = array(
    'type' => 'Page',
    'name' => 'Page',
    'modules' => array(
        array(
            'type' => 'PageContent',
            'name' => 'Thing'
        ),
        .....
    )
);

// will first look in
nyt5-app-blogs/Modules/PageContent/Thing.tpl.php
// if it doesn't find it
nyt5-shared/PageContent/php/src/Thing.tpl.php
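The lookup order can be sketched as a small resolver. This is not Madison's code: `resolve_module()` and the `{type}` / `{name}` placeholders are hypothetical, and the directory tree is created in a temp folder only so the sketch can run anywhere.

```php
<?php
// Hypothetical resolver: return the first existing template path,
// checking the app before falling back to Shared.
function resolve_module( $type, $name, array $roots ) {
    foreach ( $roots as $pattern ) {
        $path = strtr( $pattern, [ '{type}' => $type, '{name}' => $name ] );
        if ( is_file( $path ) ) {
            return $path;
        }
    }
    return null;
}

// Build a throwaway directory tree that mirrors the two locations above.
$base = sys_get_temp_dir() . '/nyt5-sketch-' . uniqid();
mkdir( $base . '/nyt5-shared/PageContent/php/src', 0777, true );
touch( $base . '/nyt5-shared/PageContent/php/src/Thing.tpl.php' );

$roots = [
    $base . '/nyt5-app-blogs/Modules/{type}/{name}.tpl.php', // app override first
    $base . '/nyt5-shared/{type}/php/src/{name}.tpl.php',    // Shared fallback
];

// No override exists yet, so the Shared template wins.
$fromShared = resolve_module( 'PageContent', 'Thing', $roots );

// Add an app-level override and resolve again.
mkdir( $base . '/nyt5-app-blogs/Modules/PageContent', 0777, true );
touch( $base . '/nyt5-app-blogs/Modules/PageContent/Thing.tpl.php' );
$fromApp = resolve_module( 'PageContent', 'Thing', $roots );
```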

So there’s a lot happening, before we even get to our Blogs app, and we haven’t even really mentioned WordPress yet!

App-specific

Each app contains a build.json file that explains how to turn our app into a codebase that can be deployed as an application. Each app might also have the following folder structure:

js/
js/src
js/tests
less/
php/
php/src
php/tests

Our build.json file lists our LESS manifests (the files to build via Grunt) and our JS manifests (the files to parse using r.js/Require). Our php/src directory contains the following crucial pieces:

Module/ <-- contains our Madison override templates
WordPress/ <-- contains our entire WP codebase
ApplicationConfiguration.php <-- optional configuration
ApplicationController.php <-- the main Controller for our app
wp-bootstrap.php <-- loads in global scope to load/parse WordPress

The wp-bootstrap.php file is the most interesting portion of our WordPress app, and where we do the most unconventional work to get these 2 disparate frameworks to work together. Before we even load our app in Madison proper, we have already loaded all of WordPress in an output buffer and stored the result. We can then access that result in our Madison code without any knowledge of WordPress. Alternately, we can use any WP code inside of Madison. Madison eschews procedural programming and enforces namespace-ing for all classes, so collisions haven’t happened (yet?).
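The core trick is ordinary PHP output buffering. A minimal sketch, assuming nothing about the real wp-bootstrap.php: `capture_output()` is a hypothetical helper, and the closure stands in for loading WordPress and letting the theme print.

```php
<?php
// Hypothetical helper: run a callable and capture everything it prints.
function capture_output( callable $render ) {
    ob_start();
    try {
        $render();
    } finally {
        // Always close the buffer, even if rendering throws.
        $html = ob_get_clean();
    }
    return $html;
}

// Stand-in for loading WordPress and letting the theme print the page.
$content = capture_output( function () {
    echo '<article>Hello from the theme</article>';
} );

// $content can now be handed to another layer that knows nothing
// about how the markup was produced.
```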

Because we are turning WP content into Module content, we no longer want our themes to produce complete HTML documents: we only want to produce the “content” of the page. Our Madison page layout gives us a wrapper and loads our app-specific scripts and styles. We have enough opportunities to override default template stubs to inject Blog-specific content where necessary.

In the previous incarnation of Blogs, we had to include tons of global scripts and styles. Using RequireJS, which leans on Dependency Injection, we ask for jQuery in any module and ensure that it only loads once. If we in fact do need a separate version somewhere, we can be assured that we aren’t stomping global scope, since we aren’t relying on global scope.

Using LESS imports instead of CSS file imports, we can modularize our code (even using 80 files if we want!) and combine/minify on build.

Loading WordPress in our new unconventional way lets us work with other teams and other code seamlessly. I don’t need to include the masthead/navigation markup in my theme. I don’t even need to know how it works. We can focus on making blogs work, and inherit the rest.

What I Did

For the first few months of the project, I was able to work in isolation and move the Blogs codebase from SVN to Git. I was happy that we were moving the CSS to LESS and the JS to Require/Backbone, so I took all of the old files and converted them into those modern frameworks. The Times had 3 themes that I was given free rein to rewrite and squish into one lighter, more flexible theme. Since the Times has been using WordPress since 2005, there was code from the dark ages of the internet that I was able to look at with fresh eyes and transition. Once a lot of the brute force initial work was done, I worked with a talented team of people to integrate some of the Shared components and make sure we had stylistic parity between the new Article pages and Blogs.

To see some examples in action, a sampling:

Dealbook

Bits

Well

The Lede

City Room

ArtsBeat

Public Editor’s Journal

Paul Krugman

Installing PHP 5.4 (like a boss) with MacPorts

PHP 5.4.7 is the latest stable release of PHP. WordPress has a minimum required version of 5.2.6. Most developers aren’t using the PHP 5.4 branch. Actually, most aren’t even rocking PHP 5.3. This disgusts me.

PHP 5.3 added support for closures. If you come from the world of JavaScript, you know how useful they can be. If you have used PHP 5.3 and closures in classes, you will be happy to know that PHP 5.4 allows you to use $this in closures in class methods.
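A quick sketch of the difference, using a hypothetical `Counter` class:

```php
<?php
class Counter {
    private $count = 0;

    public function incrementer() {
        // PHP 5.4+: $this is bound automatically inside the closure,
        // so private members are reachable.
        // In PHP 5.3 you had to copy it first ($self = $this; use ( $self )),
        // and $self could only reach public members.
        return function () {
            return ++$this->count;
        };
    }
}

$counter = new Counter();
$next    = $counter->incrementer();
$next();          // first call bumps the private counter
$value = $next(); // second call bumps it again
```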

If you haven’t messed around with PHP 5.3, you can install these MacPorts to get started:

php5 +apache2+fastcgi+pear
php5-apc
php5-curl
php5-gd
php5-http
php5-iconv
php5-imagick
php5-mbstring
php5-mcrypt
php5-memcached
php5-mysql +mysqlnd
php5-openssl
php5-tidy

If you already have PHP 5.3 and want to upgrade to PHP 5.4, these are some tricks to get you on the right path:

sudo -s # use a root shell throughout

port uninstall php5
# won't work if you have extensions installed,
# so uninstall everything that has php5 as a dependency first

port install php54
cd /opt/local/etc/php54 && sudo cp php.ini-development php.ini

Install a bunch of PHP extensions:

port install php54-apc php54-curl php54-gd php54-http php54-iconv php54-imagick php54-mbstring php54-mcrypt php54-memcached php54-mysql php54-openssl php54-tidy

To use mysqlnd with a local MySQL server, edit /opt/local/etc/php54/php.ini and set

mysql.default_socket, mysqli.default_socket and pdo_mysql.default_socket
to

/opt/local/var/run/mysql5/mysqld.sock

Make sure PHP 5.4.6 is the default PHP binary:

which php

If it’s something like /usr/bin/php:

cd /usr/bin && sudo rm -rf php
sudo ln -s /opt/local/bin/php54 php

You now have PHP 5.4.6 and your extensions, but you no longer have the apache variant.

port install php54-apache2handler

cd /opt/local/apache2/modules
sudo /opt/local/apache2/bin/apxs -a -e -n php5 mod_php54.so

vi /opt/local/apache2/conf/httpd.conf # remove the old php5.so

You now have PHP 5.4 and the apache handler, but you no longer have the PEAR variant. You can try to make this work:

port install pear-PEAR

Or you can do the following:

cd ~
curl http://pear.php.net/go-pear.phar -o go-pear.phar
sudo php go-pear.phar

You will be prompted to specify config vars; we want to change #1 and #4.

Press 1 – Installation base ($prefix) – and enter:

/opt/local/lib/php54

Press 4 – Binaries directory – and enter:

/opt/local/bin

More checks for PEAR:

pear info pear && rm go-pear.phar
pear config-set auto_discover 1

# make sure PEAR is in the PHP include path
pear config-get php_dir

# if you don't see "/opt/local/lib/php54/share/pear" in there
php54 -i|grep 'php.ini'
# you should see "/opt/local/etc/php54" - if you don't:
sudo vi php.ini
# change include_path to:
include_path = ".:/opt/local/lib/php54/share/pear"

PEAR is installed, let’s install some PEAR stuffs:

# Unit tests
pear install pear.phpunit.de/PHPUnit
# Documentation generator
pear install pear.apigen.org/apigen

Restart Apache:

sudo /opt/local/apache2/bin/apachectl restart

You can start Apache and Memcached, et al by using commands like:

sudo port load apache2
sudo port unload apache2

sudo port load memcached
sudo port unload memcached

# memcached debugging, start with:
memcached -vv

WP + You + OOP

A lot of the programming associated with WordPress is inherently not object-oriented. Most of the theme mods or plugins that developers write for their blog(s) are one-off functions here or there, or a small set of filters and actions with associated callback functions. As WordPress matures into an “application framework” (everyone else’s words, not mine), the need for better code organization, greater maintainability, and the self-documenting powers of Object Oriented Programming become immediately apparent.

Because a majority of WordPress sites aren’t complex, the audience for discussions like this is small. Probably 99% of WordPress installs don’t need extra PHP code for them to work how their site owner(s) want. A majority of sites don’t use Multisite. A majority of sites have no need for Web Services, and when they do: they just install some Twitter widget or the like that does the heavy lifting for them. But I don’t think anyone involved with WordPress core wants that to be the future of WordPress. When people talk about the future of WordPress, they talk about how it can run any web application, but there aren’t a large number of compelling examples of WP doing that yet.

Almost by accident, I think eMusic has become a great example of how to not only run WordPress at scale, but how to write a site using WordPress as an application framework, and I have many examples of how we organize our code that can help anyone else who is struggling to make sense of a pile of code that an entire team needs to decipher and maintain.

Before we dig into how to write better OO code, we need to first figure out how we are going to organize our codebase.

Some Setup Tips

  • Run WordPress as an svn:external: This should almost be mandatory. You want your directory structure to look like so:
    /index.php
    /wp-config.php
    /wordpress
    /wp-content
    /wp-content/themes
    /wp-content/plugins
    /wp-content/mu-plugins
    /wp-content/requests
    /wp-content/site-configs
    /wp-content/sunrise.php
    
    // in the root of your site
    svn propedit svn:externals .
    
    // add this code
    wordpress http://core.svn.wordpress.org/branches/3.4/

    This is important so that you never overwrite core, and so you can’t check-in whatever hacks you have added while debugging core code.

    Because we are using an external, you need to add these lines to wp-config.php:

    define( 'WP_CONTENT_URL', 'http://' . DOMAIN_CURRENT_SITE . '/wp-content' );
    define( 'WP_CONTENT_DIR', $_SERVER['DOCUMENT_ROOT'] . '/wp-content' );

    You also need to alter index.php to look like this:

    define( 'WP_USE_THEMES', true );
    
    /** Loads the WordPress Environment and Template */
    require( './wordpress/wp-blog-header.php' );
  • Use Sunrise:
    If you are using Multisite, Sunrise is the best super-early place to hook in and alter WordPress. In wp-config.php:

    define( 'SUNRISE', 1 );

    You then need to add a file in wp-content called sunrise.php.

  • Use Site Configs:
    One of the main things we use sunrise.php for is site-specific configuration code. I made a folder in wp-content called site-configs that houses files like global.php (all sites), emusic.php (site #1), emusic-bbpress.php (site #2), etc
  • Separate HTTP Requests:
    I made a folder in wp-content called requests that houses site and page-specific HTTP logic. Because a big portion of our main site is dynamic and populates its data from Web Services, it makes sense to organize all of that logic in one place.
  • Use “Must Use” plugins:
    If you have classes or code that are mandatory for your application, you can autoload them by simply placing each file in your wp-content/mu-plugins folder. If your plugin requires a bunch of extra files: it is not a good candidate for mu-plugins.
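As a sketch of how the Sunrise and Site Configs tips fit together, here is one way a sunrise.php could decide which config files to load. Everything here is an assumption for illustration: `site_config_files()` is a hypothetical helper, and the blog-id mapping simply mirrors the file names mentioned above.

```php
<?php
// Hypothetical sketch of what a sunrise.php might do: pick the
// site-specific config files for the current blog id.
function site_config_files( $blog_id, $dir ) {
    // Assumed mapping of blog id => config file; adjust to your network.
    $map = [
        1 => 'emusic.php',
        2 => 'emusic-bbpress.php',
    ];

    $files = [ $dir . '/global.php' ]; // applies to all sites
    if ( isset( $map[ $blog_id ] ) ) {
        $files[] = $dir . '/' . $map[ $blog_id ];
    }
    return $files;
}

$files = site_config_files( 2, '/wp-content/site-configs' );
// In a real sunrise.php you would require_once each file that exists.
```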

Use classes to encapsulate plugins and theme configs

MOST plugins in the WordPress plugin repository are written using procedural code – meaning, a bunch of functions and global variables (whadup, Akismet!). Hopefully you know enough about programming to know that is a bad idea. 1) Global variables suck and are easily over-writable and 2) PHP will throw a fatal error if you try to redeclare a function (declare a function with the same name twice).

Because you have to protect your function names against this, most procedural plugin authors namespace their functions by prepending an identifier to their function names:

function my_unique_plugin_name_woo_hoo( ) {
    return 'whatever';
}
// call me maybe
my_unique_plugin_name_woo_hoo( );

If you declare a bunch of function callbacks for actions and filters in your plugin, you can see how this would be gross:

function my_unique_plugin_name_alter_the_content() { ... }
add_filter( 'the_content', 'my_unique_plugin_name_alter_the_content' );

function my_unique_plugin_name_red() { ... }
add_filter( 'the_content', 'my_unique_plugin_name_red' );

function my_unique_plugin_name_green() { ... }
add_filter( 'the_content', 'my_unique_plugin_name_green' );

function my_unique_plugin_name_blue() { ... }
add_filter( 'the_content', 'my_unique_plugin_name_blue' );

If we instead use an OO approach, we can add all of our functions to a class as methods, ditch the namespacing, and group our filters and actions together into one “init” method.

class MyUniquePlugin {
    function init() {
        add_filter( 'the_content', array( $this, 'alter_the_content' ) );
        add_filter( 'the_content', array( $this, 'red' ) );
        add_filter( 'the_content', array( $this, 'green' ) );
        add_filter( 'the_content', array( $this, 'blue' ) );
    }

    function alter_the_content() { ... }
    function red() { ... }
    function green() { ... }
    function blue() { ... }
}

How we call this class is up for debate, and I will discuss this at length later. But for right now, let’s call it like this:

$my_unique_plugin = new MyUniquePlugin;
$my_unique_plugin->init();

The init method is used like a constructor, but it ditches any ambiguity between __construct() and class MyPlugin { function MyPlugin() {} }. Could you use __construct() in any of my examples throughout instead of init() now that we are all in PHP5 syntax heaven? Probably. However, we don’t really want to use either, because we don’t want to give anyone the impression that our plugin classes can be called at will. In almost every situation, plugin classes should only be called once, and this rule should be enforced in code. I’ll show you how.

The Singleton Pattern

The Singleton Pattern is one of the GoF (Gang of Four) Patterns. This particular pattern provides a method for limiting the number of instances of an object to just one.

class MySingletonClass {
    private static $instance;
    private function __construct() {}
    public static function get_instance() {
        if ( !isset( self::$instance ) )
            self::$instance = new MySingletonClass();

        return self::$instance;
    }
}

MySingletonClass::get_instance();

Why do we care about limiting the number of instances to one? Think about a class that encapsulates code used to connect to a database. If the database connection is made in the constructor, we should share that connection across all instances of the class; we shouldn’t try to open a connection every time we need to make a SQL query.

For WordPress plugin classes, we want to store all of our actions and filters in a constructor or a class’s init() method. We don’t want to register those filters and actions more than once. We also don’t need or want multiple instances of our plugin class. This makes a plugin class a perfect candidate to implement the Singleton pattern.

class MyUniquePlugin {
    private static $instance;
    private function __construct() {}
    public static function get_instance() {
        if ( !isset( self::$instance ) ) {
            $c = __CLASS__;
            self::$instance = new $c();
        }

        return self::$instance;
    }

    function init() {
        add_filter( 'the_content', array( $this, 'alter_the_content' ) );
        add_filter( 'the_content', array( $this, 'red' ) );
        add_filter( 'the_content', array( $this, 'green' ) );
        add_filter( 'the_content', array( $this, 'blue' ) );
    }

    function alter_the_content() { ... }
    function red() { ... }
    function green() { ... }
    function blue() { ... }
}

MyUniquePlugin::get_instance();

// or, store the value of the class invocation
// to call public methods later
$my_plugin = MyUniquePlugin::get_instance();

Ok, so great, we implemented Singleton and limited our plugin to only one instance. We want to do this for all of our plugins, but it would be great if there was a way to not repeat code in every plugin, namely all of the class members / methods needed to implement Singleton:

    private static $instance;
    private function __construct() {}
    public static function get_instance() {
        if ( !isset( self::$instance ) ) {
            $c = __CLASS__;
            self::$instance = new $c();
        }

        return self::$instance;
    }

To do so, we are going to need a base or intermediate class that can be extended. Here is a base class, but we have a few problems:

class BaseSingleton {
    // this won't work, since every sub-class shares this one
    // $instance slot - whichever class asks first wins
    private static $instance;

    // this won't work, because the child class
    // needs to be able to call parent::__construct,
    // meaning the parent constructor has to be as visible
    // as the child - the child has to have >= visibility
    private function __construct() {}

    public static function get_instance() {
        if ( !isset( self::$instance ) ) {
            // this won't work, because __CLASS__ refers
            // to BaseSingleton,
            // not the class extending it at runtime
            $c = __CLASS__;
            self::$instance = new $c();
        }

        return self::$instance;
    }
}
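To see two of these problems in action, here is a runnable version of the class above with two empty sub-classes. FirstPlugin and SecondPlugin are hypothetical names used only for this demo:

```php
<?php
class BaseSingleton {
    private static $instance;

    private function __construct() {}

    public static function get_instance() {
        if ( ! isset( self::$instance ) ) {
            // __CLASS__ is always "BaseSingleton" here
            $c = __CLASS__;
            self::$instance = new $c();
        }

        return self::$instance;
    }
}

// Hypothetical sub-classes, for illustration only
class FirstPlugin extends BaseSingleton {}
class SecondPlugin extends BaseSingleton {}

$first  = FirstPlugin::get_instance();
$second = SecondPlugin::get_instance();

echo get_class( $first ), "\n"; // BaseSingleton
var_dump( $first === $second ); // bool(true)
```

Neither call gives us the class we asked for, and both sub-classes end up holding the exact same object.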

Let’s try to fix it:

class BaseSingleton {
    // store __CLASS__ = (instance of class) as key => value pairs
    private static $instance = array();

    // let the extending class call the constructor
    protected function __construct() {}

    public static function get_instance( $c = '' ) {
        if ( empty( $c ) ) 
            die( 'Class name is required' );
        if ( !isset( self::$instance[$c] ) )
            self::$instance[$c] = new $c();

        return self::$instance[$c];
    }
}

We’re getting closer, but we have some work to do. We are going to use OO features of PHP and some new stuff in PHP 5.3 to make a base class that implements Singleton and works the way we want (we don’t want to do this hack: ClassName::get_instance( 'ClassName' ) ).

Abstract Classes

Abstract classes are kinda like Interfaces:

Classes defined as abstract may not be instantiated, and any class that contains at least one abstract method must also be abstract. Methods defined as abstract simply declare the method’s signature – they cannot define the implementation.

When inheriting from an abstract class, all methods marked abstract in the parent’s class declaration must be defined by the child; additionally, these methods must be defined with the same (or a less restricted) visibility.

Here’s an example:

abstract class BaseClass {
    protected function __construct() {}
    abstract public function init();
}
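A small sketch of those rules, assuming a hypothetical child class named HelloPlugin. It uses reflection to confirm that the abstract class cannot be instantiated while the child that defines init() can:

```php
<?php
abstract class BaseClass {
    protected function __construct() {}
    abstract public function init();
}

// HelloPlugin is a hypothetical child class used only for illustration.
class HelloPlugin extends BaseClass {
    // Widen the constructor's visibility so this demo can use `new` directly.
    public function __construct() {
        parent::__construct();
    }

    // Required: leaving init() out would be a fatal error.
    public function init() {
        return 'HelloPlugin initialized';
    }
}

$base  = new ReflectionClass( 'BaseClass' );
$child = new ReflectionClass( 'HelloPlugin' );

var_dump( $base->isAbstract() );      // bool(true)
var_dump( $base->isInstantiable() );  // bool(false)
var_dump( $child->isInstantiable() ); // bool(true)

$plugin = new HelloPlugin();
echo $plugin->init(), "\n";           // HelloPlugin initialized
```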

As we can see, BaseClass does nothing except provide a blueprint for how to write our extending class. Let’s alter it by adding our Singleton code:

abstract class BasePlugin {
    private static $instance = array();
    protected function __construct() {}

    public static function get_instance( $c = '' ) {
        if ( empty( $c ) ) 
            die( 'Class name is required' );
        if ( !isset( self::$instance[$c] ) )
            self::$instance[$c] = new $c();

        return self::$instance[$c];
    }

    abstract public function init(); 
}

Our base class now has the following properties:

  • Declared abstract, cannot be instantiated directly
  • Encapsulates Singleton code
  • Stores class instances in key => value pairs (we’re not done with this)
  • Instructs the extending / child class to define an init() method
  • Hides the constructor, can only be called by itself or a child class

Here’s an example of a plugin extending BasePlugin:

class MyPlugin extends BasePlugin {
    protected function __construct() {
        // our parent class might
        // contain shared code in its constructor
        parent::__construct();
    }

    public function init() {
        // implemented, but does nothing
    }
}
// create the lone instance
MyPlugin::get_instance( 'MyPlugin' );

// store the instance in a variable to be retrieved later:
$my_plugin = MyPlugin::get_instance( 'MyPlugin' );

Here’s what is happening:

  • MyPlugin is extending BasePlugin, inheriting all of its qualities
  • MyPlugin implements the required abstract function init()
  • MyPlugin cannot be instantiated with the new keyword from outside the class, because the constructor is protected
  • MyPlugin is instantiated with a static method, but because an instance is created, $this can be used throughout its methods

We’re almost done, but we want to call get_instance() without our hack (passing the class name).

Late Static Binding in PHP 5.3

get_called_class() is a function in PHP 5.3 that will give us the name of the child class that is calling a parent class’s method at runtime. The class name is not resolved using the class where the method is defined; it is computed using runtime information instead. The feature behind it is called late static binding: “late” because the class is resolved at runtime, and “static” because it can be used for (but is not limited to) static method calls.

Here’s an easy example to explain how this works:

class A {
    public static function name() {
        echo get_called_class();
    }
}

class B extends A {}

class C extends B {}

C::name(); // outputs "C"
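PHP 5.3 also added the static keyword for the same purpose, which makes the difference from self easy to see. In this sketch ParentModel and ChildModel are hypothetical names; the only point is which class each factory method builds:

```php
<?php
// Hypothetical classes, for illustration only
class ParentModel {
    public static function make_with_self() {
        // self is bound to the class where the method is written
        return new self();
    }

    public static function make_with_static() {
        // static is bound to the class the call was made on
        return new static();
    }
}

class ChildModel extends ParentModel {}

echo get_class( ChildModel::make_with_self() ), "\n";   // ParentModel
echo get_class( ChildModel::make_with_static() ), "\n"; // ChildModel
```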

get_called_class() is new to PHP 5.3. WordPress only requires PHP 5.2, so you might need to upgrade to be able to implement this class like so:

abstract class BasePlugin {
    private static $instance = array();
    protected function __construct() {}
    public static function get_instance() {
        $c = get_called_class();
        if ( !isset( self::$instance[$c] ) ) {
            self::$instance[$c] = new $c();
            self::$instance[$c]->init();
        }

        return self::$instance[$c];
    }

    abstract public function init();
}

Now we can instantiate our plugin like so:

// create the lone instance
MyPlugin::get_instance();

// store the instance in a variable to be retrieved later:
$my_plugin = MyPlugin::get_instance();
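Here is the whole thing as one runnable sketch, using two hypothetical plugins (GalleryPlugin and SliderPlugin) and an $init_calls counter I added purely to show that each class gets its own instance and that init() runs only once:

```php
<?php
abstract class BasePlugin {
    private static $instance = array();

    protected function __construct() {}

    public static function get_instance() {
        $c = get_called_class();
        if ( ! isset( self::$instance[$c] ) ) {
            self::$instance[$c] = new $c();
            self::$instance[$c]->init();
        }

        return self::$instance[$c];
    }

    abstract public function init();
}

// Hypothetical plugins, for illustration only
class GalleryPlugin extends BasePlugin {
    public $init_calls = 0;

    public function init() {
        $this->init_calls++;
    }
}

class SliderPlugin extends BasePlugin {
    public $init_calls = 0;

    public function init() {
        $this->init_calls++;
    }
}

$gallery = GalleryPlugin::get_instance();
$slider  = SliderPlugin::get_instance();
$again   = GalleryPlugin::get_instance();

echo get_class( $gallery ), "\n"; // GalleryPlugin
echo get_class( $slider ), "\n";  // SliderPlugin
var_dump( $gallery === $again );  // bool(true)
var_dump( $gallery->init_calls ); // int(1)
```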

Base Classes for Plugins and Themes

Just like it says: USE BASE CLASSES for plugins AND themes. If you aren’t using Multisite, you probably don’t have the problem of maintaining code across parent and child themes that might be shared for multiple sites. Each theme has a functions.php file, and that file, in my opinion, should encapsulate your theme config code (actions and filters, etc) in a class. Parent and Child themes are similar to base and sub classes.

Because we don’t want to repeat ourselves, code that would be copied / pasted into another theme to inherit functionality should instead be placed in a BaseTheme class. Our theme config classes should also implement Singleton.

One of the pieces of functionality we need to share across themes at eMusic is regionalization. Regionalization is accomplished by using a custom taxonomy “region” and some custom actions and filters. For a theme to be regionalized, it needs to override some class members and call BaseTheme::regionalize().

Here’s part of our BaseTheme class:

abstract class BaseTheme implements Singleton {
    var $regions_map;
    var $regions_tax_map;
    var $post_types = array( 'post' );

    private static $instance = array();

    public static function get_instance() {
        $c = get_called_class();
        if ( !isset( self::$instance[$c] ) ) {
            self::$instance[$c] = new $c();
            self::$instance[$c]->init();
        }

        return self::$instance[$c];
    }

    protected function __construct() {}
    abstract protected function init();

    protected function regionalize() {
        add_filter( 'manage_posts_columns', array( $this, 'manage_columns' ) );
        add_action( 'manage_posts_custom_column', array( $this, 'manage_custom_column' ), 10, 2 );
        add_filter( 'posts_clauses', array( $this, 'clauses' ), 10, 2 );
        add_filter( 'manage_edit-post_sortable_columns', array( $this, 'sortables' ) );
        add_filter( 'pre_get_posts', array( $this, 'pre_posts' ) );
    }

    ...
}

Here is a theme extending it in its functions.php file:

if ( ! isset( $content_width ) )
    $content_width = 448;

class Theme_17Dots extends BaseTheme {
    function __construct() {
        global $dots_regions_tax_map, $dots_regions_map;

        $this->regions_map = $dots_regions_map;
        $this->regions_tax_map = $dots_regions_tax_map;

        parent::__construct();
    }

    function init() {
        $this->regionalize();

        add_action( 'init', array( $this, 'register' ) );
        add_action( 'after_setup_theme', array( $this, 'setup' ) );
        add_action( 'add_meta_boxes_post', array( $this, 'boxes' ) );
        add_action( 'save_post', array( $this, 'save' ), 10, 2 );
        add_filter( 'embed_oembed_html', '_feature_youtube_add_wmode' );
    }

    ....
}

“Must Use” Plugins

Let’s assume we stored our BaseTheme and BasePlugin classes in files called class-base-theme.php and class-base-plugin.php. The next question is: where should these files go? Probably the best place for them to go is wp-content/mu-plugins. The WordPress bootstrap routine will load files in that directory automatically. Because our BaseTheme class is not immediately invoked within the file, the loading of the file just makes the class available for our theme child class in functions.php to extend.

Sunrise

sunrise.php can be viewed as a functions.php for your whole network. Aside from letting you switch the context of $current_blog and $current_site, you can also start adding actions and filters.

In the eMusic sunrise.php, I take this opportunity to load “site configs,” mainly to filter which plugins are going to load for which site, and which plugins will load for the entire network of sites (plugins that load for every site). To accomplish this, we need a “global” config and then a config for each site like so:

require_once( 'site-configs/global.php' );

if ( get_current_blog_id() > 1 ) {
    switch ( get_current_blog_id() ) {
    case 2:
        require_once( 'site-configs/bbpress.php' );
        break;

    case 3:
        require_once( 'site-configs/dots.php' );
        break;

    case 5:
        require_once( 'site-configs/support.php' );
        break;
    }

    add_filter( 'pre_option_template', function () {
        return 'dark';
    } );
} else {
    require_once( 'site-configs/emusic.php' );
}
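If the list of sites grows, the switch can get long. One possible alternative, sketched here with a hypothetical helper called site_config_file(), is to keep the blog ID to file mapping in an array. This is not what eMusic runs, just the same lookup expressed as data:

```php
<?php
// Hypothetical helper: maps a blog ID to its site-config file.
function site_config_file( $blog_id ) {
    $configs = array(
        2 => 'site-configs/bbpress.php',
        3 => 'site-configs/dots.php',
        5 => 'site-configs/support.php',
    );

    // Blog 1 is the main site.
    if ( $blog_id <= 1 ) {
        return 'site-configs/emusic.php';
    }

    // Sub-sites without an entry load no extra config, like the switch above.
    return isset( $configs[ $blog_id ] ) ? $configs[ $blog_id ] : null;
}

var_dump( site_config_file( 3 ) ); // string(21) "site-configs/dots.php"
var_dump( site_config_file( 4 ) ); // NULL
echo site_config_file( 1 ), "\n";  // site-configs/emusic.php
```

In sunrise.php the result would be passed to require_once() when it is not null, using get_current_blog_id() for the ID.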

Site Configs

In our global site config, we want to filter our active network plugins. As I have said in previous posts and presentations, relying on active plugins being correct in the database is dangerous and hard to keep in sync across environments.

We can filter active network plugins by filtering the active_sitewide_plugins site option:

add_filter( 'pre_site_option_active_sitewide_plugins', function () {
    return array(
        'batcache/batcache.php'                         => 1,
        'akismet/akismet.php'                           => 1,
        'avatar/avatar.php'                             => 1,
        'bundle/bundle.php'                             => 1,
        'cloud/cloud.php'                               => 1,
        'download/download.php'                         => 1,
        'emusic-notifications/emusic-notifications.php' => 1,
        'emusic-ratings/emusic-ratings.php'             => 1,
        'emusic-xml-rpc/emusic-xml-rpc.php'             => 1,
        'johnny-cache/johnny-cache.php'                 => 1,
        'like-buttons/like-buttons.php'                 => 1,
        'members/members.php'                           => 1,
        'minify/minify.php'                             => 1,
        'movies/movies.php'                             => 1,
        'shuffle/shuffle.php'                           => 1,
        'apc-admin/apc-admin.php'                       => 1,
        //'debug-bar/debug-bar.php'                     => 1
    );
} );

Filtering plugins in this manner allows us to do two main things:

  • We can turn plugins on and off by commenting / uncommenting
  • We can visually see what our base set of plugins is for our network without going to the admin or the database
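The reason this works: WordPress runs the pre_option_{name} and pre_site_option_{name} filters before it reads the stored value, and any return value other than false is used instead of what is in the database. The sketch below imitates that short-circuit with a simplified stand-in. DemoOptions and its methods are hypothetical, not WordPress functions, and the real implementation does a lot more:

```php
<?php
// Simplified stand-in for illustration only; NOT WordPress's implementation.
class DemoOptions {
    private static $filters = array();

    public static function add_filter( $tag, $callback ) {
        self::$filters[ $tag ] = $callback;
    }

    public static function get_option( $name, array $database ) {
        // Ask the "pre" filter first; false means "no override".
        $tag = 'pre_option_' . $name;
        $pre = isset( self::$filters[ $tag ] )
            ? call_user_func( self::$filters[ $tag ] )
            : false;

        if ( false !== $pre ) {
            return $pre;
        }

        // Otherwise fall back to whatever is stored.
        return isset( $database[ $name ] ) ? $database[ $name ] : false;
    }
}

// Pretend this row drifted out of sync on one environment.
$database = array( 'active_plugins' => array( 'stale/stale.php' ) );

DemoOptions::add_filter( 'pre_option_active_plugins', function () {
    return array( 'gravityforms/gravityforms.php' );
} );

$plugins = DemoOptions::get_option( 'active_plugins', $database );
echo implode( ', ', $plugins ), "\n"; // gravityforms/gravityforms.php
```

The stale database row never gets a say, which is exactly why the list in code stays the same across environments.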

We have a lot of other things in site-configs/global.php, but I mainly want to demonstrate how we can load classes in an organized manner. In a site-specific config, we will filter what plugins are active for that site only.

For the main eMusic theme, we use these plugins:

add_filter( 'pre_option_active_plugins', function () {
    return array(
        'artist-images/artist-images.php',
        'catalog-comments/catalog-comments.php',
        'emusic-post-types/emusic-post-types.php',
        'discography.php',
        'emusic-radio/emusic-radio.php',
        'gravityforms/gravityforms.php',
        'super-ghetto/super-ghetto.php'
        //,'theme-check/theme-check.php'
    );
} );

This is just another example of us not repeating ourselves. When we think about everything in an object-oriented manner, everything can be cleaner and more organized.

HTTP Requests

Another area where we need to organize our code is around HTTP requests. We have “pages” in WordPress that aren’t populated via content from the database, but from a combination of Web Service calls based on request variables.

We use the following files:

// base class
/wp-content/mu-plugins/request.php

// extended by theme classes implementing the Factory pattern
// "dark" is a theme
/wp-content/requests/dark.php

// the theme class, where applicable,
// loads a request class when required by a certain page
/wp-content/requests/home.php
/wp-content/requests/album.php
/wp-content/requests/artist.php
... etc ...

As an example, an album page loads AlbumRequest:

class AlbumRequest extends RequestMap { ... }
class RequestMap extends API { ... }

// the API class implements CURL
class API {}

// CURL implements the cURL PHP extension, curl_multi(), etc
class CURL {}

Our organization of HTTP files is an example of OO principles in action, and includes ways to organize our codebase in a way that is not specified by WordPress out of the box.

This is also an example of the self-documenting nature of object-oriented programming. Rather than show you each line of AlbumRequest, we can assume that it deals with requests specific to the Album page, inheriting functionality from RequestMap, which calls methods available in API, which at its core is implementing cURL.
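To give a feel for how the layers could fit together, here is a heavily simplified sketch. Only the class names come from our codebase; every method, the api.example.com host, the canned response, and the load_request() factory function are assumptions made up for this example, and nothing here touches the network:

```php
<?php
// Every method body below is hypothetical; only the class names come from the post.

// Stand-in transport: returns a canned string instead of calling the cURL extension.
class CURL {
    public function get( $url ) {
        return 'response for ' . $url;
    }
}

// API wraps the transport and knows the (made-up) service host.
class API {
    protected $http;

    public function __construct() {
        $this->http = new CURL();
    }

    protected function call( $path ) {
        return $this->http->get( 'https://api.example.com' . $path );
    }
}

// RequestMap turns request variables into one or more API calls.
class RequestMap extends API {
    public function fetch( array $vars ) {
        return array();
    }
}

// AlbumRequest only describes what the album page needs.
class AlbumRequest extends RequestMap {
    public function fetch( array $vars ) {
        return array( 'album' => $this->call( '/album/' . $vars['album_id'] ) );
    }
}

// Factory: pick a request class from the page name.
function load_request( $page ) {
    $class = ucfirst( $page ) . 'Request';

    return class_exists( $class ) ? new $class() : null;
}

$request = load_request( 'album' );
$data    = $request->fetch( array( 'album_id' => 42 ) );

echo $data['album'], "\n"; // response for https://api.example.com/album/42
```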

Conclusion

Hopefully after reading all of this, you can clearly see some of the benefits of organizing your code in an object-oriented manner. The basics of OOP are outside the scope of this post but are essential for all developers to learn, especially those who work on teams whose members have varying degrees of skill.

Organized and readable code is essential, and the patterns available to us from the history of computer programming so far should be used to help get it there.

Couchbase

After reading an article on Tumblr’s architecture, I was intrigued by their affinity for Redis. I know lots of other projects use it, so I did some reading to find out if it was a suitable replacement for Memcached (it is marketed as “Memcached on Steroids”).

Redis is not an LRU cache out of the box; it requires configuration for that. Redis is a NoSQL solution that provides replication and can store key-values with expiration, similar to Memcached.

A little about Membase:

Membase was developed by several leaders of the memcached project, who had founded a company, NorthScale, expressly to meet the need for a key-value database that enjoyed all the simplicity, speed, and scalability of memcached, but also provided the storage, persistence and querying capabilities of a database.

And then Membase became Couchbase:

As of February 8, 2011, the Membase project founders and Membase, Inc. announced a merger with CouchOne (a company with many of the principal players behind CouchDB) with an associated project merger. The merged project will be known as Couchbase. In January of 2012, Couchbase released a product building on Membase, known as Couchbase Server 1.8.

In most of the benchmarks I saw comparing Memcached and Redis, Memcached is still way faster, yet it “lacks the featureset of Redis.” Most of the benchmarks I saw for Couchbase used phrases like “sub-millisecond responses” / “twice as fast as Memcached.” I understand it’s cool to use sets and append / prepend to them in Redis, but other than those features, what makes it ultimately the better choice? I have concluded it just isn’t. If you like the tools it provides, awesome! If you want something that’s fast, has great tools, and is powerful all day long – why aren’t we all using Couchbase?

Major bug with WordPress + New Relic

If you haven’t seen New Relic yet, definitely take a look, it’s awesome.

That being said… if you are running New Relic monitoring on a huge production site, your logs might be spinning out of control without you knowing it, and I’ll explain why.

We use New Relic at eMusic to monitor our WordPress servers and many other application servers – Web Services, Databases etc – and we started noticing this in the logs like whoa: failed to delete New Relic auto-RUM

In New Relic’s FAQ section:

Beginning with release 2.6 of the PHP agent, the automatic Real User Monitoring (auto-RUM) feature is implemented using an output buffer. This has several advantages, but the most important one is that it is now completely accurate for all frameworks, not just Drupal and WordPress. The old mechanism was fraught with problems and highly sensitive to things like extra Drupal modules being installed, or customization of the header format. Using this new scheme, all of the problems go away. However, there is a down-side, but only for specific PHP code. This manifests itself as a PHP notice that PHP failed to delete buffer New Relic auto-RUM in…. If you do not have notices enabled, you may not see this and depending on how your code is written, you may enter an infinite loop in your script, which will eventually time out, and simply render either an empty or a partial page.

To understand the reason for this error and how it can create an infinite loop in code that previously appeared to work, it is worth reading the PHP documentation on the ob_start() PHP function. Of special interest is the last optional parameter, which is a boolean value called erase that defaults to true. If you call ob_start() yourself and pass in a value of false for that argument, you will encounter the exact same warning and for the same reason. If that variable is set to false, it means that the buffer, once created, can not be destroyed with functions like ob_end_clean(), ob_get_clean(), ob_end_flush() etc. The reason is that PHP assumes that if a buffer is created with that flag that it modifies the buffer contents in such a way that the buffer cannot be arbitrarily stopped and deleted, and this is indeed the case with the auto-RUM buffer. Essentially, inside the agent code, we start an output buffer with that flag set to false, in order to prevent anyone from deleting that buffer. It should also be noted that New Relic is not the only extension that does this. The standard zlib extension that ships with PHP does the same thing, for the exact same reasons.

We have had several customers that were affected by this, and in all cases it was due to problematic code. Universally, they all had code similar to the following:

while (ob_get_level()) {
  ob_end_flush ();
}

The intent behind this code is to get rid of all output buffers that may exist prior to this code, ostensibly to create a buffer that the code has full control over. The problem with this is that it will create an infinite loop if you use New Relic, the zlib extension, or any buffer created with the erase parameter set to false. The reason is pretty simple. The call to ob_get_level() will eventually reach a point where it encounters a non-erasable buffer. That means the loop will never ever exit, because ob_get_level() will always return a value. To make matters worse, PHP tries to be helpful and spit out a notice informing you it couldn’t close whatever the top-most non-erasable buffer is. Since you are doing this in a loop, that message will be repeated for as long as the loop repeats itself, which could be infinitely.

So basically, you’re fine as long as you don’t try to flush all of the output buffers in a loop; if you do, you will end up fighting New Relic’s buffer. It is also a problem if you are managing several nested output buffers of your own. But the problem might not be you: the problem is / could be WordPress.

Line 250 of wp-includes/default-filters.php:

add_action( 'shutdown', 'wp_ob_end_flush_all', 1 );

What does that code do?

/**
 * Flush all output buffers for PHP 5.2.
 *
 * Make sure all output buffers are flushed before our singletons our destroyed.
 *
 * @since 2.2.0
 */
function wp_ob_end_flush_all() {
	$levels = ob_get_level();
	for ($i=0; $i<$levels; $i++)
		ob_end_flush();
}

So that’s not good. We found our culprit (if we were having the problem New Relic describes above). How to fix it?

I put this in wp-content/sunrise.php

<?php
remove_action( 'shutdown', 'wp_ob_end_flush_all', 1 );

function flush_no_new_relic() {
	$levels = ob_get_level();
	for ( $i = 0; $i < $levels - 1; $i++ )
		ob_end_flush();
}

add_action( 'shutdown', 'flush_no_new_relic', 1, 0 );

This preserves New Relic’s final output buffer. An esoteric error, but something to be aware of if you are monitoring WordPress with New Relic.
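The fix above assumes exactly one protected buffer sitting at the bottom of the stack. A more defensive variant, sketched below, stops at the first buffer that refuses to close instead of counting levels. flush_removable_buffers() is a hypothetical helper name, and the demo creates its own non-removable buffer by passing 0 as the third argument to ob_start(), which assumes PHP 5.4 or newer:

```php
<?php
// Hypothetical helper: flush buffers from the top down, stopping at the
// first one that refuses to close (for example, a non-removable buffer).
function flush_removable_buffers() {
    $flushed = 0;

    while ( ob_get_level() > 0 ) {
        // @ silences the "failed to ..." notice; the false return is the signal.
        if ( ! @ob_end_flush() ) {
            break;
        }
        $flushed++;
    }

    return $flushed;
}

// Demo: one protected buffer (flags = 0, so it cannot be removed)
// with two ordinary buffers stacked on top of it.
$before = ob_get_level();
ob_start( null, 0, 0 );
ob_start();
ob_start();

$flushed = flush_removable_buffers();
$after   = ob_get_level();

// Prints "flushed 2, levels remaining above start: 1" when the script ends
echo "flushed {$flushed}, levels remaining above start: " . ( $after - $before ) . "\n";
```

Because the loop exits on the first failed ob_end_flush(), it cannot spin forever, and the protected buffer is left for PHP to flush at shutdown.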