WordPress in Dev, QA, and Prod

if you work at a “corporation” and are using WordPress as a CMS, chances are you have multiple environments in which you do your development, testing, and then ultimately deploy to production. Syncing your environments (code, configs, database) can be a nightmare, unless you take some time to figure out just how similar your environments can be, and only focus on how they are different. I am going to share with you techniques we use at eMusic to seamlessly switch environments without having to do any manual intervention in the database, codebase, cache or configs (after the initial setup).

Database

You shouldn’t have to touch your database! You should embrace the idea that “production” is your authoritative database and only pull data DOWN, never push it UP. I know many tools exist to publish database rows from one environment’s database to another, but it is way easier to know you can always pull fresh data and “everything just works” rather than pushing data in real-time at runtime and praying for the best.

This will of course require some planning. Ok, I made a new “page” that uses rewrite and has some query vars that end up producing a very dynamic page, like an artist’s page for one of their albums. Before we roll this code to prod, I need to create the page in the WordPress admin on production so the URL will work and the page will be “live” when the code rolls.

Periodically, perhaps after each roll to production, a backup can be made of production data (an export) and “pulled down” to every other environment’s database (an import on each). I prefer to use Sequel Pro to manage our many databases and love its easy-to-export-and-import GUI tools. Sequel Pro will give you the option to compress your SQL export using Gzip compression. I have had some edge case issues where this has exposed some bad character encoding (someone pasted from Word without using the GUI button in TinyMCE…) in some of my tables, exporting SQL uncompressed has not.  Uncompressed SQL for a database with many millions of rows will produce a huge file. If you are confident in the cleanliness of your character encoding, try a compressed export for a much smaller file size (8-10x smaller).

Why this is a good method? 

  • Staging can always be replenished with fresh data, erasing any unnecessary dummy / test posts
  • What you do in dev, stays in dev, probably local to YOUR machine, so it can’t break anything
  • Prod remains prod, you can’t blow it up in real time with a bad import or a haywire push of rows / deltas

HEY WAIT A SECOND, ALL OF THE URLS IN THE DATABASE POINT TO PROD!!!!!

Output Buffering

Right, so you will need to somehow filter EVERYTHING that comes out of the database and replace production URLs with your current environment’s URLs. This can be an impossible task until you understand how output buffering works.

An output buffer will read all output that would normally get sent to the browser as bytes of HTML until you get its contents and choose to echo them out. Meaning, I can code my HTML document like I normally would, but the output buffer will swallow all of the output until I give the command for it to display. Sometimes this is called “flushing” the output buffer. Steve Souders even suggests calling the PHP method flush() right after your HTML </head> to send bytes to the user’s browser earlier and faster. Our technique kinda does the opposite of that, but what we get in return is far more valuable: a codebase that adapts to any environment, and one that rewrites the URLs on the fly for us.

This code can really go anywhere before you start echo’ing content or writing HTML, but the safest place to put it is in wp-config.php:

<?php 
define( 'DEV_HOST', $_SERVER['HTTP_HOST'] );
// single site install
define( 'PROD_HOST', 'www.emusic.com' );
// if you're using multisite
define( 'PROD_HOST', DOMAIN_CURRENT_SITE );

function your_callback_functions_name( $page ) {
    return str_replace( PROD_HOST, DEV_HOST, $page ); 
}

// pre-PHP 5.3
ob_start( 'your_callback_functions_name' );
// PHP 5.3
ob_start( function ( $page ) {
    return str_replace( PROD_HOST, DEV_HOST, $page ); 
} );

You can also include an array of URLs to replace with your DEV HOST. This is important if you have a production environment that has a host name for “wp-admin,” maybe a host name for XML-RPC servers, and a host name for your production site.

To set up a local host name for your site in Apache – you need to add an entry to your {PATH_TO_APACHE}/conf/extra/httpd-vhosts.conf file:

##
#
# Example Local Configuration (on a Mac)
#
##

<Directory /Users/staylor/Sites/>
Options Indexes MultiViews ExecCGI FollowSymLinks
AllowOverride All
Order allow,deny
Allow from all
</Directory>
<VirtualHost *:80>
    DocumentRoot "/Users/staylor/Sites/emusic/trunk"
    ServerName emusic.local
</VirtualHost>

You also need to add an entry in your /etc/hosts file:

127.0.0.1 emusic.local

AND AFTER THAT, you need to flush your local DNS cache to accept your new host:

#!/bin/bash
sudo {PATH_TO_APACHE_BIN}/apachectl restart
dscacheutil -flushcache

DEV_HOST requires a hard-coded host name (e.g. emusic.local) if $_SERVER['HTTP_HOST'] produces an IP address.

Here’s an example of filtering by searching for multiple URLs and replacing with the current host (the callback is called – passing the contents of the buffer – when the script and all output has reached its end. You can nest output buffers as well, more on this later):

// PHP 5.3
// these aren't real hosts
define( 'TOOLS_HOST', 'tools.emusic.com' );
define( 'XML_RPC_HOST', 'xml-rpc.emusic.com' );
define( 'PROD_HOST', 'www.emusic.com' );

ob_start( function ( $page ) {
    return str_replace( array( TOOLS_HOST, XML_RPC_HOST, PROD_HOST ), DEV_HOST, $page ); 
} );

Ok, I got this to work, but wait a second, we have DIFFERENT host names in staging for XML-RPC servers, etc, how do I tackle that….?

Machine-specific configs

We still use wp-config.php, but with a twist. We use MANDATORY machine configs. I’ll explain how. Inside of wp-config.php (right near the top):

define( 'DB_CONFIG_FILE', '/wp-config/hyperdb.php' );

if ( !is_file( DB_CONFIG_FILE ) ) {
    die( "You need a HyperDB config file in this machine's /wp-config folder" );
}

if ( is_file( '/wp-config/config.php' ) ) {
    require( '/wp-config/config.php' );
} else {
    die( 'You need a config file for your environment called config.php and located in /wp-config, and it needs to be owned by Apache.' );
}

Granted, you don’t have to use HyperDB. But, if you get any sort of serious traffic on your production site, you kinda HAVE to use HyperDB. HyperDB is outside the realm of this post, but just know it is a magical master-slave manager for a fancy production MySQL cluster.

The setup here is simple enough. Create a folder called wp-config (or whatever you would like) in the root directory of your machine and make sure Apache can read it.

Ok great, what goes in this /wp-config/config.php file…? Well, really anything you want, but probably these things:

// local constant for use in code
define( 'EMUSIC_CURRENT_HOST', $_SERVER['HTTP_HOST'] );

global $super_admins, $memcached_servers;
// define those here if you want, or not...

define( 'SUNRISE', 1 );

// your local database credentials
define( 'DB_NAME', 'my_database_name' );
define( 'DB_USER', 'root' );
define( 'DB_PASSWORD', 'mypassword' );
define( 'DB_HOST', '127.0.0.1' );
define( 'DB_CHARSET', 'utf8' );

// you may have environment specific S3 buckets, or not
// it's a good idea to keep these credentials in a file like this for security
// and ease of editing
define( 'AMAZON_S3_KEY', 'your_bucket_key' );
define( 'AMAZON_S3_SECRET', 'your_bucket_secret' );
define( 'AMAZON_S3_BUCKET', 'your_bucket' );

// multisite values
define( 'WP_ALLOW_MULTISITE', true );
define( 'MULTISITE', true );
define( 'SUBDOMAIN_INSTALL', false );
define( 'DOMAIN_CURRENT_SITE', 'www.emusic.com' );
$base = '/';
define( 'PATH_CURRENT_SITE', '/' );
define( 'SITE_ID_CURRENT_SITE', 1 );
define( 'BLOG_ID_CURRENT_SITE', 1 );

// maybe some hard-coded production values to filter
define( 'PRODUCTION_CURRENT_SITE', 'www.emusic.com' );
define( 'PRODUCTION_CURRENT_TOOLS', 'tools.emusic.com' );
define( 'XMLRPC_DOMAIN', 'xmlrpc.emusic.com' );

// environment-specific endpoints
define( 'XMLRPC_ENDPOINT', EMUSIC_CURRENT_HOST );
define( 'TOOLS_DOMAIN_FQDN', EMUSIC_CURRENT_HOST );

ob_start( function ( $page ) {
   // this might be overkill, but it makes sure you are 
   // getting rid of any "wrong" URL that made its way into
   // the database somehow
    return str_replace(
        array(
            XMLRPC_DOMAIN,
            TOOLS_DOMAIN_FQDN,
            DOMAIN_CURRENT_SITE,
            PRODUCTION_CURRENT_TOOLS 
        ),
        EMUSIC_CURRENT_HOST,
        $page
     );
} );

// custom WordPress configs:
// we use an svn:external for WordPress code
// in a different directory other than root
//
// we move our assets directory and exclude it from Subversion
// more on that later...
define( 'WP_CONTENT_URL',
    'http://' . DOMAIN_CURRENT_SITE . '/wp-content' );
define( 'WP_CONTENT_DIR',
    $_SERVER['DOCUMENT_ROOT'] . '/wp-content' );
define( 'EMUSIC_UPLOADS', 'assets' );

This is an example config that might look completely different in every enviroment’s machine(s):

  • Each environment might have specific overrides (a $super_admins array?)
  • You might specify 6 Memcached servers in one environment and zero in another
  • Your database credentials will undoubtedly change across environments

Sunrise

We use Multisite, so we also have to do some work in wp-content/sunrise.php. When using Multisite, WordPress allows you to get in there early by setting define( 'SUNRISE', 1 ) and completely changing what site and blog you are on by adding your own monkey business in wp-content/sunrise.php, if you so choose. You can also start adding filters and actions before the meat and potatoes of WordPress starts doing its thing. We are filtering output with the output buffer, but we have to filter PHP variables in code using WordPress filters:

$domain = EMUSIC_CURRENT_HOST;

// a bunch of code here is omitted that looks at domain
// and does some crazy stuff to switch between "blogs"
// without actually adding Wildcard DNS

// Multisite can require some unconventional
// code and admin wrangling to get things to work
// properly when you have a custom WP location,
// custom media location, AND a custom Multisite media
// location

// all you need to know:
// this code will produce $the_id (representing blog_id) and
// possibly alter $domain

// and then...

function get_environment_host( $url ) {
    global $domain;
    return str_replace(
        array(
            XMLRPC_DOMAIN,
            TOOLS_DOMAIN_FQDN,
            DOMAIN_CURRENT_SITE
        ),
        $domain,
        $url
    );
}

add_filter( 'home_url', 	'get_environment_host' );
add_filter( 'site_url', 	'get_environment_host' );
add_filter( 'network_home_url', 'get_environment_host' );
add_filter( 'network_site_url', 'get_environment_host' );
add_filter( 'network_admin_url','get_environment_host' );
add_filter( 'post_link', 	'get_environment_host' );

add_filter( 'pre_option_home', function ( $str ) use ( $domain ) {
    return 'http://' . $domain;
} );

add_filter( 'pre_option_siteurl', function ( $str ) use ( $domain ) {
    return 'http://' . $domain;
} );

// our custom image location for blogs / sites in our network
if ( $the_id > 1 ) {
    define( 'UPLOADBLOGSDIR', 0 );
    define( 'UPLOADS', 0 );
    define( 'BLOGUPLOADDIR',
        $_SERVER['DOCUMENT_ROOT'] . "/blogs.dir/{$the_id}/files/" );

    add_filter( 'pre_option_upload_path', function () use ( $the_id ) {
        return $_SERVER['DOCUMENT_ROOT'] . "/blogs.dir/{$the_id}/files/";
    } );

    add_filter( 'pre_option_upload_url_path', function () use ( $the_id, $domain ) {
        return 'http://' . $domain . "/blogs.dir/{$the_id}/files/";
    } );

// our custom image location for our main site
} else {
    add_filter( 'pre_option_upload_path', function ( $str ) {
        return $_SERVER['DOCUMENT_ROOT'] . '/' . EMUSIC_UPLOADS;
    } );

    add_filter( 'pre_option_upload_url_path', function ( $str ) {
        return 'http://' . EMUSIC_CURRENT_HOST  . '/' . EMUSIC_UPLOADS;
    } );
}

Ok wow, this is great, but what about images, how do I sync them…?

Images / S3

Trying to keep images sync’d in Subversion is a nightmare. Your production code probably isn’t a Subversion checkout (although WordPress.com is, they run trunk). It is more than likely a Subversion export that has been rsync’d across all of your many load-balanced servers / EC2 instances. If you don’t have a dedicated server for wp-admin, your images might not even end up on the same server – they could be spread across several. So that begs the following questions:

  • How is it humanly possible for servers to share images?
  • What happens if a server receives a request for an image it doesn’t have?
  • What happens when I pull the prod database to my local db but I don’t have any of those image paths in my file system?
  • How do I pull all of those images from production (each server) and somehow check them into Subversion… in real-time(!) ?

I am of the following opinion – you can’t! But this isn’t limited to Image assets. How do you serve a sitemap which is supposed to be a static file when you have 18 servers and the file is generated dynamically and saved locally? How do you minify JavaScript and then save them to flat files that can served by any server. My answer: you can’t, and shouldn’t!

So let’s start looking at solutions!

W3 Total Cache is the subject of intense debate across the WordPressosphere. Many say it does too much. When you are already using Batcache and Memcached, it sorta becomes overkill. But for me, there was one feature I always thought was invaluable. The CDN portion! So what does this CDN portion do?

The CDN code in W3TC gives you an admin interface to enter CDN (Amazon S3 or Cloudfront, Microsoft Azure, etc) credentials, and then magically when you upload a media file, it will upload that file to Amazon S3. Not only that, it will rewrite your image /media URLs to said remote location. So you keep working and uploading, it takes care of the rest. It’s magic!

To use this feature, I had to rip all kinds of code OUT of W3TC and make some changes here or there. One of the first things I knew I needed was a config that would work across any environments (much like our database works). I needed to accomplish the following things:

  • At no point in any environment do I need to have the images my local file system
  • Every environment’s media URLs should be seamlessly rewritten to their S3 counterpart
  • When I import the production database into any environment, media assets should appear as if they were always there.

This might sound super-sophisticated, but we’re only doing 2 crucial tasks:

  1. Adding an action that will upload the media to Amazon S3. W3TC already did all of that heavy-lifting. Thanks!
  2. Adding an output buffer to match media URLs against RegEx and replace them. W3TC already did all of that heavy-lifting. Thanks!

I had to make changes for this to work – altered the ways the configs work, made sure the GUI couldn’t override the config when messing around in the admin, added some constants that are defined in config.php for machine specific configuration. But ultimately, I took an existing technology and tweaked it to work to our advantage.

If you notice, I said that W3TC uses an output buffer – the output buffer starts after ours, meaning: we use nested output buffers. Their callback will run before ours, so the result of their output buffer callback will get passed to ours.

The Result

Our stuff “just works.” To get started developing with WordPress at eMusic, the following has to be done:

  • A production account on eMusic.com
  • Machine configs present in /wp-config
  • A copy of the production database
  • A Subversion checkout
  • Some entries in /etc/hosts for our many Web Services (from our Java universe)

That’s it. Here’s me talking about this and more at WordPress San Francisco 2011:

3 thoughts on “WordPress in Dev, QA, and Prod

Comments are closed.