Update June 21, 2013
This article has graciously been translated into Serbo-Croatian by Anja Skrba. Read that version here.
It's no secret: concrete5 is a beefy application. We pack in a lot of dynamic features, have a very extendable code base, include large libraries from the Zend Framework and offer powerful permissions and design flexibility. In turn, sometimes site performance suffers, especially on cheaper web hosts.
Unfortunately, somewhere along the line, performance problems started to become more than just a nuisance and threatened to become a meme. We had to do something about them.
It All Started with Files
In concrete5 5.2.0, we introduced a completely new file manager. Files were now full-fledged objects with extensible metadata, permissions and sets. Retrieving information about a particular file object was much more expensive (in terms of database queries) than it used to be.
All of a sudden, certain add-ons like galleries which gathered large amounts of files at once were quite slow. They had to run hundreds of queries just to render a page.
Caching to the Rescue
In web applications, caching typically means retrieving an object from a faster place be it RAM, or a serialized object in a hard disk than reconstituting it entirely from the database. This lessens the load on the database and can dramatically improve performance. We added our own custom caching layer, which stored objects in the file system, and this problem went away. We liked this solution so much that we added more objects to this cache.
Enter Zend Cache
When we started working on localizing our add-ons (so they could be included in translation) we realized that it would make a lot of sense to use Zend Translate, which is a library contained with the Zend Framework. Zend Translate makes extensive use of Zend Cache. We took a look at Zend Cache, and realized that without a lot of work, we could remove our custom caching layer, replace it with Zend Cache, and automatically get access to different cache backends, like APC, Xcache, memcached and more. It was a no-brainer.
Not entirely. Zend Cache has always had its share of quirks. The biggest one? The default cache, which stores objects in the filesystem, gets slower and slower the more objects you place in it, due to some garbage collection routines that Zend Cache uses. If you disable garbage collection, the cache is fast, but it fills up with millions of files. This was exacerbated by the fact that, as we added features like extensible attributes, layouts and full page caching, we found ourselves storing more and more objects in that cache.
And the worst part? While we were trying to lessen database access, we just shuttled the problem over to the file system, which now had to deal with thousands of small files on every request. Yes, this problem goes away if you configure an alternate cache backend, but most concrete5 sites don't do that. On web hosts with slow disk I/O this was a killer: hitting a site with no cache entries or out of date cache entries could take a really long time. Oftentimes the next access would be faster, but not always; the filesystem could be a significant bottleneck on shared hosts, and it was clear that caching was hurting as often as it was helping.
Cache Bigger Objects: Enter 5.6
We recognized this, so we tried to cache larger and larger objects in fewer and fewer cache entries. The idea was that if we could still use the cache but access it less often we'd get the benefits of the cache without as much of the lookup slowness.
This reached a head in concrete5 5.6.0. Our completely rebuilt permissions system was designed from scratch to be extensible, powerful and able to handle anything. Unfortunately, it requires quite a bit more database access to work with. Rather than cache each permission lookup, we ran all permission lookups every time a page was requested, and cached that in the page object. This worked but performance was significantly degraded for sites that had caching off. And when I say significantly degraded, I mean significantly degraded. I think at one point I benchmarked around 700-800 queries running on a simple page when caching was disabled.
This was where we were in the Fall of 2012: concrete5 was more extensible than ever, but also slower than ever. Sites would fail due to slow disk I/O and the response was either "disable your cache" or "enable your cache." Disabling the cache would bloat the database queries by the hundreds, many of them duplicates. It was clear that caching was causing more problems than it was solving.
At this point, we had several layers of performance mechanisms:
- The in-memory cache: powered by a simple array, the in-memory cache (aka "CacheLocal", if you're curious) is a way of storing a requested object in memory, so that it wasn't retrieved from the file system if it had already been retrieved once in a given request.
- The basic cache: powered by Zend Cache, this was the object-based caching, usually performed by the file system, that I've described above.
- Block caching: introduced in 5.4.0, block caching is a powerful way of preserving all that is great about blocks while allowing developers to mark ways that their blocks can be cached, in order to preserve their performance on render.
- Full Page caching: also introduced in 5.4.0, full page caching is a way to store the entire contents of a page and render it ideally without accessing the database.
- Overrides Caching: introduced in 5.6.0, overrides caching was our way of saving the state of your overridden file system to a Config variable, so that we don't have to do as many file_exists() checks throughout our code.
While these were all noble attempts at solving the performance problems of a complex and flexible application, they were almost all failing:
#2 failed because too many items were being placed in the cache, the file system is slow on many web hosts, the item still has to be requested from the database and then written to the file system, and it just wasn't much faster to request from the file system than it was to get out of the database. Furthermore, by forcing more data to be written into the cache in order to create fewer entries, we actually did many more database queries to make #2 viable than we normally did. This killed performance especially when cache was disabled.
#3 was great in theory, but it used #2 as its backend, which meant all the problems that plagued #2 filtered down to #3. This was the same problem with #4 but in addition #4 suffered from firing far too late in the loading process. Practically, #4 never ran without still connecting to the database, which rendered its benefits minimal. Finally, #5 was a good idea in theory, but still required a database connection, and was confusing, because it couldn't be cleared by deleting files/cache/ directory in the filesystem its data was stored in a different way.
The Present: 5.6.1
It's no secret: we have great plans for concrete5 5.7.0. But these plans are somewhat contingent on concrete5 running everywhere, and running everywhere well. We need to work well on good servers, and work acceptably on even the slowest ones. We simply cannot afford and will not accept performance continuing to be such a problem.
So I set out to fix it. To do this, I decided to look at what worked, and throw out what didn't. The override cache worked and was a good idea, but storing its information in the database to get around the slowness of the cache was stupid. Full page caching was a great idea, and the options that triggered it were sensible. But relying on Zend Cache and firing late in the loading process? This was a bad idea. We had tried to make it so that controller actions and certain interactive items could still be used with full page caching. While this was neat in theory, it made full page caching much less effective, and most importantly just plain slower.
Block caching is a great idea. It occurred to me if we had had block output caching first, when the gallery problems had cropped up with the file block, we may never have implemented anything else.
What didn't work? The one-size-fits-all cache, using Zend Cache as a backend. At best, it still added uncertainty to concrete5 ("Something not working right? Clear the cache, I guess."). At worst, it degraded performance on first run and, for many sites, in perpetuity.
Let's be clear: Zend Cache is a fine piece of software. We will still continue to support its usage in our API and actively use it when interactive with Zend Framework components that use it (like Zend Translate). But shoving every object in there because we make too many database queries? That's like applying a bandaid to a cut with a pair of scissors.
You can currently download a beta version of concrete5 5.6.1 to see these changes in action. I would encourage you to do so, and to take it for a test drive on any web host. The cheaper the host the better.
What's been done? A lot.
We don't use the Cache anymore. It's still there and used for Zend Framework libraries, but rarely accessed anymore. I think the information about the dashboard picture of the day might use it but that's it.
We do use the in-memory cache. A lot. This is CacheLocal. The first time in a given request that a page, file, permissions record or other objects are requested, they are added to the local cache, and then subsequent requests for these items are retrieved from an array. This dramatically reduces database queries without incurring any disk I/O problems. Yes, you do occasionally run into odd cache-related bugs, and it could cause some problems for hosts with very restrictive memory settings, but these should both be rare, since CacheLocal doesn't persist across subsequent requests.
We do use block-level caching, for records and output, but we store this information in the database. We're already requesting data from these database tables, so adding another column to the mix and displaying its contents is trivial (and much faster than looking in the file system for Zend Cache entries.) All the same block caching settings that were valid pre-5.6.1 are still honored. The best part about block-level caching: we save block output and information when a block is saved, which means the data is already there for subsequent requests, leading to much less slowness on initial request for a given page.
We do use the Environment cache, but it's stored as a serialized object in files/cache/environment.cache. Clearing the cache will remove this file, as well as just deleting files/cache/ yourself.
We have updated our CSS cache to store files in files/cache/css/. This is used whenever a theme uses customizable stylesheets. Files are generated and referenced directly without going through concrete5.
We have completely rewritten the full page caching library. Full Page caching is completely modular and extensible. It fires very early in the request. If a site enables full page caching, and the pages are in the cache, a site can completely lose its database connection and it will still render as though nothing is wrong.
We have optimized and removed unnecessary queries. So many queries were structured in certain ways to appease the mighty Gods of cached objects, that when this was no longer a concern they can be rewritten. Page and File custom attributes are an example of this: we used to retrieve all attributes about a particular file or page when that object was retrieved, in order to cache as much in the object as possible. Now we only retrieve this information on demand.
The results speak for themselves. concrete5 5.6.1 is much more responsive on many web hosts. Yes, it still makes robust use of a database (non-logged-in users will see around 50-70 database queries rendering a standard starting point page that's not cached in the full page cache) but this is a massive decrease from 188.8.131.52. And its all being done without the cache adding extra disk I/O.
We are still actively exploring how we might speed up some core block types like Auto-Nav and the PageList class. This may require some creative, not-necessarily-object-oriented problem solving, but we're up to the challenge, armed with some of the lessons we've learned.
One Size Doesn't Fit All
The features we'd added for the sake of performance weren't all hindrances but their impact was lessened because we tried to funnel them all through the same approach (a file-based cache.) Now, although we have some slightly more idiosyncratic approaches to solving these problems, the results are much, much better.
Keep the Good Ditch the Bad
We've never been afraid to rebuild something in a better way.
Remember the Problem You're Trying to Solve
We should have realized we had a problem the moment we started trying to fix the cache. The cache isn't a feature of the software: the cache is means to an end (faster software!) Focusing on actually reducing database usage throughout the application in sensible ways rather than adding them and shuttling them into a cache once they've run is going to serve us far better.
Let us know what you think.