SEO Tip: Force concrete5 Pages to Display at One URL
I use Disqus web comments on this site. They're great: they handle threading, login from multiple sources, monitoring and posting by email, user avatars, and some other things that I'm glad I don't have to deal with. Each page contains a snippet of JavaScript that, when loaded, sends the page's URL to the Disqus servers and retrieves any comments posted against that URL.
I'd had no problems with this setup until just the other day, when I received notification of a new comment to an article, and I could not for the life of me locate the comment on the page. The article had fourteen comments, and yet only thirteen displayed, the latest from a month or two ago. I was apparently dealing with a phantom comment. Frustrated, I logged into the main disqus.com website, where I was able to locate the comment, alongside all the others. According to Disqus, it was approved and active on the page. But where was it?!
The clue came when I looked at the story to which the comment was attached. All fourteen comments were attached to "andrewembler.com :: Optimizing your concrete5 Website for the iPhone" – but the thirteen that displayed linked to that story here:
http://andrewembler.com/concrete5/optimizing-your-concrete5-website-for-the-iphone/
While the orphaned comment linked to it here:
http://andrewembler.com/index.php?cID=87
Sure enough, when I visited that URL, the same article rendered, but with one comment – the new comment – instead of the other thirteen. Since Disqus provides no move tools, this comment is stuck on this page. But I can make sure that this doesn't happen again.
Background: the cID
Every page of content added to a concrete5 website has one unique ID attached to it. This is referred to as the collection ID, and is frequently abbreviated in functions and URLs as "cID." Why "collection" ID, rather than "page" ID? Originally, when concrete5 first came into being in late 2003, its pages were referred to as Collections, since they were thought of as collections of blocks. As concrete5 matured and started using attributes, themes, page types and more complex objects, we thought just calling them pages (which is, after all, what they are) was less of a barrier to entry. The "cID" parameter is still actively used, however, and it's this cID which makes its unsightly appearance in the URL above.
Basically, my concrete5 iPhone article can be accessed at two URLs: the cID URL, which loads the article based on its ID in the database, and the full SEO-friendly URL. (Note: you may have more than one URL as well, but since concrete5 5.3 they redirect to the canonical URL.) Why? In concrete5, pages can be created well before their actual canonical URL is chosen. Most concrete5 navigation and linking tools will link directly to the canonical URL, but if a user somehow stumbles on the cID URL, the page will still render without redirection. Usually this isn't a problem, but as we've seen in my example, it can lead to some very undesirable situations. I'm going to show you how to fix that.
The Task
I want to ensure that a comment can never be mis-posted again. I've already fixed as many areas in the concrete5 core that link to the cID URL of pages as I could find, but there's still no telling what people might accidentally link to or stumble upon. I need something that will redirect them if they land on a page via its cID URL.
This can't be too greedy, however: the cID-powered URL is used very, very frequently while concrete5 sites are in edit mode. What I really want is this: if a user comes to any page on my site by way of the cID URL, and they are not logged in (i.e., they're not editing a page), they should be redirected to that page's canonical URL via a 301 redirect.
Step 1: Create config/site_post.php
There are a number of ways to accomplish our goal. I've already mentioned concrete5's events system in previous articles, so here I'm going to take an even simpler approach, and introduce you to config/site_post.php. This is an optional file that site owners may create in their local config/ directory; it will automatically be loaded after all configuration files have been read and the database connection has been made. This lets us hook into concrete5's loading process relatively early. Note: theme information and some other core information about a request will not be available this early, but for our purposes it should work nicely.
Step 2: Add Code
Once you have an empty file at config/site_post.php, add this code to it:
<?php
$req = Request::get();
if ($req->getRequestCollectionID() > 1 && $req->getRequestPath() == ''
    && $_SERVER['REQUEST_METHOD'] != 'POST') {
    // This is a request that is directly for the cID, rather than the path
    $u = new User();
    // If the user is logged in we do NOT redirect
    if (!$u->isRegistered()) {
        // Get the page object for the current cID
        $c = Page::getByID($req->getRequestCollectionID());
        if (!$c->isError() && $c->getCollectionPath() != '') {
            $nav = Loader::helper('navigation');
            header('HTTP/1.1 301 Moved Permanently');
            header('Location: ' . $nav->getLinkToCollection($c, true));
            exit;
        }
    }
}
Let's walk through exactly what's happening here:
Lines 1-4: First, we grab the concrete5 Request object. Then, we check that object to see if the current request is for a cID, and that the current request has an empty path. We ensure that the current request's cID is greater than one, so that our home page (which is always a cID request, since it has no path) will be ignored in this check. Update: we also check to make sure we're not doing a POST (see note in the comment).
Lines 5-8: If this is the case, we want to proceed with our redirection. We then instantiate the User object, and check to see that the current User object is not logged in. If the user is not logged in, we proceed with our redirection routine.
Lines 9-11: Now, we grab the page object for the currently requested page ID, and check to ensure that there's no error associated with this page object. Additionally, we make sure that the page object does, in fact, have a valid path (if it didn't, we'd get stuck in an infinite redirect loop). If that's the case, then...
Lines 12-15: We load up our navigation helper, issue a permanently moved header (for the benefit of any search engines following these links) and redirect the request to the canonical URL for the page.
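If it helps to see the conditions from the walkthrough in one place, here is a minimal sketch that restates them as a plain function with no concrete5 dependencies. The function name shouldRedirectToCanonical() and its parameters are my own invention for illustration; nothing like it exists in the concrete5 core, and the snippet in config/site_post.php does not need it.

```php
<?php
// Hypothetical helper, NOT part of concrete5: it only restates the redirect
// rule from the walkthrough as a pure function that is easy to reason about.
function shouldRedirectToCanonical(
    int $cID,
    string $requestPath,
    string $method,
    bool $loggedIn,
    string $canonicalPath
): bool {
    // cID 1 is the home page, which is always a cID request, so leave it alone.
    if ($cID <= 1) {
        return false;
    }
    // A non-empty request path means the visitor already used the path URL.
    if ($requestPath !== '') {
        return false;
    }
    // POST requests are left untouched, as in the snippet above.
    if (strtoupper($method) === 'POST') {
        return false;
    }
    // Logged-in users may be editing, and edit mode relies on cID URLs.
    if ($loggedIn) {
        return false;
    }
    // A page with no path has no canonical URL to redirect to.
    return $canonicalPath !== '';
}

// An anonymous GET for index.php?cID=87 should be redirected.
var_dump(shouldRedirectToCanonical(
    87, '', 'GET', false,
    '/concrete5/optimizing-your-concrete5-website-for-the-iphone/'
)); // bool(true)

// The same request from a logged-in editor should not.
var_dump(shouldRedirectToCanonical(
    87, '', 'GET', true,
    '/concrete5/optimizing-your-concrete5-website-for-the-iphone/'
)); // bool(false)
```

Every early return corresponds to one clause of the nested if statements in the real snippet, so if you change the rule for your own site (say, to let certain user groups through), this is a cheap place to try the logic out first.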
Step 3: Profit!
Try it for yourself:
http://andrewembler.com/index.php?cID=87
This is the URL that was causing so many problems before. Hopefully the orphaned comments are a thing of the past.
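If you'd rather confirm the redirect from a script than from the browser's address bar, something along these lines should do it. This is only a rough sketch: fetchFirstResponseHeaders() and summarizeRedirect() are hypothetical helper names of my own, and it assumes PHP 7.1 or later with allow_url_fopen enabled, since it relies on PHP's built-in get_headers() with a stream context that tells it not to follow redirects.

```php
<?php
// Hypothetical helpers for checking the redirect by hand; NOT part of concrete5.

// Fetch the headers of the first response only, without following redirects.
// Assumes allow_url_fopen is enabled; returns an empty array on failure.
function fetchFirstResponseHeaders(string $url): array
{
    $context = stream_context_create(['http' => ['follow_location' => 0]]);
    $headers = get_headers($url, false, $context);
    return $headers === false ? [] : $headers;
}

// Pull the status code and Location value out of a list of raw header lines,
// in the format get_headers() returns when not asked for an associative array.
function summarizeRedirect(array $headers): array
{
    $status = null;
    $location = null;
    foreach ($headers as $line) {
        if ($status === null && preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        } elseif ($location === null && stripos($line, 'Location:') === 0) {
            $location = trim(substr($line, strlen('Location:')));
        }
    }
    return ['status' => $status, 'location' => $location];
}
```

Calling summarizeRedirect(fetchFirstResponseHeaders('http://andrewembler.com/index.php?cID=87')) while logged out should report a status of 301 and a location pointing at the canonical URL, provided the snippet from Step 2 is in place.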
Yes, we ought to include something like this in the core, but as you've seen here, it's not a perfect solution, and could cause problems for certain sites. For example, this wouldn't work on sites with community functionality that rely on frequently logged-in users, since those users would never be redirected. But on a blog or a site where the only logged-in users are those who edit the site, it should be a nice, useful snippet of code to keep you, your readers and your search engine robots happy.