The folks at Closed Loop Marketing have a comprehensive post on optimization techniques for your blog strategy, covering everything from feed optimization to blog structure to posting techniques. While these recommendations all provide excellent opportunities for enhancing the user experience and getting visibility in relevant communities and locations, would-be bloggers need to consider how search engines like Google and Yahoo will react when they reach the blog and how their blog content will be crawled, understood and indexed.
Remember that getting indexed is what initially gets you ranked for various keyword opportunities in search, generating traffic, visibility, and (hopefully) readers and links. As you work to build your blog presence online (through quality posts, inbound links, etc.), the goal is to achieve better rankings for the more competitive terminology.
But if Google can’t accurately index your blog…
Areas of Potential Crawl Confusion
From a website optimization perspective, it’s important to understand that the traditional blog structure creates several areas of potential duplication. Because of this duplication, search engines may have difficulty deciding which copy of the content to treat as the primary version when ranking it for a given keyword.
Consider that this post will appear in the following locations on this website:
- The blog home page
- Applicable category pages
- Monthly (or yearly) archive pages
- The individual post’s page
If a search engine were to determine that this blog post was relevant for “Blog Optimization Tips”, which web address would be most appropriate to display? Instead of letting a search engine make the best guess (or just get confused and ignore it), you should explicitly tell them what to index and display.
Utilizing Robots Commands
Note: If you are unclear about the usage and definitions related to Robots Exclusion, you should read this guide from the Web Robots Pages website.
Your goal is to make certain that search engines index the valuable content in your blog (e.g., the individual blog posts) and to eliminate confusion where that content is duplicated in category or date-based archives, while still allowing crawlers to follow links and discover posts wherever they appear. Note that most of these recommendations have WordPress users in mind, but the same logic translates to other blogging software as well.
This means you should create the following environment for search engine crawling and indexing:
- Prevent search engines from indexing Category Pages, but allow them to follow the links to specific blog posts
- Prevent search engines from indexing Archive Pages, but allow them to follow the links to specific blog posts
- Create enough of a unique presentation on your blog home page so that search engines see a definitive reason to index it (we usually see fewer problems arise here, even without the recommendations for category/archive pages).
Two Options for Robots Commands
Using the META robots tag
This is the preferred solution for handling indexing control with search engine robots (in the case of blog optimization). If you have access to add META robots tag information to the <HEAD> of category and archive pages, you will simply want to add the following detail:
<meta name="robots" content="noindex,follow" />
TIP: The All in One SEO Pack for WordPress can automatically generate this META robots tag for you, and we’d recommend using this or a similar plugin anyway.
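If you’d rather add the tag by hand, here is a minimal sketch for a WordPress theme’s header.php, using the standard is_category() and is_date() conditional tags (the exact placement within your theme is up to you):

```php
<?php
// On category and date-based archive pages, ask engines not to index
// the page itself, while still following the links it contains.
if ( is_category() || is_date() ) {
    echo '<meta name="robots" content="noindex,follow" />';
}
?>
```

This keeps individual post pages fully indexable, since the conditionals only match archive-style pages.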
Using the robots.txt file directly
If you do not use WordPress and/or are not able to add the META information directly in the category/archive header information, consider updating your robots.txt file so that search engines understand NOT to index the applicable pages. You do this by adding the following syntax to your robots.txt file:
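The exact directives depend on your permalink structure; the entries below are a sketch, assuming category archives live under /CATEGORY/ and date-based archives under /blog/YYYY/ (both placeholders for your own paths):

```
User-agent: *
# Replace CATEGORY with each category slug used in your URL structure
Disallow: /CATEGORY/
# Add one entry per archived year (classic robots.txt has no year wildcard)
Disallow: /blog/2006/
Disallow: /blog/2007/
```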
Here, /CATEGORY/ stands in for whatever category path appears in your blog’s web address structure (e.g., “https://komarketing.com/CATEGORY/”).
Important Notes About This Method: Unlike the META solution, you’re now completely blocking all indexing AND following opportunities for category archives and anything that incorporates domain.com/blog/YYYY/. This means you must also consider the following:
- Create a way (such as an individual post sitemap) for search engines to reach each post, in addition to the ones that appear on the home page. (We’re currently using a simple post sitemap plugin, but I really like Pronet Advertising’s sitemap idea as well.)
- Ensure that the web addresses for your individual posts do not contain the /YYYY/ variable within them. For KoMarketing, our individual blog post web addresses omit both the yearly and monthly data.
If you do not implement solutions for these two things (or ensure that they are already in place), search engines will not index your individual blog posts (or at least not all of your archived blog posts).
A Couple Additional Considerations
If you really want to ensure search engines avoid all duplication, you can also add the rel="nofollow" attribute to all of your anchor-specific links (particularly comments, trackback URLs, etc.), but that may not be required.
We also recommend that any links to blog user logins or administrative functions be completely blocked from search engine indexing as well. This should simply be another entry in the “Disallow” section of the robots.txt file.
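As a sketch, for a default WordPress install the relevant paths would be the ones below (adjust to your own setup if your login or admin pages live elsewhere):

```
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
```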
Making Your Home Page Unique
As I indicated above, blog home pages rarely have trouble getting indexed. But since problems can happen, and the home page will probably be your most important destination, you should take care to make certain it’s indexed. Here are some ways you can (and should) make your blog home page unique.
- Create an introductory paragraph at the top of the page.
- Differentiate your navigational structure so that certain items only appear on the home page (blog roll, favorites etc).
- Utilize post excerpts or short summaries for your blog posts, instead of the complete content of your post.
- As with any key landing page on a website, use unique TITLE tags and META information.
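Pulled together, the home page’s <HEAD> might look something like the following (the title and description values are purely illustrative):

```html
<head>
  <!-- Unique, keyword-relevant title, not reused on category/archive pages -->
  <title>Blog Name - B2B Search Marketing Blog</title>
  <!-- Unique META description, distinct from interior and archive pages -->
  <meta name="description" content="Illustrative description of what this blog covers." />
</head>
```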
More Information and Considerations
There was an interesting article on Search Engine Journal related to fighting duplicate content on WordPress, which offers some more insight and ideas. If you want to get really into it, check out Andy Beard’s post on optimizing WordPress for competitive niches.
Lastly, it’s important to realize that other factors can play into how successfully your blog gets indexed, including whether or not it’s attached to an existing domain (such as ours) and/or the rate at which it acquires high-value inbound links. Using robots exclusions alone offers no guarantee of high rankings, but it does give you better control over how a search engine understands your blog and over what you want it to index.