Protecting yourself from SEO Censorship

I was contacted yesterday by a journalist (whose work I happen to read regularly). He was following up on the investigation I did into CNN comment spam, and on some similar stories. I thought I'd post my answer to one of his questions:

What are these people -- if not the companies themselves -- doing to manipulate search rankings, and what can bloggers and Google do to fix things?

The SEO tactic that poses a danger to bloggers is perhaps most simply described as "reverse search engine optimization". Google has a set of filters that detect and punish pages using certain outdated SEO tactics. In the case of the CNN spam, the tactic is keyword stuffing. Put simply, keyword stuffing is flooding a page with repeated search terms (e.g. CNN CNN CNN CNN CNN CNN CNN). At one time this tactic actually worked, but Google now has filters that easily detect the practice and, in response, either punish the page or remove it from the index. When the spam targets a post that discusses a company, the potential damage from stuffed keywords is greatly intensified. To understand why, you must first understand how Google assigns weight to the content of a page. For convenience, we'll use the page of mine that was attacked, "CNN: Television's Great Orifice".

In every search engine, the single most important determinant of where your page is placed is the text in your <title></title> tags. For example: <title>CNN: Television's Great Orifice</title>. In most blogging software, the page title is also output in <h1></h1> or <h2></h2> tags. These header tags are probably the second most important determinant of where your page is placed. Third most important is the content. My essay has about 7 mentions of "CNN". Altogether, this is why Google sees my essay as relevant to "CNN".
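To see where a term shows up on one of your own pages, something like the following rough sketch may help. It only counts mentions in the title, the h1/h2 headers, and everything else; it does not reproduce how any search engine actually weighs them. The names `TermLocator` and `locate_term` are made up for this illustration.

```python
import re
from html.parser import HTMLParser


class TermLocator(HTMLParser):
    """Counts mentions of one term in <title>, <h1>/<h2>, and the rest of a page."""

    def __init__(self, term):
        super().__init__(convert_charrefs=True)
        # Match the term as a whole word, ignoring case.
        self._pattern = re.compile(r"\b%s\b" % re.escape(term.lower()))
        self.counts = {"title": 0, "headers": 0, "body": 0}
        self._stack = []  # currently open tags

    def handle_starttag(self, tag, attrs):
        self._stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating sloppy markup.
        if tag in self._stack:
            while self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        hits = len(self._pattern.findall(data.lower()))
        if "title" in self._stack:
            self.counts["title"] += hits
        elif "h1" in self._stack or "h2" in self._stack:
            self.counts["headers"] += hits
        else:
            self.counts["body"] += hits


def locate_term(page_html, term):
    """Return a dict of mention counts for `term` in title, headers, and body."""
    locator = TermLocator(term)
    locator.feed(page_html)
    locator.close()
    return locator.counts
```

Running `locate_term(page_html, "CNN")` on a saved copy of a post gives a quick picture of which terms already sit in the places that matter most, and so which terms a spammer could most damagingly repeat.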

But when a spammer comes in and leaves a comment that repeats "CNN" 100 times, Google has no way to tell whether the repeated search terms are spam or my own attempt to boost my search engine rankings through dishonest techniques. If someone dropped by and left a comment that repeated "Texas Holdem" over and over again, it would be a different story, because those terms are not already in my page title, headers, and content. But in this case, my page title, headers, and content all contain "CNN", so it looks very bad to Google's filters. (Does that make sense?)
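Here is a minimal sketch of how a blogger might flag a comment like that on their own site. It assumes a stuffed comment is one where a single word is repeated many times and makes up most of the comment, and it notes whether that word also appears in the post title (the dangerous case described above). The function name and the thresholds are my own guesses, not anything Google publishes.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def stuffing_report(comment, page_title, min_repeats=10, min_share=0.5):
    """Return details about a suspiciously repeated word, or None if nothing stands out.

    min_repeats and min_share are arbitrary illustrative thresholds.
    """
    words = tokenize(comment)
    if not words:
        return None
    term, count = Counter(words).most_common(1)[0]
    share = count / len(words)
    if count < min_repeats or share < min_share:
        return None
    return {
        "term": term,
        "count": count,
        "share": share,
        # True when the repeated word is already in the post title,
        # which is when the stuffing is most likely to hurt the page.
        "matches_title": term in tokenize(page_title),
    }
```

A comment of "CNN" repeated 100 times on "CNN: Television's Great Orifice" would be reported with `matches_title` set to True, while an ordinary comment would return None.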

Keyword stuffing is perhaps the easiest way to pull off a quick censoring of a post. Unlike cloaking with hidden text (setting the font color to white and repeating a term over and over again, so that it is visible to search engines but not to the human eye), stuffing can't be stopped by limiting the HTML tags that a commenter can use (e.g. by not letting commenters use any style tags). The only way you can protect yourself from stuffed keywords is to have some way of knowing when someone comments on any post of yours.
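For the hidden-text half of that comparison, limiting what commenters can use might look something like this sketch, which throws away every tag and keeps only the visible text. It uses Python's standard `html.parser`; the function name `strip_comment_markup` is hypothetical, and real blogging software will have its own way of doing this. Note that it does nothing against plain, visible keyword stuffing.

```python
from html import escape
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects the text of a comment and drops every tag and attribute."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip = 0  # depth inside <style> or <script>

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep text unless it is the contents of a style or script block.
        if not self._skip:
            self.parts.append(data)


def strip_comment_markup(comment_html):
    """Return the comment as escaped plain text, with no tags or styling left."""
    parser = _TextOnly()
    parser.feed(comment_html)
    parser.close()
    return escape("".join(parser.parts))
```

With the markup gone, white-on-white text becomes ordinary visible text, so at least a human reader can spot it.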

I don't know if we'll ever see this tactic become widely used. It only works on pages that allow comments, and it's very easy to catch. However, there is another tactic that I'm far more worried about: using link farms to censor a page.

A link farm is a page that is created for one purpose: to link to another page and, in turn, raise its PageRank. But Google has wised up to link farms, and now actively punishes pages that participate in them. Many of these pages have been removed from the index altogether. Theoretically, it would be possible for a company to set up link farms to attack certain pages, and delist them in a far more silent and effective way than is possible with keyword stuffing. The link farms themselves would be delisted, so it would be very difficult to track this practice. I haven't yet seen any evidence of this tactic being used. But my guess is that if it's not already happening, it's right around the corner.

The main problem for Google is that the very filters that maintain quality in its search results can easily be taken advantage of by an SEO expert gone bad. When Google designed the filters, it did not seem to take into account what someone would have to gain from getting a page delisted. Then again, I haven't the foggiest idea how Google could differentiate between dishonest SEO done by the owner of a page and dishonest SEO done against them. It's a foregone conclusion that these filters are an overall good thing for the web, even if they can be taken advantage of.

As for what bloggers can do in the meantime, I'm afraid I don't have any easy steps that will protect them. Rather, my advice is to be vigilant: keep an eye out for fishy comments and keep track of your Google traffic. If it begins to fall, do some investigating as to why. If you can't figure it out, drop me a comment at my blog, and I'll happily investigate it for you.
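Keeping track of your Google traffic can be as crude as comparing this week with last week. The sketch below assumes you already have a list of daily visit counts from Google, oldest first, pulled from your own logs or stats package (how you get them is up to you). The function name, the 7-day window, and the 50% threshold are all arbitrary choices for illustration.

```python
from statistics import mean


def traffic_drop(daily_visits, window=7, threshold=0.5):
    """Compare the most recent window of days against the window before it.

    Returns (dropped, ratio), where ratio is recent average / previous average
    and dropped is True when ratio falls below threshold. Returns None when
    there is not enough data or the previous window had no traffic.
    """
    if len(daily_visits) < 2 * window:
        return None
    recent = mean(daily_visits[-window:])
    baseline = mean(daily_visits[-2 * window:-window])
    if baseline == 0:
        return None
    ratio = recent / baseline
    return ratio < threshold, ratio
```

A result of `(True, 0.3)` would mean the last week brought in about 30% of the previous week's Google visits, which is a good reason to go look at your comments and your listing.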

I don't see this being a problem that is going to be solved through educating the masses. Most bloggers aren't going to take the time to keep fully aware of what is happening in their comment sections, much less to their Google rankings. Moreover, half of what I've described is theoretical. The best strategy I've come up with so far is to aggressively pursue any occurrences of these dirty SEO tactics and, hopefully, expose the companies that perpetrated them. The goal is to make it a practice that is widely known to carry risks that outweigh its benefits.

The Internet is free, but like a democracy, a free Internet requires vigilance. My personal view is that protecting our right to be listed high on a Google search is as important to the virtual world as protecting our freedom of speech is to the physical world. To be listed on a Google search is to be heard; to be delisted is to be silenced. When you've been silenced, you've been censored. It's as simple as that.