 
  SRIKANTH RAJESH ILAPAKURTY
 

    SEO
    Lesson 011
    Search Engine Spam
   

Lesson 11: Search Engine Spam

Definition

Excessive manipulation to influence search engine rankings, often for pages which contain little or no relevant content.

Information

Search engine spamming is often confused with legitimate search engine optimization (SEO). While there is a large gray area between the two extremes, in their most clear-cut forms the terms are very different. Spamming involves getting a site more exposure than it deserves for its keywords, which leads to unsatisfactory search experiences. Optimization involves getting a site the exposure it deserves on its most targeted keywords, which leads to satisfactory search experiences.

Examples include:
  • Irrelevancy - targeting keywords unrelated to the site/page.
  • Hidden text - putting keywords where visitors will not see them, to increase keyword count.
  • Hidden links - putting links where visitors will not see them, to increase link popularity.
  • Doorway clutter - mass production of low-quality doorway pages, sometimes machine-generated.

When people say "spam", they usually think only of unsolicited commercial email. But there are many other forms of spam on the Internet, and honestly, when I think of "spam", I think of search engine spam first and unsolicited commercial email only second or third (there is also such a thing as forum spam, and it's very annoying, too).

So, what is search engine spam?

In short, it's the common name for all techniques and methods used to intentionally deceive the search engines about a site's or page's relevance to certain keywords, or about the site's authority. It also refers to SEO methods that spoil the look and feel of a website and the user's experience in favour of search engine relevancy and rankings.

Search engine spam can be on-page (techniques applied to the website itself) or off-page (unethical link building strategies).

The most frequently mentioned on-page spam techniques are these:

  • Keyword stuffing. Keywords are inserted into the web copy to catch the engine's attention; to a human reader they look meaningless and pointless, and their density within the copy often reaches such levels that the text becomes impossible to read.
  • Invisible text. There are many ways to make text invisible. The most primitive is to make the characters the same colour (or almost the same colour) as the background, but in that case the text becomes visible if you press Ctrl+A to highlight the whole page. The text can also be positioned outside the page using CSS, hidden using JavaScript, given the CSS property visibility: hidden, etc. In all cases it's deception and spam, unless a JavaScript event makes the text visible when a visitor performs a certain action (like moving the mouse over a menu).
  • Invisible links. Hyperlinks hidden using one of the methods mentioned above, or invisible/auxiliary linked images. Hidden links usually point to the so-called doorway pages (see below), or link partners the site owner is not proud of.
  • Doorway pages/domains. Doorway pages are usually optimised for certain keywords but have no value for real visitors: their content is often meaningless, or they are just the same page copied many times with only the main keyword replaced in the copy and in the URL. Doorway pages are often redirected to the real page using a JavaScript redirect or a meta refresh redirect; alternatively, they may contain links that visitors must click to get to the real page. Doorway pages can be hosted on the same domain as the main site, but since that is risky, they are sometimes hosted on disposable doorway domains registered just for this purpose.
  • Cloaking. This technique detects search engine robots by their IP addresses and/or user-agent strings. Once a robot is detected, it receives a completely different page from the one any other user-agent (e.g. a browser) receives.
  • Attribute keyword stuffing. The alt and title attributes of images and links get stuffed with unrelated keywords. This becomes especially obvious when the alt attribute of an invisible 1-pixel image (like spacer.gif) contains a long sequence of keywords. It's an example of useless spam, because the engines mostly look at the alt attributes of linked images and practically ignore them if the image is not linked.
  • Improper use of headings. <h1> - <h6> tags are used within paragraphs, although they are supposed to mark headings. To blend in with the rest of the paragraph, they are restyled using CSS.
  • <noframes>, <noscript> and <noembed> tag misuse. The page is built using Flash, frames or JavaScript (e.g. the document.write function). One of these tags is then used to create an area in the page's HTML code that is stuffed with unrelated content and otherwise not displayed on the page. Sometimes a 1-pixel frame is used to justify the use of the <noframes> tag. In fact, a 1-pixel frame signals something dodgy in most cases.
  • A second title tag. This is the most useless kind of spam: the engines ignore a second title tag for ranking purposes, yet they are known to have banned sites for using one.
  • Comment stuffing. There is no point stuffing HTML comments with keywords, because the engines ignore comments.
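Keyword stuffing is, at bottom, a question of proportion: how often a term appears relative to all the words on the page. A minimal Python sketch of that measurement follows. The function names and the 10% cut-off are hypothetical illustrations, not a figure any search engine has published, and the word splitting is deliberately crude.

```python
import re


def keyword_density(text: str, keyword: str) -> float:
    """Share of the words in `text` that equal `keyword` (case-insensitive)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


def looks_stuffed(text: str, keyword: str, threshold: float = 0.10) -> bool:
    """Hypothetical rule: flag copy where one keyword exceeds `threshold` of all words."""
    return keyword_density(text, keyword) > threshold
```

For example, the copy "cheap shoes cheap shoes buy cheap shoes cheap shoes online cheap" gives "cheap" a density of 5/11 (about 45%), far above what ordinary, readable copy produces.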


The best-known unethical off-page strategies are the following:

  • FFA (free for all) pages and link farms. FFAs are long pages full of uncategorised links, where everyone can submit a site (these links add zero authority to websites, but in some cases can hurt them). Link farms are large link schemes, in which all members link their sites to all other members, which is often achieved using an automated script.
  • Heavily cross-linked websites. A group of websites in which every page links to all the other sites in the group. This is often done by owners of many different sites, but it can also be done with sites owned by different people. In both cases it's a sure way to receive a severe ranking penalty.
  • Link-farm type directory networks. These are link farms dressed up as directories. The best way to avoid them is to stay away from all directories that require a reciprocal link for inclusion.
  • Automated link exchanges. This is SEO spam and email spam at once.
  • Automated or manual spam of blogs, forums, classifieds and guestbooks. Quite self-explanatory.
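The defining feature of a link farm or a heavily cross-linked group, as described above, is that nearly every member links to every other member. One hedged way to quantify this is to count, for a group of sites, how many of the possible one-way links actually exist. The sketch below assumes the link graph has already been collected into a dictionary mapping each site to the set of sites it links to (a hypothetical input format; gathering it is out of scope), and the 0.9 cut-off is an arbitrary illustration. It needs Python 3.9 or later for the built-in generic type hints.

```python
from itertools import permutations


def cross_link_density(links: dict[str, set[str]], group: list[str]) -> float:
    """Share of ordered pairs (a, b), a != b, in `group` where site a links to site b."""
    pairs = list(permutations(group, 2))
    if not pairs:
        return 0.0
    linked = sum(1 for a, b in pairs if b in links.get(a, set()))
    return linked / len(pairs)


def looks_like_link_farm(links: dict[str, set[str]], group: list[str],
                         threshold: float = 0.9) -> bool:
    """Hypothetical rule: flag groups in which almost every possible link exists."""
    return cross_link_density(links, group) >= threshold
```

A textbook link farm of three sites that all link to each other scores 1.0 (6 of 6 possible links), whereas three ordinary sites where two of them link to the same news site score only 2 of 6.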

Using any of the above-mentioned techniques can result in a severe penalty or a permanent ban of a website from the search engine indices! If you know of a website that is employing such techniques, don't link to it, as "bad neighbourhoods" can hurt your website too!

Sixteen flavors of search engine spam

Thurow next presented a slide that contained a comprehensive list of sixteen tactics that are considered search engine spam. These techniques include:

  • Keywords unrelated to site
  • Redirects
  • Keyword stuffing
  • Mirror/duplicate content
  • Tiny Text
  • Doorway pages
  • Link Farms
  • Cloaking
  • Keyword stacking
  • Gibberish
  • Hidden text
  • Domain Spam
  • Hidden links
  • Mini/micro-sites
  • Page swapping (bait & switch)
  • Typo spam and cybersquatting

Black Hat SEO vs. White Hat SEO - The Optimization Debate

There are right and wrong ways to conduct your SEO campaigns: good vs. evil, if you will! The terms 'black hat' and 'white hat' are thrown around freely in search marketing circles, and you may have come across many references to them. But do you know what each term encompasses, and what it means for your business? Read on and I'll explain...

Black Hat
Black Hat is the term for SEO practices that are unethical, underhanded and in direct violation of search engine guidelines. The Wikipedia entry for black hat defines it as "a person who uses their knowledge of vulnerabilities and exploits for private gain". Undertaking any Black Hat tactics generally results in some form of search engine penalty, including a drop in rankings, fewer pages indexed or even exclusion from the index. The following are some common black hat tricks.

Keyword Stuffing - As its name suggests, keyword stuffing involves packing keywords into either meta tags or website copy. Search engine algorithms can detect unusually high repetition of keywords or phrases and are unlikely to index such content.

Invisible Text - Placing lists of keywords or phrases on a page in the same color as the background, rendering them 'invisible' to the naked eye, violates many search engine guidelines and is inadvisable if you want to keep your website indexed!

Doorway Pages - These pages are built for search engine spiders rather than visitors, and it's unlikely you'll ever actually see one! A doorway page is optimized solely for indexing. Users may click a link to a doorway page (designed for crawlers), but they are quickly redirected to another page (designed for visitors).

Cloaking - Similar to doorway pages, this technique involves showing one version of a webpage to visitors and offering another one for indexing by search engine spiders. A server-side script directs the crawler to the page optimized only for search engines.
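Because cloaking means the crawler and the visitor receive different pages, one way to spot it is to fetch the same URL twice, once as a browser and once as a crawler, and compare what comes back. The sketch below covers only the comparison step: it assumes both HTML responses have already been fetched, strips markup with crude regular expressions, and uses the standard library's difflib to score how similar the two word sequences are. The function names and the 0.5 cut-off are hypothetical. Note that cloaking keyed on IP addresses cannot be exposed simply by changing the User-Agent header, since the request still comes from a non-crawler address.

```python
import difflib
import re


def visible_words(html: str) -> list[str]:
    """Crude text extraction: drop script/style blocks and tags, then split into words."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    text = re.sub(r"<[^>]+>", " ", html)
    return re.findall(r"[a-z0-9']+", text.lower())


def page_similarity(browser_html: str, crawler_html: str) -> float:
    """Return a 0..1 similarity ratio between the two versions' word sequences."""
    matcher = difflib.SequenceMatcher(None, visible_words(browser_html),
                                      visible_words(crawler_html))
    return matcher.ratio()


def looks_cloaked(browser_html: str, crawler_html: str, threshold: float = 0.5) -> bool:
    """Hypothetical rule: flag the URL if the two versions share little text."""
    return page_similarity(browser_html, crawler_html) < threshold
```

Comparing extracted words rather than raw HTML means that harmless markup differences (a <div> instead of a <p>) do not count against a page, while a completely different set of words does.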

Scraping - This involves copying content (generally using some form of software) from other high ranking sites and passing it off as your own in an attempt to gain higher rankings yourself!

White Hat
As a general rule, White Hat SEO is designed with humans, not search engine spiders, in mind. These practices are ethical and involve no attempts to deceive or trick search engines into granting higher rankings. Simply put, the content that the search engines index is the content users see when they visit the site!
To make sure you're following best practice, check out the following guidelines:

Finally, here's a cool little diagram I found on silverdisc that sums up the fundamental differences between the two hats!

 

Aspect | Black Hat | White Hat
Content and Links | Search Engines | Humans
Visibility to Humans | Hidden | Visible
Quality of Work | Hidden | Visible
Search Engines | Enemies | Nothing / Friends
Domains/Brands | Disposable | Cherished, Primary Domain
Site & Relevance | Apparently Improved | Actually Improved
Results | Yes, "Short" Term | Yes, "Long" Term
Ethical Techniques | No | Yes
Legal | No? | Yes?


Over time, the benefits of White Hat SEO are far greater than the short-term perks of unethical Black Hat techniques. Play nice and you'll reap the rewards!

SEM Industry Standards
Perhaps it's a terrible cliché, but the only thing experts agree is constant in the search engine marketing business is change.
As 2002 drew to a close and SEO vendors converged in Dallas for the Search Engine Strategies conference, the predominant discussions focused on moving the industry toward greater maturity, debunking common myths in search engine marketing (SEM) and seeking credibility with traditional media.
Clouded by the murky waters of unethical behavior and the "black eye" that a few have given the entire industry, the "monetization" of the search industry has been one of the most difficult challenges faced by both search engine optimization (SEO) vendors and the search engines.
More than ever, SEOs are feeling search engine marketing turn into a "media buying industry", with cost-per-click feeds and keyword bidding moving the focus to "checkbook SEO". While "organic" SEOs fight this model to provide results at a reasonable cost, many concede that a fully integrated approach creates the best success model for search engine marketing.
The advantage in moving towards the media buying model is greater accountability, including direct results with ROI tracking, but the SEM industry is still so immature, says Barbara Coll of WebMama, that the big companies with large advertising budgets are hesitant to spend money with SEO firms. Instead, many currently prefer to deal directly with the portal ad reps.  
Standards - The Bone of Contention
The push for SEM industry standards and SEO ethics guidelines continues, yet an industry-wide agreement on this topic seems to be an impossible goal. Complicating matters on this front are some of the actions taken by the search engines themselves, either not clearly defining "the rules for spam" or setting guidelines that appear contradictory in nature.
Exposing some of the double standards set forth by the major engines was Greg Boser of Web Guerrilla, who presented several instances of "favoritism" for sites that allegedly used spam tactics.  Boser contends that paying advertisers are given "somewhat of a free pass" or a chance to clean up the egregiously spammy SEO, and typically it is the website's ad sales representative who gives a heads up to the webmaster before the site gets removed from the natural index.  
Rather than taking on the "relative risk" of pushing the limits of spam policies, Boser recommends spending a minimal advertising budget on the engines while implementing organic SEO tactics. Doing so may give you an inside track in times of crisis, such as a "PR0" penalty or a missed update.
Access to an ad rep may also help uncover code problems or other errors that prevent proper spidering of your site. In extreme cases, such contacts can shed light on non-public issues, such as "back room deals" in which the inventory for particular keyword buys has been sold out indefinitely because of previously signed contracts.
The Act of Intent
The lack of honesty on both sides (SEOs and the search engines) is clearly hampering the industry's growth, and the distrust on both sides stems from relevancy problems and issues surrounding trusted feeds. For relevancy's sake, even ethical SEOs must sometimes rely on tactics that push the limits, typically by way of work-arounds that do not offend the end user, and only if there is a good reason to do so.
Techniques such as placing "hidden headings or text" underneath graphics-intensive or Flash content are meant to make the same content available to users, for their benefit, allowing the most relevant results to appear despite issues with site design. Another practical application brought up in discussion was the proper use of IP detection to identify a robot and avoid spidering issues with session IDs or cookies.
To date, most of the spidering engines have labeled IP detection as against the rules, but client pressure to solve such indexing issues via "black magic" or server-side technology is on the rise. SEO vendors are asking the search engines to examine the intent behind these practices more closely, rather than labeling them spam automatically.
Frustrating for organic SEO campaigns are issues surrounding trusted feed results that are not clearly defined as paid results, which may eventually be challenged by the recommendation of the FTC to label such listings. For example, sites using Inktomi's trusted feed (at a set per click rate) are blended seamlessly into MSN search results. Some SEOs argue that there is most definitely a relevancy issue with trusted feed pages that do not always point to the right content to match the search, and furthermore, are ranked ahead of the less expensive, paid inclusion pages in an effort to better monetize search.
On the flip side, Mikkel deMib Svendsen argues that paid results can be more relevant than "natural spam," but the quality of editorial integrity needs to be evaluated and recognized separately.  
Accusations also flew around about "monetization targeting," the practice in which sites successfully implementing organic SEO strategies are cold called by search engine sales reps, looking to upsell paid programs. In a few cases, claims Mr. Boser, this has occurred soon after well-placed pages "mysteriously" disappeared from the natural index.  
The Future of the SEM Industry
While Google made the most recent attempt at setting standards for ethical SEO practices, there is significant pushback from SEO vendors who feel the statements made within the document are a step in the right direction, but contradictory to the point of buyer confusion.  
Danny Sullivan says it still does not give consumers the information needed to make a good decision when choosing an SEO consultant. Just as previous attempts by other outfits had caused confusion, Google's SEO policy has holes that need filling and cannot be viewed as the industry standard.
Among the top points of concern to SEOs are the "money-back guarantee" clause and the issue of reporting spam when there are no definitive methods of investigating technical issues or clearing up spam penalties. To solve spam disputes or technical questions, SEOs loudly voiced the desire for Google or other search engines to launch a paid support system to offer (even partial) answers to problems faced by SEOs and webmasters.  
Daniel Dulitz of Google said that the problem with implementing such a system is managing the SEO/customer relationship -- there still would be no true way of getting direct answers, though he concedes that the system could work to alleviate the heavy email load Google currently faces.  
Such a subscription system could lend credibility to the search engine marketing industry as a whole, by at least giving website owners some assurance that an SEO has the resources and knowledge to maximize their visibility in Google without breaking the rules and harming their businesses.
As it stands now, there are no rewards for "being the good guy" in SEM, since SEOs cannot answer client problems with complete accuracy when unexpected events occur and the engines offer no formal explanations. Such issues put the credibility of the entire industry at risk. Of course, it's a slippery slope for search engines, because they do not want to take on the direct liability of acknowledging which SEO firms are good or bad.
Traditional media agencies are just beginning to weave themselves into the SEM industry because large advertising clients are starting to demand greater visibility from search engines.  Tony Wright of Weber Shandwick confirms that media agencies still "don't get SEM" and will continue to spend money without knowing where it's going, but predicts the agency switch or the buying of smaller SEO agencies with good clients is coming soon.
Meanwhile, SEO vendors continue to look toward offering large and small companies, as well as media agencies, value-added services that include initial consulting and detailed take-away strategies, education on SEO best practices, site architecture and technical constraints, and training of appropriate staff to manage PPC and SEO campaigns in-house.
Search Engine Guidelines

Guidelines for Search Engine Optimizers
http://www.google.com/webmasters/seo_html.html
While Google does not have relationships with any SEOs and does not offer recommendations, it does offer a few tips that may help you distinguish between an SEO that will improve your site and one that will only improve your chances of being dropped from search engine results altogether.
How to Report Spam
http://www.google.com/contact/spamreport.html

How to Avoid Spamming the Search Engines

You should be aware of what constitutes spamming so as to avoid trouble with the search engines. For example, if you have a page with a white background and a table with a blue background and white text in it, you are actually spamming the Infoseek engine without even knowing it! Infoseek will see white text and a white page background and conclude that your text color and your background color are the same, so you are spamming! It will not be able to tell that the white text is actually inside a blue table and is perfectly legible. It is silly, but that will cause the page to be dropped from the index. You can get it back in by changing the text color in the table to, say, a light gray and resubmitting the page to Infoseek. See what a difference that makes? Yet you had no idea that your page was considered spam! Generally, it is very easy to know what not to do to avoid being labeled a spammer and having your pages or your site penalized. By following a few simple rules, you can safely improve your search engine rankings without unknowingly spamming the engines and getting penalized for it.
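The Infoseek example boils down to a comparison made against the wrong background. The Python sketch below contrasts a naive check, which compares the text color only with the page background, against one that uses the background actually behind the text (here, a table cell). It is a simplified model: the TextBlock structure is hypothetical, colors are assumed to be six-digit hex strings, and the 16-point per-channel tolerance (to catch "almost the same" colors) is an arbitrary illustration. A real checker would also have to resolve CSS and nested elements.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextBlock:
    """Hypothetical model of a run of text on a page."""
    text_color: str                      # e.g. "#ffffff"
    cell_bgcolor: Optional[str] = None   # background of the enclosing table cell, if any


def _rgb(hex_color: str) -> tuple[int, int, int]:
    """Convert a six-digit hex color such as "#1a2b3c" to an (r, g, b) tuple."""
    h = hex_color.lstrip("#")
    return int(h[0:2], 16), int(h[2:4], 16), int(h[4:6], 16)


def _close(a: str, b: str, tolerance: int = 16) -> bool:
    """True if every RGB channel of a and b differs by at most `tolerance`."""
    return all(abs(x - y) <= tolerance for x, y in zip(_rgb(a), _rgb(b)))


def naive_hidden(page_bg: str, block: TextBlock) -> bool:
    """The flawed check: compare the text color with the page background only."""
    return _close(block.text_color, page_bg)


def effective_hidden(page_bg: str, block: TextBlock) -> bool:
    """Compare the text color with the background actually behind the text."""
    behind = block.cell_bgcolor or page_bg
    return _close(block.text_color, behind)
```

White text in a blue cell, TextBlock("#ffffff", "#0000ff"), on a white page is flagged by naive_hidden but not by effective_hidden, while switching the text to a light gray such as "#cccccc" clears both checks, matching the workaround described above.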

What constitutes spam?

Some techniques are clearly considered attempts to spam the engines. Where possible, you should avoid these:

  • Keyword stuffing. This is the repeated use of a word to increase its frequency on a page. Search engines now have the ability to analyze a page and determine whether the frequency is above a "normal" level in proportion to the rest of the words in the document.
  • Invisible text. Some Webmasters stuff keywords at the bottom of a page and make their text color the same as that of the page background. This is also detectable by the engines.
  • Tiny text. Same as invisible text but with tiny, illegible text.
  • Page redirects. Some engines, especially Infoseek, do not like pages that take the user to another page without his or her intervention, e.g. using META refresh tags, cgi scripts, Java, JavaScript, or server side techniques.
  • Meta tag stuffing. Do not repeat your keywords in the Meta tags more than once, and do not use keywords that are unrelated to your site's content.
  • Do not create too many doorways with very similar keywords.
  • Do not submit the same page more than once on the same day to the same search engine.
  • Do not submit virtually identical pages, i.e. do not simply duplicate a Web page, give the copies different file names, and submit them all. That will be interpreted as an attempt to flood the engine.
  • Code swapping. Do not optimize a page for top ranking, then swap another page in its place once a top ranking is achieved.
  • Do not submit doorways to submission directories like Yahoo!
  • Do not submit more than the allowed number of pages per engine per day or week. Each engine has a limit on how many pages you can manually submit to it using its online forms. Currently these are the limits: AltaVista 1-10 pages per day; HotBot 50 pages per day; Excite 25 pages per week; Infoseek 50 pages per day but unlimited when using e-mail submissions. Please note that this is not the total number of pages that can be indexed, it is just the total number that can be submitted. If you can only submit 25 pages to Excite, for example, and you have a 1000 page site, that's no problem. The search engine will come crawling your site and index all pages, including those that you did not submit.
Gray Areas

There are certain practices that can be considered spam by the search engines when they are actually just part of honest Web site design. For example, Infoseek does not index any page with a fast page refresh. Yet refresh tags are commonly used by Web site designers to produce visual effects or to take people to the new location of a page that has been moved. Also, some engines look at the text color and background color, and if they match, the page is considered spam. But you could have a page with a white background and a black table somewhere with white text in it. Although perfectly legible and legitimate, that page will be ignored by some engines. Another example is that Infoseek advises against (but does not seem to drop from the index) having many pages with links to one page. Even though this is meant to discourage spammers, it also places many legitimate Webmasters in the spam region (almost every large Web site, and any Web site with an online forum, has its pages linking back to the home page). These are just a few examples of gray areas in this business. Fortunately, because the search engine people know that they exist, they will not penalize your entire site just because of them.
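Whether a refresh counts as "fast" depends on the delay in its content attribute, which a crawler can read straight from the markup. The Python sketch below uses the standard library's html.parser to find the first <meta http-equiv="refresh"> tag and report its delay in seconds. The five-second cut-off is a hypothetical stand-in, since the text does not say what Infoseek treated as fast, and, as the paragraph above points out, a rule like this will also catch legitimate uses such as visual effects or moved pages.

```python
from html.parser import HTMLParser
from typing import Optional


class RefreshFinder(HTMLParser):
    """Records the delay of the first <meta http-equiv="refresh"> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.delay: Optional[float] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.delay is not None:
            return
        # html.parser lowercases attribute names; valueless attributes arrive as None.
        values = {name: (value or "") for name, value in attrs}
        if values.get("http-equiv", "").lower() != "refresh":
            return
        # content looks like "5" or "0; url=https://example.com/"
        seconds = values.get("content", "").split(";", 1)[0].strip()
        try:
            self.delay = float(seconds)
        except ValueError:
            pass  # malformed content attribute: ignore it


def refresh_delay(html: str) -> Optional[float]:
    """Return the refresh delay in seconds, or None if the page has no refresh tag."""
    finder = RefreshFinder()
    finder.feed(html)
    finder.close()
    return finder.delay


def is_fast_refresh(html: str, max_seconds: float = 5) -> bool:
    """Hypothetical rule: a refresh at or below `max_seconds` counts as fast."""
    delay = refresh_delay(html)
    return delay is not None and delay <= max_seconds
```

A doorway-style redirect such as content="0; url=..." reports a delay of 0.0 and is flagged, while a page that refreshes every 30 seconds to update its contents is not.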

What are the penalties for spamdexing?

There is an excessive amount of fear over the penalties for spamming. Many Webmasters fear that they may spam the engines without their knowledge and then have their entire site banned from the engines forever. That just doesn't happen that easily! The people who run the search engines know that you can be a perfectly legitimate and honest Web site owner who, because of the nature of your Web site, has pages that appear to be spam to the engine. They know that their search engines are not smart enough to know exactly who is spamming and who happens to be in the spam zone by mistake. So they do not generally ban your entire site from their search engine just because some of your pages look like spam. They only penalize the rankings of the offending pages. Any non-offending page is not penalized. Only in the most extreme cases, where you aggressively spam them and go against the recommendations above, flooding their engine with spam pages, will they ban your entire site. Some engines, like HotBot, do not even have a lifetime ban policy for spammers. As long as you are not an intentional and aggressive spammer, you should not worry about your entire site being penalized or banned from the engines. Only the offending pages will have their ranking penalized.

Is there room for responsible search engine positioning?

Yes! Definitely! In fact, the search engines do not discourage responsible search engine positioning. Responsible search engine positioning is good for everybody - it helps users find the sites they are looking for, it helps the engines do a better job of delivering relevant results, and it gets you the traffic you want!

As a Webmaster, you should not be too afraid that you are spamming the search engines in your quest for higher search engine rankings. No question about it, though, spam is something that every Webmaster should understand thoroughly. Fortunately, it is easy to understand. So learn the rules, re-examine your Web pages, resubmit to the engines, then create gateway pages to get better rankings on the engines, using the rules above. If you need any more information on search engine spamming and search engine positioning, see http://www.searchpositioning.com. I wish you the best of fortune in your Web promotional efforts!