GenuineVC David Beisel's Perspective on Digital Change

May 25, 2005

We all know about search engine spam. Wikipedia defines it using the coined term “spamdexing” as:

“the practice of deliberately and dishonestly modifying HTML pages to increase the chance of them being placed close to the beginning of search engine results, or to influence the category to which the page is assigned in a dishonest manner. Many designers of web pages try to get a good ranking in search engines and design their pages accordingly. Spamdexing refers exclusively to practices that are dishonest and mislead search and indexing programs to give a page a ranking it does not deserve.”

To combat the numerous techniques used to spam search sites (like keyword stuffing, invisible text, cloaking, etc.), the engines have deployed a variety of algorithms to determine ranking relevancy. (In a post earlier this month I talked about the fine line between search engine spam and content, arguing that different parties would disagree as to what is spam and what is actual content.) Thus far, the major search engines have done a fairly good (but not perfect) job of combating this problem, but it’s obviously a continuous battle.

Enter the world of RSS and the Incremental Web. The position of a search result is no longer a function of its relevancy alone; it is also a function of its timeliness. Consequently, search engine spammers have a new trick to play with.

For example, I’ve noticed that as soon as I ping Technorati with a new blog post, a search for many of the keywords in that entry returns my blog as the first result. As the day progresses, the blog entry moves down the list of results. I believe that spammers will begin to realize this “opportunity” to have their pages instantly placed at the top of the results list, and will exploit it.
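
To make the mechanism concrete, here is a minimal sketch of a ranking that weights relevance by recency. It is purely illustrative: the exponential-decay formula, the six-hour half-life, the relevance numbers, and the names `Post`, `score`, and `rank` are all assumptions, not a description of how Technorati or Feedster actually order results.

```python
from dataclasses import dataclass

# Assumed for illustration: a post's recency boost halves every 6 hours.
HALF_LIFE_HOURS = 6.0


@dataclass
class Post:
    title: str
    relevance: float  # hypothetical keyword-match score in [0, 1]
    age_hours: float  # hours since the ping was received


def score(post: Post) -> float:
    """Relevance multiplied by an exponential recency decay."""
    recency = 0.5 ** (post.age_hours / HALF_LIFE_HOURS)
    return post.relevance * recency


def rank(posts: list[Post]) -> list[Post]:
    """Return posts ordered from highest to lowest score."""
    return sorted(posts, key=score, reverse=True)


posts = [
    Post("In-depth article", relevance=0.9, age_hours=24.0),
    Post("Decent blog entry", relevance=0.6, age_hours=6.0),
    Post("Keyword-stuffed spam, just pinged", relevance=0.4, age_hours=0.0),
]

for p in rank(posts):
    print(f"{score(p):.3f}  {p.title}")
```

Under these assumed numbers, the freshly pinged spam entry lands first even though it is the least relevant of the three, and it only falls down the list as it ages. A spammer who keeps pinging with new entries never has to wait for that decay.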

Eventually the Feedsters and Technoratis of the world will determine algorithmic techniques to combat this problem, but I predict that there could be a bumpy transition period as spammers realize the power that’s here.

UPDATE: Since writing this entry, I’ve come across two great posts (here and here) from the hyku blog about spam moving to RSS and another with PubSub’s thoughts on the issue.

  • http://feedster.com Scott Rafer

    Feedster, PubSub, and Technorati are all already actively excluding ping-server driven spam from our systems. From my latest blog post at http://corp.feedster.com/blog/rafer:

    … blogspot spam is back. Somehow their captchas have been defeated on what appears to be an automated basis. We got hit with 10,000 blogspot gambling spam feeds on Friday, published by some lovely person who wanted to exploit the Belmont Stakes.
