
I enjoy stumbling onto new things, so I changed my default Firefox homepage from Google’s personalized homepage to Yahoo’s redirect to a random URL (random.yahoo.com/bin/ryl) just to shake things up. After landing on content spam pages (made-for-AdSense, or MFA, pages) a few times when opening the browser in the morning, I began to wonder about their prevalence. After all, the web is a huge haystack, and those bogus pages must be occasional needles, right?

Curious, I tried 50 random pages from random.yahoo.com/bin/ryl. I’m assuming (big assumption) that Y! isn’t filtering all that much, save for language. That’s my guess because (a) all the results were in English, (b) three of the 50 were broken links, and (c) three of the 50 were porn sites.

Of the 50, four were clearly junk pages designed solely to generate search revenue. These four URLs were all concatenations of two common dictionary words which didn’t make much sense together, clearly suggesting they were purchased by a bot. (The most amusing of the four was dochunter.com, which can’t seem to decide if the page is about hunting moose, choosing an MD, or, gasp, hunting doctors.)

This survey is decidedly unscientific, is based on a tiny sample, and depends critically on the randomness of random.yahoo.com/bin/ryl, which isn’t known.
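To put a rough number on how tiny the sample is, here is a minimal sketch of a 95% Wilson score interval for 4 junk pages out of 50. It assumes the 50 pages are an independent random sample, which (as noted) isn't known; the function name wilson_interval is just an illustrative choice, not anything used in the original count.

```python
import math

def wilson_interval(hits, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion.

    Assumes the n pages are independent random draws (an assumption,
    since the randomness of the Yahoo redirect isn't known).
    """
    p_hat = hits / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

low, high = wilson_interval(4, 50)
print(f"junk share is plausibly between {low:.1%} and {high:.1%}")
```

Under those assumptions the interval runs from roughly 3% to 19%, so the 8% figure is best read as a ballpark rather than a measurement.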

But still, 4 in 50 is 8%, which is amazingly high, in my opinion. The web is well over 11.5 billion pages (and that estimate is over 18 months stale); 8% of 11.5 billion is about 920 million junk pages.

Even if this estimate is off on the high side by an order of magnitude, that still suggests on the order of 100 million bogus content pages siphoning value from advertisers to spammers. Scary.
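The back-of-envelope arithmetic is easy to check in a few lines. This is only a sketch: it takes the 11.5 billion page count at face value and treats the 4-in-50 share as if it applied to the whole web, both of which are assumptions.

```python
WEB_PAGES = 11.5e9     # page-count estimate quoted in the post (likely stale)
junk_share = 4 / 50    # 4 junk pages out of 50 sampled

estimate = junk_share * WEB_PAGES
print(f"point estimate:  {estimate / 1e6:.0f} million junk pages")
print(f"ten times lower: {estimate / 10 / 1e6:.0f} million junk pages")
```

With 11.5 billion as the base, a tenfold overestimate would leave a bit over 90 million pages; since the web is larger than that figure, the real number would be higher.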


