Posted on 08-11-2008
Filed Under (Entrepreneur, Hosting, Moneymaking) by tycho

When you are a hardworking individual, paying for your own servers, tweaking your sites etc. to make ends meet, it is really frustrating to see people abuse your hard work.

Although you take basic measures to prevent hackers from misusing your site in a lot of ways, they always find new and faster ways to get on top. The easier your site is to use for real users, the easier it is for spammers to get in.

What drives these people? Do they tell their parents: ‘look ma, pa, I have a cool job, I f*ck people over and make a few cents with it for some unethical person on the other side of the world!’? Or do they keep it quiet, so that no one, not even their spouses, knows about their profession?

Spammers are clever people, sometimes. I should rephrase this. Spammers are stupid people, almost always, but sometimes they are really clever. The not so clever ones follow simple patterns:

  • somehow sending mail with a personalized message in it anonymously
  • putting links on some PR-higher-than-0 site anonymously

They only look for sites that have openings for these kinds of things without ‘annoying’ things like captchas. Or sites with hackable captchas, for the people who have contacts (most captchas have been broken anyway, so if you can get hold of software for reading them, it is just as simple as if there were no captcha at all…). All very sad for hardworking people who want to make nice, easy sites for people who appreciate that.

The best way of preventing spammers from posting stuff to your site / application is simply to make it paid. Make people pay $1 for a lifetime subscription to post on your site.

Unfortunately that is not really possible for most people, so they need to find other ways to fix the problem. A well known site called cubestat.com got into problems a week ago when many spammers got onto it to post their crap, and its owners asked me to find some kind of strategy to resolve it.

What was the problem? The site has become very popular for fast indexing in Google and thus for SEO. Getting your site on Cubestat is one of those steps you have to take when you start a new site. And, because it was completely open and easy to use, it was easy to get into Google and to get a good PR rather fast. So people started to post hundreds and even thousands of subdomains to it. The recipe in the ‘blackhat community’ was to get some domain, any domain, create a lot of subdomains with names and content related to your subject, and auto-post them to Cubestat. Simple software was written to do the auto-posting.
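To make the pattern concrete, here is a minimal Python sketch of how such a flood of subdomains could be spotted in a list of submitted URLs. This is only an illustration, not the code that was used for Cubestat: the function names, the threshold of 5 and the example domains are all made up, and taking the last two host labels as the "base domain" is a naive assumption that breaks for suffixes like .co.uk.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def split_host(url):
    """Return (host, base_domain) for a submitted URL.

    Assumption: the base domain is simply the last two labels of the host.
    A real implementation would need a proper public-suffix lookup.
    """
    # urlsplit only recognises the host when the URL contains "//".
    parts = urlsplit(url if "//" in url else "//" + url)
    host = (parts.hostname or "").rstrip(".")
    return host, ".".join(host.split(".")[-2:])


def flag_subdomain_floods(urls, max_subdomains=5):
    """Return the base domains that show up with too many distinct hosts.

    max_subdomains is a hypothetical threshold, chosen for illustration.
    """
    hosts_by_domain = defaultdict(set)
    for url in urls:
        host, domain = split_host(url)
        if host:
            hosts_by_domain[domain].add(host)
    return {
        domain
        for domain, hosts in hosts_by_domain.items()
        if len(hosts) > max_subdomains
    }


# Hypothetical submissions: ten subdomains of one domain plus a normal site.
submitted = [f"http://cheap-{i}.spam-example.com/" for i in range(10)]
submitted += ["http://www.example.org", "example.org/about"]
flagged = flag_subdomain_floods(submitted)
```

With these example submissions only the domain with ten subdomains ends up in `flagged`; the normal site with its two hostnames stays under the threshold.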

For the owners of Cubestat, ease of use is very important, so putting some kind of unreadable captcha in front is not really an option, nor is charging any kind of amount before posting / requesting a URL.

It took me 2 weeks to come up with a solution, but after reading this article I was sure: I shouldn’t be too clever, but just use the data itself to come up with a solution for this dilemma. We have several factors: we have some data about the URL poster, we have some data about the site, and we have some data about the context (date, time, subject etc.) the site was posted in. This should be enough to stop the spammers. Because this kind of spamming is (much) more limited than mail spamming, the solution can be (much) better than something like Bayesian statistics.
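As a rough illustration of what "using the data itself" could look like, here is a small Python sketch that combines one signal about the poster, one about the site and one about the context into a simple score. It is not the actual solution: the field names, weights and thresholds are all hypothetical and would have to be tuned against real submissions.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    # All three fields are hypothetical signals, one per factor.
    poster_posts_last_hour: int    # poster: how busy was this poster recently?
    sibling_subdomains: int        # site: other subdomains of the same domain already posted
    seconds_since_previous: float  # context: time since this poster's previous post


def spam_score(sub):
    """Add up simple rule hits; the weights and limits are made-up examples."""
    score = 0
    if sub.poster_posts_last_hour > 20:
        score += 2  # nobody requests that many URLs by hand
    if sub.sibling_subdomains > 5:
        score += 2  # looks like the subdomain recipe
    if sub.seconds_since_previous < 2:
        score += 1  # faster than a person can type
    return score


def is_probably_spam(sub, threshold=3):
    """Treat a submission as spam once enough rules fire together."""
    return spam_score(sub) >= threshold


bot = Submission(poster_posts_last_hour=50, sibling_subdomains=40, seconds_since_previous=0.5)
visitor = Submission(poster_posts_last_hour=1, sibling_subdomains=0, seconds_since_previous=3600)
```

Here `is_probably_spam(bot)` is true and `is_probably_spam(visitor)` is false. Requiring more than one rule to fire means a normal user who happens to trip a single rule is not blocked right away, which matters when ease of use is the whole point.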

Of course I had already done something similar for a huge free hosting company, and they currently get 2 or 3 complaints per day instead of around 600, so I had some code lying around to attack the problem from a few angles.

I still hope spammers get caught and get their punishment, but after some small personal victories, I think analytical thinking can and eventually will stop most of them in their tracks. There are always better and more clever hackers/crackers and I am sure they look right through me, but the monkeys that are hired to do these kinds of jobs are beatable IMHO.
