TurnitinBot and Why You Should Block It

If you've been to university recently you'll be familiar with Turnitin. It's a service that supposedly detects plagiarism in students' submissions. Students submit their paper to Turnitin, and Turnitin generates a report saying that it's x% plagiarised and contains sentences, paragraphs, or even chapters that look suspiciously similar to other papers in its database. It's also woefully stupid: when I did my Master's thesis (on ragdoll physics), Turnitin decided I'd plagiarised 2% of it from some biochemistry papers, and completely failed to detect any similarity with other papers on ragdoll physics.

It's now trawling the internet, similarly to Google, to build up its database.

Here's why that's a problem:

Turnitin is a for-profit organisation and makes money from indexing your stuff, without your permission and with no possible benefit to you. It doesn't reimburse you, it doesn't cite you, it just treats your work as something it has an inalienable right to use however it wishes. Whilst it is supposedly against plagiarism, it has no qualms about violating your intellectual property and then making money off it.

Here's how you can block it:

robots.txt:

User-agent: TurnitinBot
Disallow: /
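
If you want to sanity-check that rule before relying on it, something like the following should do. It's only a rough sketch using Python's standard urllib.robotparser; the is_allowed helper and the "SomeOtherBot" agent are names I've made up for illustration, and the TurnitinBot user agent string is the one reported in the comments below. Bear in mind that robots.txt is purely advisory, so this only tells you what a well-behaved crawler should do, not what any particular bot actually does.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from this post.
ROBOTS_TXT = """\
User-agent: TurnitinBot
Disallow: /
"""

# User agent string as reported by a commenter on this post.
TURNITIN_UA = "TurnitinBot/3.0 (http://www.turnitin.com/robot/crawlerinfo.html)"


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper: it just feeds the rules to the standard-library parser.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed(TURNITIN_UA, "/any/page"))         # False
    print(is_allowed("SomeOtherBot/1.0", "/any/page"))  # True
```

Other crawlers are unaffected because the file contains no rule that applies to them.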

.htaccess:

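# Needs mod_rewrite. Sends 403 Forbidden to any user agent containing "TurnitinBot" (case-insensitive).
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} TurnitinBot [NC]
RewriteRule .* - [F]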
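
To get a feel for which user agents that condition catches, here's a small Python sketch of the same case-insensitive match, plus a counter you could point at an access log. This is an approximation of what the RewriteCond does, not Apache's actual code; the function names and the sample log lines are invented for the example, and I'm assuming your log lines include the user agent string somewhere.

```python
import re

# Same pattern as the RewriteCond, with re.IGNORECASE standing in for [NC].
TURNITIN_PATTERN = re.compile(r"TurnitinBot", re.IGNORECASE)


def is_blocked(user_agent: str) -> bool:
    """Return True if the user agent would match the rewrite condition."""
    return TURNITIN_PATTERN.search(user_agent) is not None


def count_hits(log_lines) -> int:
    """Count log lines that mention TurnitinBot anywhere (hypothetical helper)."""
    return sum(1 for line in log_lines if is_blocked(line))


if __name__ == "__main__":
    # Made-up sample lines in roughly the common Apache log style.
    sample = [
        '1.2.3.4 - - [09/Nov/2013:10:52:04 +0000] "GET /tag/foo HTTP/1.1" 200 512 "-" '
        '"TurnitinBot/3.0 (http://www.turnitin.com/robot/crawlerinfo.html)"',
        '5.6.7.8 - - [09/Nov/2013:10:53:10 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
    ]
    print(count_hits(sample))  # 1
```

To run it against a real log you could pass an open file object to count_hits, since it only needs something it can iterate over line by line.
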
Short link: http://blog.asgaard.co.uk/s/60
15:36:09 13th October 2012 | Filed under: turnitin | 3 comments

Talk is cheap

Thanks for the info on TurnitinBot. I saw this bot hit my site for the first time tonight and was wondering what it could be.

– 10:52:04 9th November 2013

This bot is a big bandwidth hog if you have a blog with a tag cloud and lots of content. I found it had already hit just about all the tags on my site. I haven't checked my other sites' log files yet, but blocking this bot is a must for all webmasters.

– 11:46:32 7th May 2014

TurnitinBot visited my site too, but it was blocked automatically because I use Incapsula :)
TurnitinBot/3.0 (http://www.turnitin.com/robot/crawlerinfo.html)

– 10:36:14 22nd May 2014
