SafeAssign vs. Google for plagiarism detection

I’m gearing up for a conversation/presentation with faculty on our campus about SafeAssign, the “plagiarism” “detection” tool (more on those quotes in a moment) that’s integrated into Blackboard, so I’ve been doing some testing to see how it compares with Google for finding and sourcing suspicious passages.

But first, some definitions:  I put both “plagiarism” and “detection” in quotation marks in the previous paragraph because I don’t think SafeAssign does either one.  First of all, it’s not smart enough to tell when a student is properly citing a source, so any quoted material will be flagged as suspect.  This is fine as long as you’re reading the report carefully, but it could very easily be confusing to a student.  Second, calling what SafeAssign does “detection” can be misleading, since (as I’ll show below) SafeAssign doesn’t detect all plagiarized material and can also flag common turns of phrase as plagiarized.

So what I did was this: I put together a document that consisted entirely of plagiarized passages from resources I had at my disposal.  I tried to get as many different kinds of sources as I could, stopping short of patronizing online “term paper mills,” since I wasn’t about to spend any money on this project.  I used passages from Wikipedia, the open web, full-text articles from EBSCO’s Academic Search Premier, JSTOR, Project Muse, Biography Resource Center, and Oxford Reference Online, as well as a book from Google Books and a print (gasp!) reference work.  For the online resources, I tried to get a mix of HTML and PDF texts.  (This required re-typing some text from a JSTOR PDF!)  Most sources were quoted word-for-word, but I did some bad paraphrasing as well.1

(If you’re interested in the document I came up with, it’s here (Word doc). Fair warning: it makes no sense whatsoever, but that wasn’t really the point.)

Then I ran the paper through SafeAssign, and also checked representative sentences/phrases with Google to see what it could find.
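For anyone who wants to repeat the Google half of this test, here is a minimal sketch of the exact-phrase search described in footnote 2. It only builds the search URL; deciding whether the source turns up on the first page of results is still a by-hand step. The function name `exact_phrase_url` is made up for illustration, and the sample sentence is a placeholder, not a passage from my test document.

```python
from urllib.parse import urlencode


def exact_phrase_url(passage: str) -> str:
    """Build a Google search URL that treats the passage as one exact phrase.

    Hypothetical helper for illustration; it is not part of any tool
    mentioned in this post.
    """
    # Drop any straight double quotes already in the passage and collapse
    # runs of whitespace, so the whole passage can be wrapped in one pair
    # of quotation marks.
    cleaned = " ".join(passage.replace('"', " ").split())
    # urlencode percent-encodes the quotation marks and turns spaces into "+".
    return "https://www.google.com/search?" + urlencode({"q": f'"{cleaned}"'})


if __name__ == "__main__":
    # Placeholder sentence; paste in one sentence or longish phrase at a time.
    print(exact_phrase_url("a single sentence or longish phrase from the paper"))
```

Open the printed URL in a browser and check the first page of results, one suspicious sentence at a time.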

The results were very interesting:  SafeAssign indicated that 66% of the paper was “suspect” and identified 7 sources that matched the text (of the 15 separate passages, it identified 10).  Google found 8 of the 15 passages.2  What stood out were the patterns in what SafeAssign and Google could, and couldn’t, find:

  1. On the whole, SafeAssign was much better at identifying paraphrased passages than Google, though an expert Googler could probably identify more than just the 8 passages that my method found.
  2. Google did much better than SafeAssign on content from JSTOR and Project Muse (SafeAssign didn’t find any of this content).
  3. Oddly enough, all the content that SafeAssign detected was attributed to web sources (e.g., university web sites), even when I had originally found that content in licensed databases.3  This is interesting to me because one of the main selling points for subscription-based plagiarism detection tools (at least one of which has leaned heavily on this in its marketing) is that they can compare student papers against sources, like licensed databases, that are not on the open web.
  4. Finally, I noticed at the very bottom of the SafeAssign report that there’s a little logo that says “powered by Windows Live Search.” Make of that what you will.
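To put the two passage counts side by side, here is a quick tally using only the numbers reported above (10 of 15 for SafeAssign, 8 of 15 for Google). The variable and function names are just for illustration. Note that the 66% “suspect” figure in the SafeAssign report is a separate number describing how much of the paper was flagged, so it is not used here.

```python
# Passage counts reported in this post: 15 plagiarized passages in the
# test document, 10 identified by SafeAssign and 8 found by Google.
TOTAL_PASSAGES = 15
found = {"SafeAssign": 10, "Google": 8}


def hit_rate(hits: int, total: int) -> float:
    """Return the share of passages found, as a percentage to one decimal."""
    return round(100 * hits / total, 1)


rates = {tool: hit_rate(hits, TOTAL_PASSAGES) for tool, hits in found.items()}

for tool, rate in rates.items():
    print(f"{tool}: {found[tool]}/{TOTAL_PASSAGES} passages ({rate}%)")
```

With a sample of only 15 passages, the gap between the two tools amounts to two passages, which is worth keeping in mind before reading much into the percentages.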

So that’s what I learned.  I’ve always had serious concerns about the ethics of using SafeAssign and other such tools, and I’ve had suspicions about the effectiveness of these kinds of tools, but now I have some (not very hard) (actually pretty squishy, but evocative) data on that question.

  1. As an aside, this was all kinds of fun!  Trying to construct bad paraphrases was particularly challenging and weirdly satisfying.
  2. My method for Google searching was to take a single sentence or longish phrase and search for it in Google with quotation marks around the whole phrase. If Google found the passage in the first page of results, I counted it.
  3. In at least one case, I used text from a scholarly article that was then quoted (properly!) in a senior thesis posted online at the student’s college, and SafeAssign attributed my text to that senior thesis!