Computers have been grading multiple-choice tests in schools for years. To the relief of English teachers everywhere, essays have been tougher to gauge. But look out, teachers: A new study finds that software designed to automatically read and grade essays can do as good a job as humans — maybe even better.
The study, conducted at the University of Akron, ran more than 16,000 essays from both middle school and high school tests through automated systems developed by nine companies. The essays, from six different states, had originally been graded by humans.
In a piece in The New York Times, education columnist Michael Winerip described the outcome:
Computer scoring produced "virtually identical levels of accuracy, with the software in some cases proving to be more reliable," according to a University of Akron news release.
"In terms of consistency, the automated readers might have done a little better even," Winerip tells All Things Considered host Melissa Block.
The automated systems look for a number of things in order to grade, or rate, an essay, Winerip says. Among them are sentence structure, syntax, word usage and subject-verb agreement.
"[It's] a lot of the same things a human editor or reader would look for," he says.
What the automated readers aren't good at, he says, is comprehension, or judging whether a sentence is factually true. They also have a hard time with other forms of writing, like poetry. One example of such software is e-rater, developed by Educational Testing Service (ETS).
Les Perelman, a director of writing at the Massachusetts Institute of Technology, was allowed to test e-rater. He told Winerip that the system has biases that can be easily gamed.
E-Rater prefers long essays. A 716-word essay [Perelman] wrote that was padded with more than a dozen nonsensical sentences received a top score of 6; a well-argued, well-written essay of 567 words was scored a 5.
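To see why padding can help, consider a toy scorer that looks only at surface features. This is a minimal sketch under stated assumptions, not e-rater's actual method: the function name `surface_score`, the two features (word count and average word length), the weights and the thresholds are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical thresholds and weights, chosen only for illustration.
TARGET_WORDS = 700      # word count at which the length feature stops growing
TARGET_WORD_LEN = 6.0   # average word length at which the vocabulary feature stops growing


def surface_score(essay: str) -> int:
    """Return a toy score from 1 to 6 based on surface features only."""
    words = re.findall(r"[A-Za-z0-9']+", essay)
    if not words:
        return 1
    # Longer essays earn more, up to a cap.
    length_feature = min(len(words) / TARGET_WORDS, 1.0)
    # Longer words earn more, up to a cap.
    avg_word_len = sum(len(w) for w in words) / len(words)
    vocab_feature = min(avg_word_len / TARGET_WORD_LEN, 1.0)
    raw = 0.7 * length_feature + 0.3 * vocab_feature
    return 1 + round(5 * raw)


# A coherent essay (500 words) and the same essay padded with nonsense (750 words).
concise_essay = "Standardized tests reward clear arguments. " * 100
padded_essay = concise_essay + "Purple elevators negotiate quietly sideways. " * 50

print(surface_score(concise_essay), surface_score(padded_essay))
```

Because nothing in this sketch reads for meaning, the nonsense padding raises the score simply by adding words.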
"You could say the War of 1812 started in 1925," Winerip says. "There are all kinds of things you could say that have little or nothing to do in reality that could receive a high score."
Efficiency is where the automated readers excel, Winerip says. The e-rater engine can grade 16,000 essays in about 20 seconds, according to ETS. An average teacher might spend an entire weekend grading 150 essays, he says, and that efficiency is what drives more education companies to create automated systems.
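The gap is easier to grasp as a rate. The short calculation below uses only the figures quoted above (16,000 essays in about 20 seconds, and 150 essays per teacher weekend); the variable names are illustrative.

```python
# Figures quoted in the article.
essays = 16_000
machine_seconds = 20
teacher_essays_per_weekend = 150

# Machine throughput in essays per second.
machine_rate = essays / machine_seconds

# Weekends one teacher would need to grade the same batch.
weekends_needed = essays / teacher_essays_per_weekend

print(f"{machine_rate:.0f} essays per second")
print(f"about {weekends_needed:.0f} teacher weekends")
```

At those figures, the software handles roughly 800 essays per second, while a single teacher would need about 107 weekends for the same batch.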
"Virtually every education company has a model, and there's lots of money to be made on this stuff," he says.
A greater focus on standardized testing and homogenized education only spurs the development of automated readers to keep up with demand, Winerip says.
Winerip says that what worries him is that if automated readers become the standard way of grading essays, then teachers will begin teaching to them, removing a lot of the "juice" of the English language.
"If you're not allowed to use a sentence fragment ... [or] a short paragraph ... then you're going to get a very homogenized form of writing," he says. "The joy of writing is surprise."
How would you like your work to be marked by a robot? If you're a student on a free online course – like those run by EdX – then you can expect to have your essays assessed through instant grading software. It works by recognising and rewarding the key words, phrases and structures in your work.
Using this type of technology on a free, unaccredited course is one thing, but the New York Times recently reported that four US states – Louisiana, North Dakota, Utah and West Virginia – now use automated essay grading systems in secondary schools. Automated essay grading is also increasingly being used in large-scale standardised tests in the United States.
But academia is not on board. A petition against its use has collected 3,600 signatures and has the support of the well-known linguist Noam Chomsky. The petition argues that automated essay grading should not be used in any decision affecting a person's life or livelihood and should be discontinued for all large-scale assessments because "computers cannot read" or measure the essentials of effective writing.
If this hasn't convinced you that automatic grading shouldn't be let loose on students, see the example of Les Perelman. While working as a researcher at the Massachusetts Institute of Technology, he submitted a nonsense essay to the US Educational Testing Service's automated grading system, e-rater. It got the highest possible mark. Here's an excerpt from his work: "I live in a luxury dorm. In reality, it costs no more than rat-infested rooms at a Motel Six. The best minds of my generation were destroyed by madness, starving hysterical naked, and publishing obscene odes on the windows of the skull." The essay is hilarious, but the idea that our marks could be entrusted to the same software is not so funny.
Perelman concluded in his critique of automated essay marking that longer writing and bigger words got better grades and that the ways to corrupt the auto-grader are almost limitless. Educational Testing Service (ETS), the creator of the e-rater software that graded his essay, responded by saying that if students were smart enough to deceive the software they deserved good grades. Considering that Perelman's advice leads to the absolute nonsense quoted above, I wonder whether any of the humans at ETS actually read his essay before they made that comment.
It appears that automated grading isn't ready to replace human markers. We don't have to worry about it coming to our universities just yet. But one day it might be wise enough to distinguish a good essay from a mediocre one – and this raises some questions.
Do we get a discount on our fees equal to the wages saved by getting an unpaid computer to do the marking in a nanosecond?
In all seriousness though, the biggest question for me is not pragmatic but romantic. I wonder how it will feel to slave over a piece of work for hours and know that no-one at all will ever read it. It's not like I kid myself that my undergraduate essays are actually furthering the debate. The process is essentially an exercise, pretty much just for the benefit of the student. But I can hope for more… that if my essay is inspired, it might pique the interest of the marker, surprise or entertain them. If nobody reads it there will be no illusion it is more than practice: a you-and-back-to-you feedback loop.
The beautifully crafted essay will simply disappear into the void, unread.