AI Checkers Forcing Kids To Write Like A Robot To Avoid Being Called A Robot

Mike Masnick

from Techdirt on 2024-09-04 19:14 (#6QFR9)

Can the fear of students using generative AI and the rise of questionable "AI checker" tools create a culture devoid of creativity? It's a topic that is, curiously, worth delving into a bit more deeply, in part because of something that happened this weekend.

Earlier this year, we had a post by Alan Kyle about California bill SB 942. That bill would require AI companies to offer a free AI detection tool, despite the fact that such tools are notoriously unreliable and prone to nonsense. As Kyle wrote, the bill takes a "nerd harder" approach to regulating technology its backers don't understand.

SB 942 has continued to move forward and just passed in the California Assembly. It's now on Governor Newsom's desk to potentially sign.

I was thinking about that this weekend after a situation at home. One of my kids* has an English homework assignment. They had to read Kurt Vonnegut's famous short story, Harrison Bergeron, and write a short essay about it. Since I do a fair bit of writing, my kid asked me to review the essay and see if I had any pointers. I gave a few general suggestions on how to think about improving the flow of the piece, as it read very much like a standard first draft: a bit stilted. My kid went off to work on a rewrite.

If you're unfamiliar with the story of Harrison Bergeron, it's about a society that seeks to enforce "equality" by placing "handicaps" on anyone who excels at anything to bring them down to the least common denominator (e.g., ugly masks for pretty people, having to carry around extra weights for strong people). One of the morals to that story is on the perils of seeking to force equality in a manner that limits excellence and creativity.

Later in the day, the kid came by with their school-issued Chromebook, which has Grammarly Pro pre-installed. The students are encouraged to use it to improve their writing. One thing that the tool has is an "AI Checker" in which it tries to determine if the submitted text was written by AI.

This is similar to "plagiarism checkers" that have been around for a few decades. In fact, Grammarly's "check" covers both AI and plagiarism (or so it says). Those systems have always had problems, especially around false positives. And it seems that the AI checkers are (unsurprisingly) worse**.

It turns out that Grammarly only just introduced this feature a few weeks ago. Thankfully, Grammarly's announcement states pretty clearly that AI detection is pretty iffy:

AI detectors are an emerging - and inexact - technology. When an AI detector definitively states whether the analyzed content contains AI, it's not acting responsibly. No AI detector can conclusively determine whether AI was used to produce text. The accuracy of these tools can vary based on the algorithms used and the text analyzed.

Anyway, the kid wanted to show me that when the word "devoid" was used, the AI-checker suggested that the essay was "18% AI written." It's a bit unclear even what that 18% means. Is it a "probability this essay was written by AI" or "percentage of the essay we think may have been written by AI"? But, magically, when the word "devoid" was changed to "without" the AI score dropped to 0%.

In Grammarly's announcement, it claims that because these tools are so flaky, it does things "differently" than other AI checker tools. Namely, it says that its own tool is more transparent:

Grammarly's AI detection shows users what part of their text, if any, appears to have been AI-generated, and we provide guidance on interpreting the results. This percentage may not answer "why" text has been flagged. However, it allows the writer to appropriately attribute sources, rewrite content, and mitigate the risk of being incorrectly accused of AI plagiarism. This approach is similar to our plagiarism detection capabilities, which help writers identify and revise potential plagiarism, ensuring the originality and authenticity of their work.

I can tell you that this is not true. After the kid continued to work on the essay and reached a point where they thought it was in good shape, the AI checker said it was 17% AI, but gave no indication of what might be AI-generated or why.

Now, to be clear, the essay can still be turned in. There is no indication that the teacher is relying on, or even using, the AI checker. When I mentioned all this on Bluesky, other teachers told me they know to basically ignore any score under 60% as a likely false positive. But my kid is reasonably flustered: if the AI checker is suggesting the essay sounds like AI wrote it, that might mean there's a problem with the essay.

At that point, the hunt began to figure out what could possibly be causing the 17% score. The immediate target was more advanced vocabulary (the issue that had already been identified with "devoid").

The essay did use the word "delve," which has now become something of a punchline as showing up in every AI-generated work. There's even a study showing the massive spike in the use of the word in PubMed publications:

[Chart: use of the word "delve" in PubMed publications]

Even crazier is the use of both "delve" and "underscore." However, my kid's essay did not use "underscore."

[Chart: use of both "delve" and "underscore"]

The main theory I've seen is that the reason "delve" is so popular in AI works is that some of the training and data commonly used in AI systems was done in Nigeria and Kenya, where the word "delve" is more common. This has resulted in some arguments online, such as when online pontificator Paul Graham tweeted out how receiving an email with "delve" in it indicated it was written by ChatGPT, leading a bunch of Nigerians to call him out by mocking him, and highlighting that other cultures use language differently than he might.

Either way, the "delve" in my kid's essay was not written by AI. But, just to be safe, the word was replaced. As were some other words. It made no difference. The AI checker still said 17%.

At one point, we looked at a slightly oddly worded sentence and tested removing it. The score went up to 20%. At that point, the kid just started removing each sentence, one at a time, to see what changed the score. Nothing actually seemed to do it, and despite Grammarly's promise of transparency and clarity, no further information was provided.

All of this struck me as quite a series of lessons. First, it points out the absolute stupidity of bills like SB 942, which will only increase, rather than decrease, this kind of AI dowsing rod woo woo divination.

But, the bigger lesson has to do with AI and schools. I know that many educators are terrified of generative AI tools these days. Plenty of educators talk about how they know kids today are turning in essays generated by ChatGPT. Sometimes it's obvious, and sometimes less so. And many are not sure what to do about it.

I've seen a few creative ideas (and forgive me for not remembering where I saw these) such as having the students create a prompt to get ChatGPT to write an essay related to a class topic. Then, the real homework is having the student edit and correct the ChatGPT output. The students are then told to hand in the prompt, the original ChatGPT essay, and also their corrections.

A similar idea was to have the students write their own essay and then also have ChatGPT write an essay on the same prompt. Then, the students had to hand in both essays, along with a short explanation of why they thought their own essay was better.

In other words, there are some ways of approaching this, and as time goes on, I expect we'll hear of more.

But, simply inserting a sketchy "AI checker" in the process seems likely to do more harm than good. Even if the teacher isn't guaranteed to be using the tool, just the fact that it's there creates a challenge for my kid who doesn't want to risk it. And it's teaching them to diminish their own writing skills in order to convince the AI-checker that the writing was done by a human.

And that seems, ironically, quite like what "Harrison Bergeron" was supposed to teach us to avoid. Vonnegut was showing us why trying to stifle creativity is bad. Now my kid feels the need to stifle their own creativity just to avoid being accused of being a machine.

I'm not against AI as a tool. I've talked about how I use it here as a tool to help edit my (human) writing, to challenge me, and to push me to be a better (human) writer, even as those tools tend to be awful writers themselves. But I worry that, with so much fear about "AI writing," the end result might actually make people write less with the creativity of humans, and more to simply avoid being called out as a machine.

* In case you're wondering, I checked first to make sure they were okay with me writing about this before telling this story and have kept details to a minimum to protect their privacy.

** After reading through a draft of this piece, my kid suggested we should run this through an AI checker as well, and it tells me (falsely) that 3.7% of this article appears to be written by AI (it specifically calls out my description of Harrison Bergeron as well as my description of plagiarism checkers as likely written by AI).
