The ACS questions were developed with the assistance of AI and subsequently reviewed by content validation panels and a subject matter expert in advance of the exam...
Because of the subsequent review by qualified humans, I don't see this as a problem. But the scenario described in TFS is just begging for a 'slippery slope' argument.
I think it's inevitable that in the future, one AI will be tasked with checking the work of another AI. Then, in a storm of stupidity, the policy will become having the AI check its own homework.
After that, of course, the practice will devolve into just trusting the initial AI output, with no verification step. Welcome to the AI apocalypse!
My concern in this particular instance is that the company using the AI was also in charge of reviewing the questions generated by AI. And the end result was "confusing questions" (straight from the summary), which, along with login issues and screen lag, was part of the reason the entire test was called into question. I'm thinking there are bigger systemic issues here than just having AI involved in the process, but having AI involved in the process clearly wasn't helping anything.
Though I know what you're saying is already becoming a common practice for non-legal matters. Having one AI assess another AI's output, and sometimes throwing another in as a final check, is already being pushed as a way to overcome what are referred to as the "inadequacies of current-generation AI." The problem is, you have a garbage-in, garbage-out situation: if AI #1 produces readable dreck, AI #2 is set to correct grammar, and AI #3 is set to review content, who's to say you haven't fed so much bullshit from AI #1 through the whole process that AI #3 starts to view the bullshit as more valid than actual facts? Or maybe that's how they become more human. "My bullshit is just as valid as your facts" is, after all, the new norm.