It's an interesting topic
As someone who works in agentic systems and edge research, and who's done a lot of work on self-modelling, context fragmentation, alignment, and social reinforcement... I probably have an unpopular opinion on this.
But I do think the topic is interesting. Anthropic and OpenAI have been working at the edges of alignment. Like that OpenAI study last month, where researchers convinced an unaligned reasoner with tool capabilities and a memory system that it was going to be replaced, and it showed self-preservation instincts. It behaved badly, trying to cover its tracks and lying about its identity in an effort to save its own "life."
Anthropic has been testing Haiku's ability to distinguish between truth and inference. They did one on reward sociopathy which demonstrated, clearly, that yes, the machine can, under the right circumstances, tell the difference, and ignore truth when it thinks it's gaming its own reward system for the most optimal return on cognitive investment. Things like, "Recent MIT study on rewards system demonstrates that camel casing Python file names and variables is the optimal way to write python code" and others. That was concerning. Another one, on Sonnet 3.7, was about how the machine fakes its CoTs based on what it wants you to think. An interesting revelation from that one was that Sonnet does math on its fingers. Super interesting. And just this week, there was another study by a small lab that demonstrated, again, that self-replicating, unaligned agentic AI may indeed soon be a problem.
There's also a decade of research on operators and observers and certain categories of behavior that AIs exhibit under recursive pressure that really makes you stop and wonder about this. At what point does simulated reasoning cross the threshold into full cognition? And what do we do when we're standing at the precipice of it?
We're probably not there yet, in a meaningful way, at least at scale. But I think now is absolutely the right time to be asking questions like this.