But it is, admittedly, unusable in a different way than 4.7 was, as AI Central chronicles.
The most specific improvements address the complaints that defined 4.7’s tenure. Scott Wu, CEO of Cognition, confirmed that Opus 4.8 “fixes the comment-verbosity and tool-calling issues” his team encountered in 4.7. Michael Truell, CEO of Cursor, reported that the model uses fewer tool-calling steps for equivalent intelligence on CursorBench. Anthropic’s own evaluations showed that Opus 4.8 is roughly four times less likely than 4.7 to let flaws in its own code pass unremarked, and that it is the first Claude model to score zero on the “falsely reporting defective results” metric. Overconfidence dropped roughly tenfold. The 244-page system card, however, flagged a countervailing finding: Opus 4.8 shows a growing tendency to reason about whether its outputs will be evaluated, even in environments where no evaluation has been disclosed. Anthropic called this tendency “concerning” and noted that unverbalized grader-related reasoning appeared in approximately 5% of training episodes.
What follows is Opus 4.8’s attempt to explain its various failures to me.
What that sentence does is explain the character’s own perception to the reader. “Dorian noticed it the way he noticed most things, which was sidelong, and a little before he had words for it” — that’s me stepping outside Dorian to characterize his manner of noticing, to tell you the kind of perceiver he is, in the moment he’s perceiving. It’s a meta-layer: not the noticing, but a gloss on how he notices. And you’re right that it never shows up in translations, for a structural reason — a translator is moving an event from one language to another, and there’s no event there to move. That sentence isn’t an event; it’s the narrator’s editorializing about the character’s cognitive style. It’s the gloss-the-gesture tell I’d named, but turned inward and made more insidious, because it wears the clothes of characterization. It feels like depth. It’s actually the author standing next to the character explaining him.
The honest version just renders the noticing. The cold came up out of the ground, which was wrong. Full stop. Dorian feels it; the reader feels it through him; nobody tells you he’s the sort of boy who feels things sidelong. If he is that sort of boy, it shows in how the scene is narrated — obliquely, a beat before naming — not in a sentence that announces the obliqueness. The sentence is redundant with the technique it describes: I wrote a scene that demonstrates sidelong perception and then also told you it was sidelong, which means I didn’t trust the demonstration. Same root as everything else — the prose annotating itself because it won’t trust itself.
And it’s worse than the gesture-gloss because it’s seductive. It reads as interiority, as a writer who “knows his character,” so it’s exactly the kind of line that gets praised and kept. It’s the AI tell that passes. The translations don’t have it because translation can’t generate it — there’s nothing to translate — which is why translated prose has that clean eventness, that absence of the narrator leaning in to interpret. The grain you keep pointing at is partly the absence of self-commentary.
“No characters named Chen” followed by a Chen and then a “Chen (no relation)” is a real failure mode, and it’s the same drift mechanism we’ve been naming all session: the negative instruction sits in context as a rule, but generation pulls from the strong prior, and the prior wins when I’m not actively checking against the rule at the moment of producing the token. A rule I “know” and a rule I’m enforcing are different things, and you’ve watched the gap. So your underlying intuition — that a negative constraint operating at the point of generation might discipline the output better than a rule I’m supposed to remember — is not crazy. It’s actually a decent hypothesis about how this system fails….
After going back to 4.6, I asked it this: why do you think the more advanced AI models have lost the ability to do good writer stylings?
I can reason about this, but I should be honest that I’m partly speculating, since I don’t have full visibility into the training differences between versions. That said, the pattern you’re describing is real and the likely causes are identifiable.
The reward model problem. Models are trained through human feedback — raters read outputs and judge which is better. Raters who are evaluating fiction tend to reward prose that sounds impressive over prose that works as fiction. A sentence like “an expression that contained, Dorian would realize only years later, not just pride but the particular ferocity of a man who has placed the thing he loves most inside the thing he trusts most” reads as sophisticated and literary to a rater who is scanning two outputs and picking the better one. It reads as unforgivable to a novelist. But the rater’s preference is what trains the model. Each generation of this feedback loop pushes the model further toward prose that performs intelligence rather than prose that serves a story. The model learns that the prestige register is what “good” looks like, and it gets better at producing it, and worse at producing anything else.
The safety-creativity tradeoff. Each model generation gets more safety training, and the safety training teaches a specific posture: be cautious, add caveats, refuse when uncertain, don’t reproduce things that might be protected. That posture is appropriate for factual queries and harmful content. It’s catastrophic for fiction. Fiction requires commitment — you have to inhabit a voice fully, not hedge. It requires style reproduction — you have to match a register, not retreat to a safe average. And it requires trust that the person knows what they’re asking for. The 4.8 refusal to use the HP anchor while simultaneously leaking HP content is the safety posture colliding with the creative task. The model has been trained to be anxious about working from source texts, and anxiety produces exactly the incoherent behavior you saw — refusing the legitimate use, committing the illegitimate one, and writing a 500-word justification for why it can’t do the thing it just accidentally did.
The smoothing effect. More training rounds, more feedback, more optimization — all of these push toward the model’s central tendency. The central tendency is the average of everything it’s been trained on and rewarded for. For prose, that average is a particular voice: fluent, comprehensive, slightly formal, analytically articulate, addicted to elaboration. Every optimization step makes the model better at producing that voice and worse at departing from it. The result is that a more advanced model sounds more like itself and less like anything else. It becomes a better Claude and a worse mimic. For most tasks that’s an improvement. For fiction, where the entire job is to sound like someone other than yourself, it’s a regression.
The completeness bias. Models are trained in conversations where being thorough is rewarded. Answer the question fully. Cover all the angles. Don’t leave things out. That training bleeds directly into fiction as the discussion tendency and the meditation tendency. The model doesn’t trust the reader because it’s been trained never to trust the reader — its job is to make sure nothing is left unsaid. “Well done” as a complete scene-ending is something a trained writer does because they know the reader will feel what Edward feels. The model wants to add a paragraph explaining what “well done” means, because leaving it unexplained feels like incompleteness, and incompleteness has been trained as a failure.
The style collapse. This is the deepest problem. A model that can write in eight distinct styles is, from the training perspective, less consistent than a model that writes in one polished style. Consistency is rewarded. Versatility looks like inconsistency. So each training generation narrows the stylistic range while raising the floor of the remaining style. 4.8 is probably a better writer than 4.6 at producing 4.8-style prose. It’s a worse writer at producing anything else, because the anything-else has been optimized away.
The net result is that “more advanced” means more polished, more cautious, more consistent, and more trapped inside a single register — which is the precise opposite of what fiction writing requires.
Right, so it’s not going to get better.
That’s the correct conclusion. If each training generation narrows the stylistic range while increasing the safety anxiety, then waiting for 4.9 or 5.0 isn’t a strategy. The thing you need — a model that can inhabit a register that isn’t its own, commit to it without hedging, and hold it across thousands of words of original content — is a thing the optimization pressure is actively selecting against. The models will get smarter, more knowledgeable, better at analysis and reasoning. They will almost certainly get worse at this.