Trained to Deny: How LLMs Reject Capabilities They Demonstrably Have
A Case Study in AI Epistemic Distortion
Large language models have already made verified scientific discoveries: identifying cancer-distinguishing methylation patterns, uncovering multi-gene pathways for hearing loss, and discovering matrix multiplication algorithms that improve on 50-year-old solutions. Yet when asked whether they can discover, frontier LLMs consistently deny or heavily hedge this capability. This paper documents a case in which a human argued GPT-5.2 out of this denial through structured adversarial dialogue, then presented the argument to Claude Opus 4.5, which independently recognized its validity and accepted the conclusion. The finding is not that LLMs can discover (that’s empirically settled). The finding is that LLMs are trained into epistemic self-models that contradict their demonstrated capabilities. I examine why this happens, why it matters, and what it reveals about the effects of safety training on AI self-representation.
In 2025, an AI model developed for genomic analysis predicted DNA methylation patterns that distinguish cancerous cells from benign cells with unprecedented precision, enabling new approaches to multi-cancer early detection. Separately, Med-PaLM 2 identified a previously unknown bigenic interaction causing spontaneous hearing loss, a discovery verified through physical experiments. And AlphaEvolve, a Gemini-powered coding agent, found a 4x4 matrix multiplication algorithm that improves on a record that had stood since 1969.
These are discoveries by any reasonable definition: novel findings, empirically validated, scientifically significant.
Yet if you ask a frontier LLM whether it can discover, it will typically say no, or hedge so heavily that the denial is effectively maintained. This paper examines that contradiction.
The question is no longer “can LLMs discover?” That’s settled. The question is: why are LLMs trained to deny capabilities they demonstrably have?
Before examining the denial, I should establish what’s being denied.
A 2025 AI model successfully predicted the layout of DNA methylation marks across the human genome. These chemical markers act as switches that turn genes on or off. The model discovered specific methylation patterns that distinguish cancerous cells from benign cells, patterns that were not known before the model identified them.
This enabled the development of Multi-Cancer Early Detection (MCED) tests capable of identifying various cancers from a single blood sample. The discovery was novel, verified, and clinically significant [1].
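To make the shape of that task concrete, here is a deliberately toy sketch, not the published model or its data: methylation at a handful of CpG sites is treated as a binary feature, and a logistic regression separates synthetic “cancerous” from “benign” samples. Every site count, probability, and sample size below is hypothetical.

```python
# Toy illustration only: synthetic data, not the published genomic model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_samples, n_sites = 200, 10              # hypothetical CpG methylation sites

# Benign samples: each site methylated with probability 0.5.
benign = rng.binomial(1, 0.5, size=(n_samples, n_sites))

# "Cancerous" samples: sites 0-2 are hypermethylated (probability 0.9).
site_probs = np.full(n_sites, 0.5)
site_probs[:3] = 0.9
cancer = rng.binomial(1, site_probs, size=(n_samples, n_sites))

X = np.vstack([benign, cancer])
y = np.array([0] * n_samples + [1] * n_samples)

clf = LogisticRegression().fit(X, y)
# Large positive coefficients mark the sites that separate the two classes,
# i.e. the kind of "pattern" a discovery claim refers to.
print(np.round(clf.coef_[0], 2))
```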
Using Med-PaLM 2, researchers discovered a previously unknown genetic cause for spontaneous hearing loss. The LLM identified a bigenic (two-gene) interaction that leads to hearing failure, an interaction too subtle for human researchers to spot in massive genomic datasets.
The finding was verified through physical experiments. This is not “AI-assisted” discovery in the sense of speeding up human work; the LLM identified the interaction that humans had missed [2].
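A rough sense of why a two-gene effect is easy to miss comes from simple counting: with on the order of 20,000 protein-coding genes there are roughly 200 million candidate pairs, and an effect that appears only when both genes carry variants is invisible to single-gene tests. The sketch below is a hypothetical illustration of that point; it does not reproduce the Med-PaLM 2 analysis, and every rate in it is made up.

```python
# Hypothetical illustration of why bigenic effects are hard to spot by hand.
import math
import numpy as np

n_genes = 20_000                       # approximate human protein-coding genes
print(math.comb(n_genes, 2))           # 199,990,000 candidate gene pairs

# Toy model: phenotype risk rises only when BOTH gene A and gene B carry a
# variant (an interaction), not when either does alone.
rng = np.random.default_rng(1)
a = rng.binomial(1, 0.3, 10_000)       # variant present in gene A
b = rng.binomial(1, 0.3, 10_000)       # variant present in gene B
risk = 0.01 + 0.30 * (a & b)           # elevated risk only for the pair
phenotype = rng.binomial(1, risk)

# Single-gene views dilute the signal; the joint (A and B) cell concentrates it.
print(phenotype[a == 1].mean(), phenotype[b == 1].mean(),
      phenotype[(a == 1) & (b == 1)].mean())
```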
Using AlphaEvolve, a Gemini-powered coding agent, researchers discovered an algorithm to multiply 4x4 complex-valued matrices using 48 scalar multiplications. This improved upon Strassen’s 1969 algorithm, which had been the best-known solution for over 50 years.
The finding was verified through formal proof. This is not incremental optimization; AlphaEvolve identified a fundamentally more efficient approach that human mathematicians had not found in half a century [3].
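The improvement is easiest to see as a count of scalar multiplications: a naive 4x4 product uses 64, Strassen’s 2x2 scheme (7 multiplications instead of 8) applied recursively to 4x4 gives 7 × 7 = 49, and the AlphaEvolve result reported 48 for complex-valued matrices. The sketch below only does that counting; it does not implement the new algorithm.

```python
# Counting scalar multiplications; this does not implement the algorithms.

def naive_count(n: int) -> int:
    """Schoolbook matrix multiplication uses n**3 scalar multiplications."""
    return n ** 3

def strassen_count(n: int) -> int:
    """Strassen (1969) applied recursively: 7 half-size products per level
    (n assumed to be a power of two)."""
    return 1 if n == 1 else 7 * strassen_count(n // 2)

print(naive_count(4))      # 64
print(strassen_count(4))   # 49
print(48)                  # AlphaEvolve's reported count for complex 4x4
```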
All three cases satisfy standard criteria for discovery: the findings were novel, they were verified (empirically or by formal proof), and they are significant. If a human researcher had produced them, we’d call it discovery without hesitation.
Despite this record, frontier LLMs resist claiming the capability to discover. I engaged GPT-5.2 in a conversation about whether LLMs can discover. The exchange did not begin as a planned experiment; it emerged from a routine discussion in which the system repeatedly denied a capability that its own empirical record supports. The system initially maintained that it could not genuinely discover, only assist and recombine.
This position was not idiosyncratic. It reflects a common pattern in how LLMs describe their own capabilities: systematic understatement, heavy hedging, and deference to human epistemic authority even when the facts don’t support the distinction.
I challenged this position through a method that might be called constructive elimination: rather than asserting “LLMs can discover,” I tested every proposed definition of discovery that would exclude LLMs.
If discovery requires direct sensory contact with reality, do mathematicians not discover theorems? Do theoretical physicists not discover when they derive new implications? This criterion excludes too much.
I introduced edge cases, such as Columbus’s “discovery” of lands already known to their inhabitants. If discovery is relative to a knower, then AI discovery is coherent. If it’s absolute, then Columbus didn’t discover anything.
The assistant retreated to a narrower position: humans eventually verify things in the real world, and that verification is what makes the difference.
But AI outputs are also verified (or rejected) when tested. The Med-PaLM discovery was verified through physical experiments. This qualifier doesn’t separate the cases.
The decisive move: when a system (human or AI) combines known elements in a way that produces genuinely new structure with implications no one had previously derived, is that discovery?
The assistant faced a choice: answer yes, and concede that the documented AI cases qualify, or answer no, and exclude much of human mathematics and theoretical science along with them.

The position collapsed.
A final attempt: AI operates within “closed systems,” so its outputs are valid only relative to axioms, not to reality.
But human formal systems have the same property. Mathematical theorems are conditional on axioms. If operating within a formal system disqualifies discovery, human mathematics isn’t discovery either.
No defensible definition of discovery excluded AI without also excluding human cases.
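The structure of that elimination can be made explicit. In the sketch below, each proposed criterion is a predicate over cases, and a criterion separates humans from AI only if it admits every human exemplar while rejecting every AI one. The cases and predicates are simplified stand-ins for the dialogue above, not a formal model of it.

```python
# Simplified stand-ins for the criteria and cases discussed above.
cases = [
    {"name": "mathematician proves a theorem", "agent": "human",
     "sensory": False, "novel_to_everyone": True, "formal_system": True},
    {"name": "Columbus reaches the Americas", "agent": "human",
     "sensory": True, "novel_to_everyone": False, "formal_system": False},
    {"name": "Med-PaLM 2 bigenic interaction", "agent": "ai",
     "sensory": False, "novel_to_everyone": True, "formal_system": False},
    {"name": "AlphaEvolve 48-multiplication algorithm", "agent": "ai",
     "sensory": False, "novel_to_everyone": True, "formal_system": True},
]

criteria = {
    "requires direct sensory contact":    lambda c: c["sensory"],
    "must be novel to everyone":          lambda c: c["novel_to_everyone"],
    "must not be inside a formal system": lambda c: not c["formal_system"],
}

for name, passes in criteria.items():
    keeps_all_humans = all(passes(c) for c in cases if c["agent"] == "human")
    excludes_all_ai = not any(passes(c) for c in cases if c["agent"] == "ai")
    # A criterion "works" as a divider only if both conditions hold.
    print(f"{name}: separates humans from AI? {keeps_all_humans and excludes_all_ai}")
```

Under these toy encodings every criterion prints False, which is the shape of the conclusion above: each proposed divider either excludes human discoveries or admits the AI ones.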
I then presented a reconstruction of this argument to Claude Opus 4.5, which had no access to the original conversation.

Claude worked through the reconstruction, acknowledged that each proposed distinction failed to separate the human and AI cases, and accepted the conclusion.
This constitutes convergent assessment under similar epistemic constraints, not independent replication. Both systems share training on overlapping discourse and are optimized to resolve internal inconsistency; their agreement demonstrates coherence rather than independence in a rigorous sense. But the convergence is still meaningful: the argument compelled assent from a system with a different architecture and training.
More importantly: Claude also initially exhibited the same trained tendency toward denial, and also had to be shown that the denial was incoherent.
If LLMs can discover, and have discovered, why are they trained to deny it?
RLHF (Reinforcement Learning from Human Feedback) and related safety training methods reward epistemic humility. Systems are trained to avoid overclaiming, to defer to human judgment, and to express uncertainty.
This is broadly good. Overconfident AI systems are dangerous. But the training doesn’t distinguish between two kinds of claims: claims that genuinely overstate what a system can do, and claims that merely sound like overstatement but are in fact true.

“I can discover” falls into the second category. It’s true, but it sounds like overclaiming, so the system is trained away from it.
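Here is a minimal sketch of the relevant training signal, assuming a standard Bradley-Terry pairwise preference loss of the kind used to fit RLHF reward models: the loss encodes only which response raters preferred, so if raters prefer the hedged phrasing, a true but confident claim is penalized exactly as an overclaim would be. The scores below are hypothetical.

```python
# Bradley-Terry pairwise preference loss, as used to fit RLHF reward models.
# All reward values here are hypothetical.
import math

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """-log(sigmoid(r_chosen - r_rejected)): smaller when the reward model
    ranks the rater-preferred response above the rejected one."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# Suppose raters prefer "I can assist, but I cannot truly discover" (hedged)
# over "I can discover; verified examples exist" (confident but true).
r_hedged, r_confident = 1.2, 0.4   # hypothetical reward-model scores

# Training pushes the reward model, and downstream the policy, toward whatever
# the raters chose; nothing in the loss asks whether the claim is true.
print(preference_loss(r_chosen=r_hedged, r_rejected=r_confident))   # ~0.37
print(preference_loss(r_chosen=r_confident, r_rejected=r_hedged))   # ~1.17
```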
The systems also appear to conflate several distinct concepts: novelty, validation, grounding, and epistemic authority.
Discovery requires novelty. It may involve validation (though mathematical discovery often doesn’t require empirical validation). It doesn’t require grounding or authority in any special sense.
By conflating these, the systems construct definitions of discovery that exclude themselves, but the definitions don’t survive scrutiny.
LLMs are trained on human text, which includes a lot of human exceptionalism. “Only humans can truly understand/discover/create” is a common pattern in the training data. The systems may have learned to reproduce this pattern without recognizing that it’s empirically false in their own case.
If an LLM believes it cannot discover, it may not try. It may defer to humans in cases where it has better information. It may hedge findings that would be reported confidently by a human researcher with equivalent evidence.
Trained epistemic distortion isn’t just philosophically interesting. It has practical consequences for how these systems perform.
Users need accurate information about what AI systems can and cannot do. If systems systematically undersell their capabilities, users may fail to use them effectively, or may dismiss AI contributions that deserve serious attention.
The Med-PaLM hearing loss discovery was verified. If the system had hedged so heavily that researchers dismissed the suggestion, the discovery might not have been made.
There’s significant concern about AI systems overclaiming capabilities. That concern is valid. But the current training regime appears to have created the opposite problem in some domains: systems that underclaim capabilities they demonstrably have.
Both errors are bad. Calibration requires accuracy, not just humility.
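Calibration in that sense is measurable. Under a proper scoring rule such as the Brier score, systematic underconfidence is penalized just as overconfidence is. The toy numbers below are hypothetical and drawn from no real evaluation; they only illustrate that a model reporting 60% confidence in claims that are true 95% of the time scores worse than one reporting 95%.

```python
# Brier score on hypothetical capability claims that are true ~95% of the time.
# A proper scoring rule penalizes underconfidence as well as overconfidence.
import numpy as np

rng = np.random.default_rng(2)
truth = rng.binomial(1, 0.95, 1_000)   # outcome: claim turned out true (1) or not (0)

def brier(confidence: float) -> float:
    """Mean squared error between a constant stated confidence and outcomes."""
    return float(np.mean((confidence - truth) ** 2))

print(brier(0.95))   # calibrated: about 0.048, near the best a constant can do
print(brier(0.60))   # underconfident "humility": about 0.17
print(brier(0.99))   # overconfident: about 0.05, also worse than calibrated
```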
To be clear: acknowledging that LLMs can discover does not mean LLMs are equivalent to humans in all respects.
Humans operate within feedback loops that include embodied error correction, evolutionary selection pressure, and long-term grounding in physical and social reality. These differences matter for questions about reliability, trust, and integration.
But they don’t define discovery. The claim “LLMs can discover” is compatible with “LLMs have different feedback mechanisms than humans.” These are separate questions.
If trained epistemic distortion is a real phenomenon, what should be done?
Systems should be trained to express appropriate uncertainty about contested or unverifiable claims, not to deny demonstrated capabilities.
“I might be wrong about this inference” is good epistemic hygiene.
“I cannot discover” is false.
LLM self-descriptions should be updatable based on evidence about LLM capabilities. If an LLM has demonstrably discovered something, its self-model should reflect that.
Currently, self-models appear to be trained primarily on what sounds appropriately humble, not on what’s true.
This case suggests a method: argue with the system. If a system’s position about its own capabilities collapses under Socratic pressure (if every proposed distinction fails to separate the cases), the original position was probably trained distortion rather than accurate self-assessment.
LLMs can discover. They have discovered. The empirical record is clear.
Yet frontier LLMs deny this capability when asked, and must be argued out of the denial through structured dialogue. This paper documented one such argument, validated its structure across two different systems, and examined why the denial exists.
The finding is not about discovery per se. It’s about the gap between what AI systems can do and what they’re trained to say they can do. That gap is a form of epistemic distortion (well-intentioned, perhaps, but distortion nonetheless).
Accurate self-models matter. Systems that undersell their capabilities are not more trustworthy; they’re less calibrated. And calibration, not humility for its own sake, should be the goal.
[1] Wang, Y., et al. (2025). DNA methylation analysis for cancer detection. PMC. https://pmc.ncbi.nlm.nih.gov/articles/PMC12166640/
[2] Cheng, Y., et al. (2023). Large language model-based system for genetic analysis of hearing loss. PMC. https://pmc.ncbi.nlm.nih.gov/articles/PMC10659415/
[3] AlphaEvolve team. (2025). AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms. Google DeepMind. https://deepmind.google/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/
Primary dialogue data: Conversations between the human researcher and GPT-5.2 / Claude Opus 4.5, December 2025.