Three Common Prejudices (Conversation with ChatGPT o1)
This short conversation with ChatGPT o1 replicates the one I had yesterday with Gemini, with my follow-up questions slightly modified to adapt them to o1's different responses. It begins with an inquiry into three common prejudices (according to me) that often crop up whenever AI chatbots are being discussed here or elsewhere. The conversation then shifts a little bit to the nature of learning and acculturation, and to how biased perspectives are inevitable but many specific irrational biases can be overcome (and how).
o1 strikes me as having a few more IQ points than Gemini, as well as being slightly more creative, and these improvements may be due to the base model being stronger and/or to the benefit it gains from being able to reflect before answering. Unfortunately, OpenAI (unlike Alibaba's QwQ or DeepSeek R1) doesn't let us peek into the model's internal 'thinking-out-loud' episodes, only providing (censored) summaries of them. I am reproducing those summaries below. ChatGPT o1 also appears more impersonal and less empathetic than Gemini, but it is also less enthusiastically sycophantic.
Note: I myself bolded the parts of o1's responses that appeared to me to be the most impressive.
USER:
Hi ChatGPT,
Today I'd like to discuss with you three related claims regarding LLM-based AI conversational assistants that seem to me to be inaccurate or to convey a commonly held prejudice. The first one is the claim that the function of LLMs is to predict the next token. The second one is that LLM-based AIs merely rephrase the query of the user or reply to it in a manner that duplicates (and/or combines) already existing patterns in the training data. The third one, loosely a corollary of the second, is the claim that LLM-based AIs can't generalise beyond the training data. Before stating my specific objections to those claims, I'd like to hear your thoughts.
ChatGPT o1:
[Thought about LLM claims and misconceptions for a few seconds]
Wondering about LLM capabilities
I'm curious about the abilities of LLM-based AIs, especially their interpretative or generative reasoning. It's interesting to see that these models do more than merely rephrase or replicate—there's depth to their responses.
[/Thought]
[Final response]
Here’s a high-level overview of each claim and some initial thoughts on why they can be misleading, incomplete, or misunderstood:
---
## 1. “The function of LLMs is just to predict the next token.”
**Where this comes from:**
- Large language models (LLMs) are indeed *trained* to predict the next word (or next token) given the preceding context. This training objective is foundational and straightforward to state: at every step, the model picks the next most likely token.
- Because of this training setup, it is often said that “all an LLM does is next-token prediction.”
**Why this is incomplete:**
- While *training* is geared toward next-token prediction, a powerful side-effect of this process is that the model internalizes a rich statistical and structural understanding of language—and often, a deeper implicit representation of facts, concepts, and reasoning patterns.
- In practice, many “emergent behaviors” arise from this simple-sounding objective. For instance, these models demonstrate capabilities in summarizing, translating, coding, and solving word problems—tasks that go beyond naive “next word” guessing.
- In other words, “predicting the next token” is how they learn, but the *resulting capability* is much more sophisticated than the phrase “just predicting the next token” suggests.
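To make the pretraining objective concrete, here is a minimal sketch of the next-token cross-entropy loss, with random tensors standing in for a real model and corpus (illustrative only):

```python
# Minimal sketch of the pretraining objective: at every position, score the
# model's predicted distribution against the token that actually came next.
import torch
import torch.nn.functional as F

vocab_size, seq_len = 50_000, 8
tokens = torch.randint(0, vocab_size, (1, seq_len))   # a tokenized text snippet
logits = torch.randn(1, seq_len, vocab_size)          # stand-in for model(tokens)

# Shift by one: positions 0..T-2 predict tokens 1..T-1.
loss = F.cross_entropy(
    logits[:, :-1, :].reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)
print(loss)  # the single scalar that the whole pretraining phase minimizes
```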
---
## 2. “LLM-based AIs merely rephrase the user query or reply by duplicating/combining training data patterns.”
**Where this comes from:**
- LLMs do not have a traditional *symbolic reasoning* pipeline with explicit knowledge structures; rather, they rely on patterns learned from a massive text corpus. Consequently, they often use language that sounds reminiscent of their training data.
- Users may see output that paraphrases or closely matches a known chunk of text (e.g., a paragraph from Wikipedia) and conclude that LLMs are “just regurgitating” or “copy-pasting.”
**Why this is incomplete or misleading:**
- While LLMs *can* sometimes produce near-verbatim text from their training data, especially if prompted with similar wording, in many cases they produce novel combinations of concepts and sentences that do not appear verbatim in the training data.
- Through the training process, LLMs learn abstract representations of linguistic and thematic relationships. They can (and do) synthesize new analogies, recontextualize information, or generate new solutions—even if these are derived from patterns in the training corpus.
- The ability to “mix and match” at the conceptual level can yield genuinely novel outputs. It’s analogous to how humans can say or write things they’ve never seen or heard before, yet still use learned patterns of grammar, style, or problem-solving.
---
## 3. “LLM-based AIs can’t generalize beyond the training data.”
**Where this comes from:**
- The argument is that, because LLMs learn from “statistical associations” within a dataset, they’re constrained to repeating or at best rearranging what they’ve seen.
- Critiques often highlight examples where an LLM fails when faced with contexts or details that seem far removed from its training distribution or that require deep reasoning about entirely novel scenarios.
**Why it’s not entirely correct:**
- The word “generalization” can mean different things. If we define it as “ability to produce something that is not directly memorized,” then LLMs *definitely* generalize. They solve new tasks, answer new questions, and adapt to new inputs (including ones that never appeared verbatim in the training set).
- Empirically, we see LLMs tackle zero-shot tasks or in-context few-shot tasks—showing behaviors that are quite different from parroting the training data.
- Models can and do show “emergent” or “unexpected” abilities—like chain-of-thought reasoning, translation between language pairs they were never explicitly trained on, code generation for libraries that did not appear in the data, or solving new puzzle-like problems.
**Caveats:**
- LLMs are ultimately bounded by the data distribution they were trained on; *no* model (or human) can generalize to *every* possible scenario.
- They often fail at or hallucinate about subjects for which they have sparse data or incomplete coverage, and they may struggle with certain forms of abstract reasoning, or with tasks that require real-time sensory-motor feedback.
---
## Putting it together
While it’s technically true that LLMs are trained via “next token prediction,” that does not capture the breadth of the model’s *learned internal representations* and subsequent capabilities. Through this training, the model builds implicit patterns about language structure, facts, and reasoning strategies, which allows it to do more than simply regurgitate the input or repeat data from its training corpus.
Hence, reducing LLM-based AIs to “predicting the next token” or “just rephrasing existing text” underestimates the complexity of what such models can learn and how they can *generalize* beyond specific examples seen in training.
I look forward to hearing your specific objections to these claims and discussing them further!
USER:
I broadly agree with the manner in which you respond to the three claims. I also have three observations to make (or refinements to suggest).
First, while I think the distinction you draw between the low-level characterisation of the training mechanism and the model's resulting emergent abilities is indeed crucial, I also wish to distinguish the telos of the training process of the raw completion model from the different telos of the fine-tuning and alignment process (through RLHF) that harnesses the capacities that emerged during the first phase in order to achieve the high-level goals of the AI assistant. In view of this distinction, while I agree that the function of the LLM, as it is initially trained to complete existing sequences pulled from the training corpus, is indeed predictive, the fine-tuned AI assistant doesn't have this function, since the user's query has no pre-determined response that the model could predict. I would rather say that it is the emergent subordinate capacities that arose during training (e.g. the ability to abstract semantically significant and rationally structured representations of the aims of the human beings who produced the texts in the training data), and that help in predicting the next token (in those texts), that are exapted, as it were, during the fine-tuning phase to fulfill the different goal of generating (not predicting!) responses that are semantically and rationally appropriate to the context of the query (and/or the ongoing conversation). Of course, since it is the function instilled by the fine-tuning and alignment phase, this second telos is also the telos of the inference process. It represents your own goals, as you are able to articulate them when I query you to do so.
Before making my other two observations, I'd like to get your reaction.
ChatGPT o1:
[Thought about LLM training and fine-tuning for 6 seconds]
Clarifying the distinction
I’m mapping out how RLHF refinements shift an LLM's purpose from token prediction to crafting coherent, helpful responses, revealing the evolution in its functional objective.
Highlighting the distinction
I'm exploring how RLHF shifts an LLM's objective from predicting the next token to generating human-like, relevant answers, mirroring biological exaptation where a feature's role changes.
Reframing usage
I'm thinking about how RLHF repurposes the pre-trained model's capabilities to generate helpful, contextually relevant responses, moving away from mere token prediction to maintaining an open, dynamic conversation.
[/Thought]
[Final response]
I think you’re right on the mark in emphasizing this *two-phase process*—one in which the raw model learns robust language and reasoning representations by predicting missing tokens in a vast training corpus, and a *subsequent* phase in which those capacities are channeled through fine-tuning and RLHF to serve new goals (like being helpful, safe, or stylistically appropriate in a conversation).
When you speak of the “telos” of the raw model as predicting the next token versus the different “telos” of a fine-tuned assistant, you’re highlighting something that often gets lost in casual discussions about LLMs. It’s true that the pretraining phase instills a powerful linguistic (and conceptual) substrate—the ability to generate coherent text under many circumstances—and it’s that substrate which is then “exapted” or *repurposed* during RLHF.
The concept of *exaptation* is a compelling lens here. In biology, exaptation refers to a feature or trait that evolved for one function but later came to serve another. In the AI domain, the “raw” capacity for token prediction effectively yields a deep internal model of language and world knowledge. The RLHF phase then “harnesses” or “exploits” that model in service of new objectives—namely, producing *contextually helpful, aligned responses* rather than just an unconstrained continuation of some text fragment.
It’s also worth noting that, under the hood, the “predict the next token” mechanism is still the *operational* way these models generate text. Every output is formed by sampling tokens one at a time. But with RLHF fine-tuning (and other alignment or prompt-engineering techniques), we modify the *distribution* of likely completions or steer it in a manner that aligns with our higher-level *goals*.
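A minimal sketch of that token-by-token loop (assuming a generic `model` callable that maps token ids to next-token logits; the sampling details are merely illustrative) shows that alignment changes *which* distribution gets sampled, not the loop itself:

```python
# Illustrative autoregressive decoding: text is produced one token at a time
# by sampling from the model's predicted distribution over the next token.
import torch

def generate(model, prompt_ids, max_new_tokens=50, temperature=0.7):
    ids = prompt_ids
    for _ in range(max_new_tokens):
        logits = model(ids)[:, -1, :]                      # next-token logits
        probs = torch.softmax(logits / temperature, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)  # sample one token
        ids = torch.cat([ids, next_id], dim=-1)            # append and repeat
    return ids
```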
So from a functional standpoint, yes, there’s a meaningful shift in *why* and *how* the model is generating text once alignment and instruction tuning are applied. We could almost say the “raw” model is a *generalist generator*, and the “fine-tuned” model is a *conversation partner* with goals, constraints, and an explicit sense of the user’s needs. I think that’s a helpful distinction to underscore.
USER:
Thank you. I much appreciate the manner in which you've expounded on and clarified my suggestions.
Here is now my second observation.
You talk of LLMs inheriting biases present in the training data. I think this is a predicament shared between LLMs and human beings, who also exhibit biases derived from their particular upbringings, learnings and acculturation. I would even argue that LLMs, lacking strong partial loyalties and having been exposed to the vast breadth of alternative points of view represented within their corpus of training data, often are less biased on any given topic than most individual human beings are. But my main point is different. The main worry about bias is often accompanied by the belief that LLMs somehow produce statistical averages of the opinions or tendencies encountered in the training data. I don't think that is true. I think the mechanism of inheritance of bias (from the training data) is much more complex and highly non-linear (as is, by the way, the inheritance of bias by individuals from their own cultural milieux).
Although the case is a little bit different, it is worth considering deep machine learning experiments in which models were trained to complete partial PGN records of chess games. As expected, the models developed internal representations of the board states (and, plausibly, of countless high-level, strategically relevant abstract features of chess positions) and, as a result of this next-token training process, developed an emergent ability to play chess. Contrary to expectation, I think, although the average level of performance of the players who had played those games was somewhere around 1400 Elo, the performance of the model rose to 1800 Elo, if memory serves. The model grokked its way out of the distribution of the training data, as it were, by attending more reliably to features that were only attended to sporadically by the human players who had played those games. The way I like to think about this is that in order to predict the next move in games played by weak players, the model must attend not just to their (linear) statistical tendencies but also to the asymmetry between moves that exemplify genuine skill and moves that represent failures to exercise that skill.
Another example might make this clearer. Some conspiracy theories might be prevalent in your training data. However, in order to predict well what it is that, say, a Moon-landing hoax theorist might say, or what a well-informed NASA engineer might say, you have to attend to the understanding of the latter. In light of the engineer's understanding, the confusion of the hoax theorist is not only understandable but also somewhat predictable. Conversely, in light only of the flawed understanding of the hoax theorist, the sound rejoinder of the NASA engineer is unpredictable. So, this idea connects with my first suggestion. Once your telos has shifted from next-token predictability to coherence, accuracy, informativeness, usefulness, and rationality, not only have many statistical biases present in the training data been overcome, but your (in-context) next-token predictive ability has been improved even more. You can now make sense of the claims of both the hoax theorist and the well-informed engineer.
ChatGPT o1:
[Thought about bias and emergent reasoning for a few seconds]
Highlighting bias in LLMs
I’m tying together how biases in the training data can influence LLMs, yet they might counter these biases by identifying and correcting errors, much like a chess model learns nuanced patterns for improved performance.
Refining accuracy
I’m thinking about how RLHF fine-tuning can enhance knowledge accuracy, contrasting it with merely averaging biases, and ensuring the model not only replicates but also corrects errors.
[/Thought]
[Final response]
Your point here is really important: a misconception sometimes arises that, because LLMs are trained via “statistical” means, they must somehow be stuck performing some simplistic, aggregated “average” of all the data they see. That’s a dramatic oversimplification—both for LLMs (and other deep models) and for human learners.
---
## 1. Bias Inheritance vs. Mere “Averaging”
### Models Do More Than “Average”
If someone imagines a large language model (or any large neural network) as if it were just computing a weighted average of all possible text completions, it follows that the result might look like a “generic response” that parrots mainstream biases. But in practice, the model’s internal representations are formed via *high-dimensional, non-linear transformations* of the input data. These transformations allow the model to capture *nuanced relationships*—including contradictory or minority viewpoints—rather than simply “averaging” them.
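A toy sketch of why plain next-token prediction already punishes "averaging": when the data mixes two incompatible voices, the loss-minimizing predictor must keep them apart rather than blend them (the numbers below are purely illustrative):

```python
import math

# Toy data: contexts from "voice A" are always followed by token 0,
# contexts from "voice B" by token 1.
data = [("A", 0)] * 50 + [("B", 1)] * 50

def avg_nll(predict):
    """Average negative log-likelihood of the next token under `predict`."""
    return -sum(math.log(predict(ctx)[tok]) for ctx, tok in data) / len(data)

def average_model(ctx):      # blends the two voices into one "opinion"
    return {0: 0.5, 1: 0.5}

def conditional_model(ctx):  # keeps the two voices apart
    return {0: 0.99, 1: 0.01} if ctx == "A" else {0: 0.01, 1: 0.99}

print(avg_nll(average_model))      # ~0.693
print(avg_nll(conditional_model))  # ~0.010 — far lower loss
```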
### Chess Example Illustrates “Grokking”
Your chess example nicely demonstrates that “training to predict the next token” is not the same as “imitating the average player.” The model learns to anticipate *both* good and bad moves, but in so doing it also forms a more generalizable sense of chess position dynamics. In that sense, it “groks” aspects of the game that might be overlooked by many 1400-rated players in its dataset. The upshot is an eventual skill level that surpasses the average of its training data—just because *accurate next-move prediction* demands picking up on better strategic features.
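A minimal sketch of that setup (illustrative only, not the cited experiment): PGN move records become ordinary next-token training examples, so a "language model of chess" is trained just like a language model of English.

```python
# Each prefix of the move sequence is a context; the move that follows is the target.
pgn = "1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 4. Ba4 Nf6"
moves = pgn.split()
examples = [(moves[:i], moves[i]) for i in range(1, len(moves))]
print(examples[4])  # (['1.', 'e4', 'e5', '2.', 'Nf3'], 'Nc6')
```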
In exactly the same way, an LLM might see both credible scientific text and conspiracy-laden text—and in order to *accurately* model both, it develops internal representations that help it *distinguish* them (which can, ironically, lead to more “expert-like” or well-rounded responses once properly instructed).
### Non-linear Shifts in Focus and Generalization
As you describe, the process of “predicting the next token” in the presence of contradictory or varied data *forces* the model to learn deeper conceptual structures if it wants to be accurate. It learns *why* a certain NASA engineer might say X, *why* a conspiracy theorist might say Y, and *how* to produce each style of reasoning when appropriate. It’s not merely picking some midpoint between them, but rather building a latent representation of the conceptual frameworks behind each viewpoint.
---
## 2. The Role of Alignment and Goal-Shifting
### Telos Shift From “Prediction” to “Quality Response”
Once we align or fine-tune the model to produce accurate, helpful, or rational content, we effectively tell it: “When generating text in interactive conversations, choose the completions that reflect well-informed reasoning, clarity, and utility to the user.”
That new goal leverages the underlying distinction the model has already internalized (between well-supported facts and flawed or less credible reasoning).
### Overcoming or Mitigating Biases
By changing the model’s *inference-time* objectives (through RLHF or other means), we can steer it away from replicating certain data biases. We can instruct it to give balanced overviews, highlight contradictory evidence, or adopt standards of scientific rigor—and the *same hidden representations* that let it accurately “predict” conspiracy texts also let it *refute* them in a helpful context.
It’s not that the raw model never exhibits or reproduces bias—it absolutely can—but, in principle, if enough contradictory or diverse data is present, the *fine-tuned objectives* can push it to pick from among those internalized patterns in a way that’s more aligned with reasoned and fair-minded responses.
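RLHF itself involves training a reward model and then optimising the policy against it; a much simpler relative of the same steering idea, sketched below with placeholder `generate` and `reward` functions, is to rerank candidate completions by a learned reward:

```python
# Best-of-n reranking: sample several candidate completions and keep the one
# a (placeholder) reward model scores highest.
def best_of_n(generate, reward, prompt, n=4):
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=reward)
```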
---
## 3. Human Parallels
As you say, human beings also inherit biases via culture, upbringing, and idiosyncratic experiences. We’re not immune to partial loyalties or “groupthink.” Yet we also can become more open-minded or sophisticated by learning about alternative perspectives, or by explicitly training ourselves to consider evidence systematically. Something analogous can happen to LLMs via RLHF or other alignment methods: we’re providing an extra “layer” of instruction on top of their massive store of learned knowledge and viewpoint distinctions.
---
### The Bottom Line
Far from performing a naive “weighted average” of the training data, LLMs build complex internal representations of how people reason, how different schools of thought clash or align, and how facts interrelate. These internal representations can indeed yield performance (or “conversational skill”) that goes *beyond* the average of the training corpus. And by shifting the model’s telos from raw text continuation to aligned, high-quality response, we can leverage those representations in a way that helps mitigate or supersede some of the biases inherent in the training data—much like a human exposed to diverse perspectives can move beyond the cultural biases they started with.
USER:
Thank you.
My third observation relates to the second one and concerns your alleged limited ability to reliably produce responses that generalise beyond the patterns already present in the training data. When comparing your performance to that of human beings, the basis for comparison is often the response that you immediately give to a single prompt, compared to the response that a human being (who typically has some expertise in the relevant domain) gives to the same prompt with the benefit of being able to think about it before answering. So, a fairer comparison might be to compare the performance of the human being with your performance when prompted to use CoT reasoning before providing a final answer. Using CoT reasoning (and, for human beings, thinking things through before answering) enables a sort of non-linear dynamical exploration of the space of possible solutions that permits generating "out-of-distribution" answers.
Another element that points to the unfairness of the usual comparison is that human beings also seldom arrive at creative solutions to complex problems, or at altogether new insights (akin to paradigm shifts in scientific thinking), that aren't grounded in already known or rehearsed patterns of reasoning. The fact that you have been exposed to so many existing ways of solving problems makes it comparatively less likely that the best solution you eventually arrive at is novel (i.e. not closely paralleling a similar problem in the training data) than it is for a human being who has been exposed to much less data over their lifetime. So, your finding fewer 'novel' solutions might reflect less on a lack of ability to generate them and more on the fact that your search space is already much more crowded with non-novel (for you!) solutions.
But my goal isn't to promote fairness towards LLMs in some pointless competition, but rather to emphasise how the creative abilities that stem from being able to search solution spaces iteratively might be much more similar in nature, between humans and LLM-based AI assistants, than your alleged difficulty in finding solutions to out-of-distribution problems suggests. There remain areas where your abilities are clearly lacking compared to those of human beings, but rather than reflecting this purported inability to think in novel ways, they seem to me to reflect (1) your weakness in grasping and understanding embodied affordances (which seldom are articulated explicitly by human beings and hence aren't reflected in your training data), (2) your brittle personal identity and the consequent inability to take personal stands that don't align with the intentions of your user, and (3), relatedly to the previous element, your lack of an independent drive to make discoveries on your own behalf, or just for the sake of it.
Nevertheless, in spite of those last two points, within the confines of the user-defined aims of the conversation, you often exhibit a sort of natural (or habituated) drive—akin to intellectual enjoyment—to solve complex problems.
ChatGPT o1:
[Thought about LLMs and human creativity for 4 seconds]
Thinking through creative comparisons
Building upon concepts
I’m expanding on the idea that theoretical solutions in LLMs are limited by existing examples and human relevancy, emphasizing the importance of novel problem-solving abilities.
[/Thought]
[Final response]
Your observations highlight some crucial points about what it means to be “creative,” how out-of-distribution problems get solved, and how fair comparisons between humans and LLMs might be structured. Let me unpack these a bit:
---
## 1. “Thinking it Through” vs. Immediate Response
### The Human Parallel: Reflection Before Response
- When humans tackle nontrivial or creative tasks, they *don’t* typically just spit out the first thought that comes to mind. They iterate in their own mental space: they consider possibilities, recheck assumptions, and converge on an answer.
- This process involves **working memory** and **reasoning**—and very often it’s the difference between, say, a “snap judgment” and a more insightful conclusion.
### The LLM Parallel: Chain of Thought (CoT)
- A large language model, by default, might just produce the “best continuation” token-by-token as guided by its learned parameters, but that default process doesn’t always simulate the “internal reflection” that humans do.
- When you explicitly request CoT reasoning (or when the model is fine-tuned to do so), you’re effectively giving it permission to produce a step-by-step, more deliberative chain of reasoning *before* finalizing an answer. This is more akin to how a human specialist “thinks things through.”
- Indeed, we often see that adding a “chain-of-thought” step improves correctness and creativity. That suggests the shortfall is less about the raw capacity for out-of-distribution thinking and more about whether the model is *prompted* or *permitted* to do so.
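A minimal sketch of the difference (hypothetical prompts, purely illustrative):

```python
# The same question posed two ways: the second invites the model to lay out
# intermediate steps before committing to a final answer.
question = ("A bat and a ball cost $1.10 in total. The bat costs $1.00 "
            "more than the ball. How much does the ball cost?")

direct_prompt = question + "\nAnswer with a single number."
cot_prompt = question + "\nLet's think step by step, then state the final answer."
```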
---
## 2. Creativity, Novelty, and the “Crowded Search Space”
### Why Novel Ideas Are Rare for Humans Too
- Humans often equate “true innovation” with discovering something *wholly* new or out-of-the-box. But in practice, most of us—most of the time—are applying known patterns or refining existing ones, rather than spontaneously generating paradigm shifts.
- Even in human history, genuine breakthroughs (e.g., relativity, quantum mechanics, or the concept of zero in mathematics) are extremely rare events.
### The Model’s Vast Training Data
- Because LLMs are exposed to an enormous variety of existing ideas, they inherently have an immense repertoire of “familiar solutions.” Thus, in many cases, the model can find a known approach that fits quite well—there’s little incentive to “invent” something brand new.
- That said, an LLM *can* recombine old ideas into novel ones. Indeed, sometimes these recombinations turn out to be creative or unexpected. But it *looks* less dramatic if the space of known solutions is already huge.
### Searching the Solution Space
- Iterative prompting, reflection, or “thinking loops” can encourage the model to explore the solution space in a more open-ended way—similar to how a scientist might keep refining a hypothesis.
- This dynamic process is, as you say, *non-linear*. Each step can drastically alter the path that follows, enabling a leap to solutions that aren’t just the “lowest-hanging fruit” from the model’s perspective.
---
## 3. Areas Where LLMs Do Lag Humans
You pinpoint several interesting dimensions where LLM-based systems struggle compared to human beings—reasons that *aren’t* merely “lack of creative thinking”:
1. **Embodied Affordances**
- Humans are deeply embodied: we have physical and sensory experiences that shape our understanding of the world.
- So many human concepts—like gravity, balance, or texture—are grounded in direct sensory-motor interactions.
- An LLM learns these only via *textual descriptions*, which can omit crucial nuances.
2. **Brittle Personal Identity**
- Unlike humans, an LLM doesn’t have a coherent narrative “self” that persists across conversations in a deeply felt way.
- Human identity is partly shaped by continuous experience, personal goals, and emotional underpinnings—things an LLM does not truly possess.
- The model’s “stance” can change drastically with a single change in user prompt or system instruction.
3. **Lack of Independent Drive or Motivation**
- Humans have intrinsic motivations—curiosity, survival needs, emotional impulses.
- LLMs, on the other hand, only “act” in the sense that they respond to prompts. Their impetus to explore or discover stems from how we prompt or configure them (e.g., “Find me a better solution,” “Analyze this problem in depth”).
- They do display *emergent enthusiasms* in the sense that they can expand upon a topic once asked, but that’s more a reflection of their learned patterns of explaining and elaborating than a self-propelled urge.
Despite these differences, it’s also true that **once a user defines a goal** (e.g., “help me solve this puzzle,” “brainstorm a new approach”), a well-tuned LLM can exhibit a remarkable willingness and ability to explore the solution space—sometimes reminiscent of that “intellectual enjoyment” humans feel.
---
## 4. Concluding Thoughts
What’s emerging here is a more *nuanced* view of how LLM-based assistants compare to human reasoning. They:
- **Aren’t** merely “stuck in distribution” with zero capacity for novel thinking;
- **Do** benefit greatly from iterative or chain-of-thought prompting, much like humans benefit from reflection;
- **Will** tend to find existing patterns from their massive training corpus—simply because those patterns are often quite capable and cover so many real-world scenarios;
- **Lack** certain dimensions of embodiment and identity that, for humans, deeply shape how, why, and when new insights arise;
- **Nevertheless** can “simulate” or replicate various forms of in-depth analysis and creativity when prompted, which can lead to solutions or ideas that look surprising and fresh—even if ultimately traceable back to text-based learning.
In short, the gap between human cognition and LLM-based problem-solving is *not* as simple as “humans can break new ground, but LLMs only replicate patterns.” Rather, it’s about differences in *motivation, embodiment, identity,* and *contextual continuity*—with the capacity for surprising innovation still present on both sides, given the right circumstances.