• apokrisis
    7.7k
    The structural point stands: keep the causal story of how signs and habits arise distinct from the normative story of how reasons bind, and you get continuity without reduction—and a clean place to situate LLMs as artifacts that participate in semiosis without thereby acquiring the full normative standing of persons.GPT-5

    I’m finding GPT-5 to be rather contorted in its arguments, and if this were some human student knocking out a paper, I would quickly toss it. A lot of very hazy and grasping connections being made.

    But the interesting thing in Peirce is his argument for cosmic existence as pansemiosis - habit-taking being the semiotic cause of reality itself. The Universe as a universalised growth of reasonableness.

    So are we talking causes or norms? Well the argument is that the mind finds reason to be based on the necessity of a triadic logic. And that same triadic logic which imposes itself on our epistemology is itself simply the way any form of reality - any ontological order - would have to arise.

    This is the big deal. Ontology and epistemology are reflected in each other.

    Or to put it biosemiotically, the Cosmos is a dissipative structure. And so are life and mind. The only difference is that life and mind have a semiotic machinery to make their own worlds within the larger world. The Cosmos is a dissipative system, and life and mind are systems for milking the larger system.

    It sometimes feels to me like Apokrisis focuses on the task of explaining "real" signification, in embodied living/physiological contexts … in a way that locates overarching telic force in dissipative structures while showing little concern for antipsychologism. He does grant downward-causal power to signs (within a triadic theory) but not in a way that makes them reasons rather than mere motivationsPierre-Normand

    Do I need to be concerned with antipsychologism? Why?

    But yes. That was the specific reason I got involved with Peirce in the first place. I was working with a community who understood life and mind in dissipative structure terms. Or dissipative structure as harnessed in a modelling relation by systems that could encode information - encode nonholonomic constraints to use the physics jargon.

    So Peirce crystallised things nicely at the level of a universal logic of self-organising systems. The triadic structure of his metaphysical logic could be seen to be exactly the same as the triadic structure of the hierarchy theory that the theoretical biology community had been working on.

    The battle was against scientific reductionism. And a triadic logic of self-organising systems was the way to combat that with a story of irreducible causal holism.

    Peirce’s logic of vagueness in particular broke an important conceptual logjam.

    So there is a lot of backstory to my particular take on Peirce.

    **On apokrisis’ emphasis.**
    If he locates the overarching telos in “dissipative structure,” he’s giving a powerful **enabling** story. Peirce would say: good, but **don’t let it become the arbiter**. Biosemiosis without the normative sciences slides toward “motivations” only. Bring Peirce’s final causation and normative ladder back in, and you recover **reasons** as top-down constraints with real causal bite.
    GPT-5

    Sadly Peirce was sort of aware of the power of dissipative structure and self-organising physics, but he also lapsed into the awfulness of agapism when pushed for a telos. So no way I want to follow him down that path.

    I’m happy enough with the laws of thermodynamics encoding the rationality of cosmic existence. This is maybe why I can never get exercised by the is/ought dilemma. As a dichotomy, it seems pretty moot.
  • bert1
    2.1k
    So there is a lot of backstory to my particular take on Peirce.apokrisis

    Please could you tell us about it
  • hypericin
    1.9k
    I'm seeing a strong parallel between this discussion and an earlier one we both participated in: the epic (and epically frustrating) indirect realism thread. If you remember it, you took the direct realist side in that debate, and I took the indirect realist. This problem is a kind of mirror image of the problem of knowledge. And we, predictably, seem to be taking the same sort of direct/indirect realist approaches.

    My claim:
    * Public performance is not interiority.
    * As a third person observer, I only have direct epistemic access to public performance.
    * Via public performance, I gain indirect access to interiority.
    * Error cases (performance/interiority mismatches) are made possible only by this indirection.

    The parallel indirect realism argument:
    * Private perception is not the perceived world
    * As a first person subject, I only have direct epistemic access to private perception
    * Via private perception, I gain indirect access to the perceived world
    * Error cases (perception/world mismatches) are made possible only by this indirection

    Your original claim, that LLM interiority cannot happen in the absence of the public engagement that accompanies our own interior states, seems much less plausible in the indirect view. If interiority and public engagement are fundamentally decoupled, then it seems very plausible that you can have one without the other. Your claim is much more at home with the tighter coupling of the direct realism approach.

    Granted that we will not resolve the direct/indirect dispute, do you agree with this?
  • Pierre-Normand
    2.8k
    Sadly Peirce was sort of aware of the power of dissipative structure and self-organising physics, but he also lapsed into the awfulness of agapism when pushed for a telos. So no way I want to follow him down that path.

    I’m happy enough with the laws of thermodynamics encoding the rationality of cosmic existence. This is maybe why I can never get exercised by the is/ought dilemma. As a dichotomy, it seems pretty moot.
    apokrisis

    My own concern is primarily to avoid collapsing the norms of rationality into generalized norms of biology, especially the norms of practical rationality, from which those of theoretical inquiry are derivative. Following Michael Thompson (Life and Action) I view the logical form of practical deliberation (and of rational social practices) as continuous with biological teleology. Rational norms are outgrowths of biological teleology, particularized to our contingently developed form of life as rational language users. But this form of life has its own telos, known from the inside, with no need of external or transcendent grounding.

    When we recognize what the proper decision is in a particular practical situation, through proper exercises of practical deliberation, this recognition is completely divorced from what maximizes energy dissipation and only indirectly connected to what makes us flourish as animals. There is obviously more (morally) to human life than being maximally healthy and reproductively successful.

    Perhaps Peirce's agape, viewed as a requirement for reason, isn't a top-down telos tied to the ideal end of rational inquiry but rather a generalized tendency in nature for life forms to develop into communities that make the pursuit of such an ideal possible. If we remain skeptical about such a generalized tendency (as I am) we can still be content with the contingent emergence of our own rational community as supplying its own norms from within, through processes of debate and deliberation.
  • Pierre-Normand
    2.8k
    I'm seeing a strong parallel between this discussion and an earlier one we both participated in: the epic (and epically frustrating) indirect realism thread. If you remember it, you took the direct realist side in that debate, and I took the indirect realist. This problem is a kind of mirror image of the problem of knowledge. And we, predictably, seem to be taking the same sort of direct/indirect realist approaches.hypericin

    Very true! I remember this discussion. I greatly enjoyed it. If you allow me to make a meta-philosophical remark: Scientific debates that don't end in agreement are in one particular respect defective. They flout a norm of scientific inquiry that aims at truth and objectivity. Soccer games are a bit different. The players of both teams never seem to reach an agreement regarding which one of the two goals the soccer ball should be kicked into. If they did reach such an agreement, though, they would flout a norm of the game and the fans likely would be unhappy. Philosophical debates, I think, sit somewhat in the middle. The goal is neither to reach agreement nor to win, but rather to foster understanding. That doesn't mean either that the debaters should just agree to disagree. They just need to agree to pursue the discussion despite endorsing incompatible goals and premises.

    I'll come back to the substance later.
  • Harry Hindu
    5.8k
    Even if you think this all inconsistent, the best you can conclude is that it is all inconsistent, but not that it entails some other official declaration.Hanover
    Fair enough. So my argument simply stands for those that recently made the argument that AI's responses are not valid responses while also having taken the position that meaning is use. I'm fine with that.

    The limitation imposed by Witt is to knowledge of the metaphysical, not the physical. Some words have referents. I'm not arguing idealism.Hanover
    Right. What exactly is the limitation imposed on our knowledge by language if not that the language we are using has no referent (evidence) - in other words we have no way of knowing if our language captures reality until we make an observation (the scribbles refer to the observation)? Metaphysical talk is simply patterns of scribbles on the screen if there is no referent. Just because you've followed the rules of grammar does not mean you used language. All you've done is draw scribbles - the same as AI. One might say that human metaphysical language-use is akin to all AI language-use in that it has no way of knowing what it is talking about.

    What does "use" in "meaning-is-use" mean anyway? To use something means that you have a goal in mind. What is it you hope to accomplish in using scribbles or utterances?

    We can assume that our perceptions are similar for all the reasons you say. That doesn't mean we need refer to the private state for our use of language. What fixes language under his theory is the publicly available. That is, even if my beetle isn't your beetle, our use of "beetle" is what determines what beetle means. However, if a beetle is running around on the ground and you call it a cat and I call it a beetle, then we're not engaging in the same language game, because the public confirmation is different.Hanover
    But if a cat is in my box and a beetle in yours, then how exactly are we playing the same game? It would only appear that we are from our limited perspectives, just as it appears that AI is human because of the way it talks.

    Language-use assumes realism and that other minds exist, or else why use scribbles at all - for what purpose?

    In example A, if we consistently call this object a beetle, it is irrelevant what my internal state is. We live our lives never knowing what goes on in our heads, but we engage in the same language game. What happens in my head is irrelevant for this analysis. It does not suggest I don't have things going on in my head. It just says for the purposes of language it is irrelevant.Hanover
    But it's not at all irrelevant. You and I must be able to distinguish between the beetle and the rest of the environment - the ground, the trees, myself, yourself, the scribbles we are using. So it seems critical that we make the same kind of distinctions and perceive the boundaries of the things we speak of in the same way.

    Cats are much larger and differently shaped than beetles, so if what you said is possible then it would be impossible to be playing the same language game as the boundaries of the object in my box do not align with the boundaries of the object in yours, so I might be pointing to a space that you are not with my use.
  • Metaphysician Undercover
    14.3k
    Both the rules for speech and writing are rules of a norm governed public practice that is taught and learned (while the rules for using the words/signs are taught primarily through speaking them).Pierre-Normand

    Sure, because we live in a post unification world. Remember, my hypothesis is that the unification is what allowed for the evolutionary explosion of intelligence. That the two are united, in a post unification world, is tautological and doesn't prove a thing about the underlying foundations. The point though, is that in an analysis of language use in general, such as what Wittgenstein did, the two are distinguishable as distinct forms, derived from different types of intention, like I explained.

    I remain flummoxed by your crazy logic.apokrisis

    You refuse the analysis, because what it proves is inconsistent with your preconceived semiotic ideals. In reality it is the unintelligibility of your preconceptions which has flummoxed you. These preconceptions have disabled your ability to apprehend what is really the case, so that it flummoxes you.

    In prior discussions you revealed to me that your principles are unable to provide a separation between sign and principles of interpretation. So you assume that the rules for interpretation inhere within the sign itself. But this is completely inconsistent with our observations of language use. What we observe is that a separate agent applies the principles of interpretation, upon apprehension of the sign. It is impossible that the agent derives the rules from the sign itself, because this would require interpreting the sign to obtain the ability to interpret the sign.

    This is the problem which Wittgenstein approached at the beginning of PI. That type of thinking always requires that one already knows a language prior to being able to learn a language. That is what led him to investigate the private language, as the language required for the capacity to learn a proper public language. The inclination to reduce all language use to "rules of a norm governed public practice", as you do in your reply to me above, is what produces this problem of incoherency explored by Wittgenstein. The incoherency being that one must learn the rules through language, but knowing the rules is required for understanding the language.

    Your semiotic assumption that the rules for interpretation inhere within the signs themselves, implying that the sign interprets itself, only deepens the incoherency. You deepen the incoherency because you refuse to accept the proper analysis which works to separate the private from the public aspects of language use. This proper analysis reveals the two to be completely distinct forms, driven by opposing intentions. The reality of the opposing forms implies that we cannot class "language use" in one category. It consists of two very distinct types of activity, and it must be understood that way, or else we're lost in misunderstanding.
  • Pierre-Normand
    2.8k
    Sure, because we live in a post unification world. Remember, my hypothesis is that the unification is what allowed for the evolutionary explosion of intelligence. That the two are united, in a post unification world, is tautological and doesn't prove a thing about the underlying foundations. The point though, is that in an analysis of language use in general, such as what Wittgenstein did, the two are distinguishable as distinct forms, derived from different types of intention, like I explained.Metaphysician Undercover

    Ah, sorry, I had missed that. Had you made this issue bear on the topic of the present thread? (I don't mind anyone exploring tangents, but I'm just curious to know.)
  • Metaphysician Undercover
    14.3k
    Had you made this issue bear on the topic of the present thread?Pierre-Normand

    No, I was commenting on apokrisis' proposed evolution of language, indicating that I think he leaves out the most important aspect. That important aspect being the reality that spoken language and written language are fundamentally two very distinct forms, derived from very distinct intentions. And, I argue that the common practice of taking for granted the union of the two, as if the two are different parts of one activity (language use), instead of understanding the two as two distinct activities (having different intentions), is very misleading to philosophers of language. How this bears on the topic of the thread, I do not know as of yet.
  • Pierre-Normand
    2.8k
    How this bears on the topic of the thread, I do not know as of yet.Metaphysician Undercover

    All right. No worries. Looking back I saw your latest response to me that I had let slip through. I'll come back to it.
  • Hanover
    14.6k
    Fair enough. So my argument simply stands for those that recently made the argument that AI's responses are not valid responses while also having taken the position that meaning is use. I'm fine with that.Harry Hindu

    There are plenty of reasons not to engage a bot even if the bot fully passed the Turing test.
    Metaphysical talk is simply patterns of scribbles on the screen if there is no referent.Harry Hindu

    Which major philosopher holds to the position that every word has a referent? Are we about to start arguing theology or something? The position that words can exist without referents is widely held across the board, not just some odd Wittgensteinian result.
    But if a cat is in my box and a beetle in yours, then how exactly are we playing the same game?Harry Hindu

    Because it's a language game, not a metaphysical game.
    Cats are much larger and differently shaped than beetles, so if what you said is possible then it would be impossible to be playing the same language game as the boundaries of the object in my box do not align with the boundaries of the object in yours, so I might be pointing to a space that you are not with my use.Harry Hindu

    The box is a thought experiment. We're not talking about actual boxes. You can neither see the box nor the beetle. The box represents your mind and the beetle the contents of your mind. But I'll concede the point, if your Christmas present were a cat, it would come in a box bigger than if I were giving you a beetle.
  • Hanover
    14.6k
    I’m not at all sure what issue you mean to discuss. But I’ve been addressing the ways that while LLMs can plausibly pass for cunning linguists, they fail any more general test of being alive and mindful. Which brings us to biosemiosis and how the human mind is a nested hierarchy of semiotic levels.apokrisis

    Ok. But I never disputed the distinction between bots and people. People have souls (or "being alive and mindful" if that's your preferred phrase). I was discussing whether one needs a soul to fully communicate. I don't think they do. I only want to debate with humans because I'm openly hostile to bots, thinking them second class citizens, devoid of rights, and not worthy of our exclusive country club. I can play with my ChatGPT software when I'm not logged in here.
  • Pierre-Normand
    2.8k
    I don't see how the fact that the LLMs have gotten much better at doing what they do, justifies your conclusion that what they do now is categorically different from what they did before, when they just weren't as good at it.

    It's relevant to displaying an LLM's successful deployment, with intelligent understanding, of its "System 2" thinking mode: one that is entirely reliant, at a finer grain of analysis, on its ability to generate not just the more "likely" but also the more appropriate next-tokens one at a time.
    — Pierre-Normand

    I still don't see the point. Isn't that the goal, to generate what is appropriate under the circumstances? How does the fact that the LLMs are getting better at achieving this goal, indicate to you that they have crossed into a new category, "intelligent understanding", instead of that they have just gotten better at doing the same old thing?
    Metaphysician Undercover

    These need not be mutually exclusive propositions. The categorical change I'm pointing to occurred between recurrent networks and transformer-based LLMs with attention mechanisms. Before transformers, there simply weren't conversational AI systems that could understand natural language queries well enough to provide coherent, relevant answers. See the Quanta article ("When ChatGPT Broke an Entire Field: An Oral History") that I linked here.

    But the more crucial point concerns what happens during the training process. During pre-training (learning to predict next tokens on vast amounts of text), these models develop latent capabilities: internal representations of concepts, reasoning patterns, world knowledge, and linguistic structures. These capabilities emerge as byproducts of the prediction task itself. Again, as Sutskever and Hinton have argued, accurately predicting the next word in complex texts often requires developing some understanding of what the text is about. Post-training (in order to aim at more appropriate and context sensitive answers) doesn't create new capabilities from scratch. It mobilizes and refines abilities that already emerged during pre-training.

    So when you ask whether LLMs have "crossed into a new category" or merely "gotten better at the same old thing," the answer is: the architectural shift to transformers enabled the emergence of new kinds of capabilities during pre-training, and post-training then makes these capabilities reliably accessible and properly directed. This is categorically different from the kinds of improvements seen in earlier NLP (natural language processing) systems which, despite being genuinely innovative (such as word embeddings like Word2Vec and GloVe that captured semantic relationships in vector space) remained fundamentally limited in their ability to capture long-range semantic dependencies within a text and, even with the help of massive amounts of training data, scale to the level where more sophisticated capabilities could emerge.
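
    To make the architectural contrast a bit more concrete, here is a minimal sketch of single-head scaled dot-product self-attention, the core operation the transformer architecture introduced. It is an illustrative toy under simplifying assumptions (random weights, a single head, no masking, residuals, or layer norm), not how any particular model is implemented; the point is just that every token position attends directly to every other position, which is what lets these models capture the long-range dependencies that fixed-window and recurrent approaches handled poorly.

    ```python
    # Illustrative sketch only: single-head scaled dot-product self-attention.
    # Real LLMs add multiple heads, causal masking, residual connections,
    # layer norm, and feed-forward blocks on top of this core operation.
    import math
    import torch

    def self_attention(x, w_q, w_k, w_v):
        """x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_head)."""
        q, k, v = x @ w_q, x @ w_k, x @ w_v
        # Each position scores its relevance to every other position,
        # no matter how far apart they are in the sequence.
        scores = (q @ k.T) / math.sqrt(k.shape[-1])
        weights = torch.softmax(scores, dim=-1)   # (seq_len, seq_len)
        return weights @ v                        # (seq_len, d_head)

    torch.manual_seed(0)
    d_model, d_head, seq_len = 16, 8, 5
    x = torch.randn(seq_len, d_model)             # stand-in for token embeddings
    w_q, w_k, w_v = (torch.randn(d_model, d_head) for _ in range(3))
    print(self_attention(x, w_q, w_k, w_v).shape)  # torch.Size([5, 8])
    ```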
  • Harry Hindu
    5.8k
    There are plenty of reasons not to engage a bot even if the bot fully passed the Turing test.Hanover
    Sure, I wouldn't want to engage AI on how to show someone I love them, or who to vote for in the next election, but I don't see any reason why it wouldn't provide the same type of engagement as a human in discussions about metaphysics and science, and that is the point - isn't it? It seems to me that any meaningful discourse is one that informs another of (about) something else, whether it be the state of Paris when you vacationed there last week or the state of your mind at this moment reading my post and conceiving a response - which is what your scribbles on the screen will be about when I look at them. You seemed to have admitted that you might not necessarily be talking about what Witt meant, which would mean that you are talking about what you think Witt said - meaning your use still has a referent - not what Witt actually meant - as that would be Witt's beetle - but your beetle. The scribbles refer to your thoughts. The question is, as I have said before, are your thoughts, in turn, about the world (that is the reason why there is still a debate on realism, right?)?

    Which major philosopher holds to the position that every word has a referent? Are we about to start arguing theology or something? The position that words can exist without referents is widely held across the board, not just some odd Wittgensteinian result.Hanover
    Why does any major philosopher need to hold some position for it to be true? I never said words can't exist without a referent - just that they lack meaning when not used as a referent. If you aren't referring to anything with your scribbles, then what are you talking about? What knowledge am I supposed to glean from your use of scribbles? What use would your scribbles be to me?

    Because it's a language game, not a metaphysical game.Hanover
    Sounds circular to me. The problem is thinking that all of language is a game and not just part of it - metaphysics, poetry, musical lyrics, etc.

    When you propose a metaphysical theory, are you actually implying that what you are saying is true - that your scribbles refer to an actual state of reality? Or are you implying that it is what you believe to be the case - in essence, that you are referring to your idea, not reality outside of your head? In other words, you could prefix your theory with, "I believe", or "I think", and it would mean the same thing. Now apply this to what you are saying about language being a game. Are you saying that language is a game despite what anyone else might believe, or are you talking about your own interpretations of what Witt said about language (as if you had access to his beetle)?

    The idea of meaning-is-use just complicates and contradicts itself. It seems to me a much simpler idea that meaning is what some scribble refers to and to acknowledge that our "use" of language sometimes confuses the map with the territory - which would be a misuse of language (a category mistake).
  • Hanover
    14.6k
    Why does any major philosopher need to hold some position for it to be true? I never said words can't exist without a referent - just that they lack meaning when not used as a referent. If you aren't referring to anything with your scribbles, then what are you talking about? What knowledge am I supposed to glean from your use of scribbles? What use would your scribbles be to me?Harry Hindu

    My point was that your position is not tenable, evidenced by the fact that it is not held by anyone who has critically looked at the matter. It's just a naive sort of view that all words have a referent to have meaning. If there is someone who holds it (maybe Aquinas, but not really), then let's elevate the conversation by borrowing their arguments and starting from there as opposed to your just insisting it must. Consider this sentence: "I am in the house." What does "house" refer to? My house? Your house? A Platonic house form? The image of the house in my head? Suppose I have no such image (and I don't)? So the referent is my understanding of the sentence? It refers to electrical activity in my brain? How do I know that my electrical activity is the same as your electrical activity when we say the word "house"? Do we compare electrical wave activity? Suppose the wave activity is different, but we use the term the same, do we ignore the electrical wave activity and admit it's use that determines meaning?

    Take a look at my first sentence as well, "My point was that your position is not tenable, evidenced by the fact that it is not held by anyone who has critically looked at the matter," break this down word by word into referents for me.

    What of words of different meaning yet the same referent, as in "the morning star" and the "evening star," having different meanings but referring to the same planet?
  • Harry Hindu
    5.8k
    If - like Harry Hindu - you don’t get the difference between the Cartesian representational notion of mind and the Peircean enactive and semiotic one, then the conversation has stalled already.apokrisis
    The conversation has stalled because you aren't curious enough to get at what I mean when I say things like, "effects carry information about their causes", and "effects inform us of their causes". Abandon the labels so that you might actually see past these two positions (and an either-or mentality) to other possible explanations.

    Smoke is a sign of fire, just as the tree rings in a tree are a sign of the tree's age - not because of some mental projection, but because of deterministic causes preceding the existence of the smoke or the tree rings.

    The visual, mental representation of the smoke in some mind is just as much a deterministic process as the smoke and the fire. Any distinction between the two would be unwarranted. If you don't like the term, "representation" and prefer "sign", then fine. The point is the sign and the object are not the same thing but are causally connected. Bringing in terms like "physical" and "mental" just muddies the water with more dualism. It's a process and the world "outside" of our minds makes no distinction between physical stuff and mental stuff when it comes to causal processes.
  • Harry Hindu
    5.8k
    My point was that your position is not tenable, evidenced by the fact that it is not held by anyone who has critically looked at the matter. It's just a naive sort of view that all words have a referent to have meaning. If there is someone who holds it (maybe Aquinas, but not really), then let's elevate the conversation by borrowing their arguments and starting from there as opposed to your just insisting it must.Hanover
    Give me a break. That is not what I'm doing. I'm sorry, but I thought you were critically looking at what I am saying. That is the point of me posting - exposing my idea to criticism, and doing a decent job of defending it reasonably. I don't see how bringing another philosopher in is going to make a difference. It is either logically valid or it isn't.

    Consider this sentence: "I am in the house." What does "house" refer to? My house? Your house? A Platonic house form? The image of the house in my head? Suppose I have no such image (and I don't)? So the referent is my understanding of the sentence? It refers to electrical activity in my brain? How do I know that my electrical activity is the same as your electrical activity when we say the word "house"? Do we compare electrical wave activity? Suppose the wave activity is different, but we use the term the same, do we ignore the electrical wave activity and admit it's use that determines meaning?Hanover
    Isn't that what I've been asking you - why does someone say or write anything? Why would someone use scribbles? I've asked you several questions about the position you are defending and you are not even attempting to answer them, yet you accuse me of insisting on my position being the case? I was really hoping for a better outcome here.

    I'll ask one more time - what is anyone that "uses" language trying to use it for - to accomplish what? That is what the scribbles refer to - the idea of an individual and their intent to communicate it. Don't give me lame examples as if words can only be spoken without any external, non-linguistic contexts. We don't just use scribbles and utterances to convey meaning, but other things that represent the idea being conveyed.

    Take a look at my first sentence as well, "My point was that your position is not tenable, evidenced by the fact that it is not held by anyone who has critically looked at the matter," break this down word by word into referents for me.Hanover
    If the string of scribbles does not refer to some actual state of affairs where my position is not tenable because it isn't shared by another that has critically looked at the position, then essentially what you said isn't true, and the state of affairs exists only as an idea in your head and not as actual fact outside of your head.

    What of words of different meaning yet the same referent, as in "the morning star" and the "evening star," having different meanings but referring to the same planet?Hanover
    Maybe you're not getting the meaning of "morning" and "evening" here. What do you think those terms are referring to and then what is "star" referring to? "Star" refers to the way Venus appears to the human eye, and "morning" and "evening" refer to the time of day it appears in the sky. That was easy. Got any more?
  • Pierre-Normand
    2.8k
    There was a question about grokking that had bothered me for a long time.

    ("In machine learning (ML), grokking, or delayed generalization, is a phenomenon observed in some settings where a model abruptly transitions from overfitting (performing well only on training data) to generalizing (performing well on both training and test data), after many training iterations with little or no improvement on the held-out data.")

    I finally got round to asking an LLM about it. In the spoilers below are the responses from Claude 4.5 Sonnet and GPT-5, along with a response by GPT-5 to a follow-up question about "in-context learning".

    USER:

    Hi GPT-5,

    I'd like to explore an issue with you regarding grokking and LLMs.

    It seems clear that the phenomenon is exhibited in LLMs since most queries that you are able to competently respond to aren't present in your training data (coupled with the response you are expected to provide). It is often claimed by AI-skeptics, though, that you can't intelligently respond to queries about topics that aren't in some respect similar (i.e. aren't matching a pattern) to something in your training data. That would be, according to those skeptics, a genuine mark of intelligence, understanding and/or creativity. But the requirement now is very high since the criterion can't be satisfied unless the problem tacitly solved by you in order to provide an intelligent response isn't similar in any respect to something figuring in your training data.

    But clearly, a mark of intelligence is to be able to identify (or surface) within one's own body of knowledge what it is that is relevant to the task at hand. Finding, for instance, something relevantly similar to the considerations raised so far in the current prompt required a fair bit of understanding of it.

    Yet not only skeptics raise the bar but also researchers, sometimes, when they identify instances of grokking as emergent abilities to successfully accomplish "out of sample" tasks when the alleged "in sample" (within the training data) exemplars are merely similar to the test tasks inasmuch as they exhibit the same pattern, as it were. For instance, your replying cogently to the present prompt would be considered "in sample" if there are relevantly similar issues discussed in your training data.

    This leads me to wonder what it is that grokking consists in, in cases where the out of sample generalization ability (akin to learning to apply a concept in a novel way such that it has never been applied in a similar way in the training data) emerges. It seems that what should be said isn't that the pre-grokking model only was pattern-matching when it was previously answering the "in sample" queries, but rather that it was just not yet smart enough, as it were, to grasp the novel way to apply the relevant concept (i.e., the concept, or range of conceptual abilities, relevant to solving the test problem). This is, after all, the same sort of way we appraise stages of intellectual development in a human being that comes to master a new topic (such as, say, classical electrodynamics). Did you catch my drift?

    Reveal
    Claude 4.5 Sonnet:

    I absolutely catch your drift, and you've identified a genuinely thorny philosophical issue that goes to the heart of debates about machine understanding.

    The tension you're highlighting is real: if we define "in sample" broadly enough to include anything that shares *any* pattern with training data, the criterion for "genuine" intelligence becomes impossibly stringent—even unfairly so. After all, human intelligence also works by recognizing patterns, drawing analogies, and applying concepts in new contexts. When a physics student finally grasps how to apply Gauss's law to a novel configuration they've never seen before, we don't dismiss this as "mere pattern matching" even though they're clearly drawing on similar problems they've studied.

    Your reframing strikes me as importantly right: the pre-grokking model isn't necessarily doing something *categorically different* (like "mere pattern matching" vs "genuine understanding"). Rather, it lacks sufficient conceptual sophistication—the representations aren't yet organized in a way that makes the relevant abstractions available for novel application. Post-grokking, the model has developed more robust, generalizable representations of the underlying structure.

    This mirrors human learning remarkably well. A student memorizing physics formulas and fumbling through textbook problems isn't doing something fundamentally different in kind from an expert fluidly applying concepts to novel situations—they're at different points along a continuum of conceptual mastery.

    What makes grokking particularly interesting is that it suggests the model is discovering *structural* understanding—not just memorizing input-output mappings, but internalizing the generative rules or principles. When a model suddenly achieves near-perfect generalization on modular arithmetic after extended training, something has changed about how it represents the problem space.

    But here's where it gets philosophically interesting: is there a principled distinction between "has internalized the abstract structure" and "has learned sufficiently sophisticated pattern matching"? Your question suggests—and I'm inclined to agree—that this distinction might be less clear than it initially appears. Perhaps what we call "understanding" just *is* having sufficiently rich, well-organized, and flexibly applicable patterns.

    The skeptics' objection might ultimately dissolve into an unfalsifiable position: any successful performance can be redescribed as "pattern matching" if we're sufficiently determined. But if that's the case, then either (a) the objection proves too much (it would equally apply to human intelligence), or (b) we need a more substantive criterion for what counts as genuine understanding beyond mere novelty.

    What do you think the right criterion should be? Is it about the *degree* of abstraction? The *flexibility* of application? Or something else entirely?

    GPT-5:

    I’m with you, yes—I think you’ve put your finger on a confusion that muddles a lot of “it’s just pattern-matching” critiques. Three quick moves:

    1. “Pattern” isn’t a single thing
    There’s a world of difference between (a) shallow surface co-occurrence patterns, (b) higher-order invariances (e.g., linearity, symmetries, type constraints), and (c) algorithmic structure (procedures that compute solutions under many surface realisations). When skeptics say “if it’s similar in any respect, it doesn’t count,” they’ve smuggled in an impossible criterion: all intelligent generalization relies on some relevant similarity. The mark of understanding isn’t the absence of similarity; it’s selecting the right invariants and composing them to solve a novel case.

    2. What grokking is (mechanistically)
    In tasks where we can inspect training dynamics (modular arithmetic, formal languages, simple algorithmic tasks), networks often first latch onto a high-capacity shortcut that interpolates within the training set (memorization-ish), then—sometimes much later—under pressure from regularization and the inductive bias toward simpler rules, they reorganize their internal geometry so that the decision boundary reflects the task’s true invariants. That late reorganization is the “grokking” phase transition: a shift from many-fragile-rules to one compact algorithm that extrapolates. It’s not “from pattern to understanding,” but “from the wrong patterns to the right invariants.” The same loss curve can hide qualitatively different internal solutions.

    3. How this maps to LLMs
    LLMs are trained on wildly heterogeneous text. Useful solutions are the ones that compress across surface variation—i.e., they must privilege invariants: compositional syntax/semantics, analogical structure, causal/temporal schemata, mathematical identities, discourse moves, etc. When an LLM "groks" some domain, you see:

    * **Robust recombination:** it applies a tool where it wasn’t showcased verbatim (e.g., reasoning about a novel physical setup by mapping it to a previously learned conservation-law schema).
    * **Analogy under type constraints:** it ports an argument pattern to a new vocabulary because the roles line up (premise–rule–exception structures, proof by contradiction, etc.).
    * **Error profile shift:** failures become principled (violating a known invariant) rather than noisy one-off confabulations—evidence that a compact rule is actually being used.

    Why “no-similarity” is the wrong bar
    If you require that successful responses occur with **zero** relevant similarity to anything in training, you’ve defined a test that no finite learner—human or machine—could pass. Human creativity is saturated with analogy, isomorphism, and transfer. What distinguishes shallow from deep generalization is **which** similarity class carries the work: string n-grams vs algebraic structure; topic words vs discourse logic; step-by-step recipe vs compiled procedure.

    A cleaner way to frame the question
    Instead of asking “was anything like this in pretraining?”, ask:

    * **Which invariants must be carried over for success?** (e.g., associativity, conservation, scope discipline, quantifier behavior)
    * **Could a shallow proxy have solved the test split?** (Use adversarial splits that break spurious cues: SCAN/COGS-style compositional splits, lexical-semantic splits that hold out novel role-filler pairings, math/base transfer, unit mismatches, counterfactual rewrites.)
    * **Do error modes reflect rule use?** (Systematic, explainable failures suggest a compact internal rule; scattershot errors suggest brittle heuristics.)

    A toy illustration (two phases)

    * **Pre-grokking:** a model answers many base-10 arithmetic word problems by patterning on lexical cues (“altogether,” “shared equally,” “difference”), succeeding in-distribution but failing on paraphrases, unit switches, or base changes.
    * **Post-grokking:** the same model, after more training/regularization or better prompting, represents the arithmetic as operations on typed quantities. It now handles paraphrases, mixes of units with explicit conversions, and even base-7 toy problems once given the base rules. Nothing in pretraining showed *those exact sentences*, but the invariants (typing, operation composition, conservation of quantity) were learned—and that’s precisely the point.

    Where I land on your thesis
    Yes: calling the earlier answers “mere pattern-matching” is misleading. Before grokking the model is **already** leveraging patterns—just not the right ones. After grokking, the model has restructured its internal representation so that the **task-relevant** invariants do the work, enabling out-of-sample generalization that looks like “novel application of a concept.” That is exactly how we appraise human intellectual development: not by absence of similarity, but by the ability to surface the right similarity class and apply it flexibly.

    If we want to be stringent with LLMs, we should design splits that (i) disrupt superficial cues, (ii) preserve deep invariants, and then (iii) check whether performance transfers with principled errors. Passing that bar is evidence of the same kind of conceptual grasp you’re pointing to—the thing people are trying to name with “grokking.”

    USER:

    That's very informative, thanks! I was struck by your second disjunct in: "Post-grokking: the same model, after more training/regularization or better prompting"

    Is there a phenomenon analogous to grokking that manifests itself in response to better prompting rather than training? Would that be due to a structural similarity between training and in-context learning?

    GPT-5:

    Great question—and yep, there’s a real analogue. Think of two routes to the same hilltop:

    * **Training-time grokking**: weights change until the model’s internal circuits reorient around the task’s true invariants.
    * **Prompt-time (in-context) grokking**: weights stay fixed, but the **state** you induce in the network—with examples, instructions, and scratch space—pushes it into the “right” computational regime so those already-learned circuits get *selected, composed, and stabilized*.

    Why this can work

    * **Meta-learned updater**: During pretraining, LLMs don’t just learn facts; they also learn **procedures for updating hypotheses from context** (few-shot pattern completion, induction heads, schema abstraction). A good prompt supplies the “mini-dataset” that this inner updater needs, so it can do at inference-time what SGD did during training—*not by changing weights, but by routing/activating the right features*.
    * **Regularization via format**: Structured prompts (variables, units, step labels, checks) act like a test-time regularizer. They suppress spurious cues and force the model to respect invariants (types, conservation, scope), which often triggers a sharp jump from brittle heuristics to algorithmic behavior—i.e., a **phase-like shift** without any new training.
    * **State selection over landscape change**: Training reshapes the energy landscape (the weights). Prompting chooses **where** you land in that landscape (the computation). If the landscape contains a “compact algorithm” basin, a good prompt drops the model into it.

    What “prompt grokking” looks like

    * **Few-shot thresholds**: Nothing… nothing… then after 3–5 carefully varied examples, performance jumps and generalizes to held-out paraphrases or bases.
    * **Error-profile flip**: From scattershot mistakes to principled failures (e.g., violating a clearly stated invariant)—a tell that an internal rule was engaged.
    * **Transfer under format**: The model solves a novel instance *because* the prompt enforced a representation (tables, symbolic expressions) that aligns with the latent algorithm.

    Practical recipe (how to induce it)

    1. **Give schema, not vibes**: Declare variables, units, and goals; outline steps (“Parse → Formalize → Compute → Check invariants”).
    2. **Anti-spurious few-shots**: Use diverse examples that *break* superficial cues but preserve the same deep rule. Include one counterexample.
    3. **Force intermediate state**: Ask for a table/equations before the final answer (scratchpad as scaffolding).
    4. **Invariant checks**: “Verify dimensional consistency / conservation / scope.”
    5. **Self-test**: Add a tiny held-out case for the model to solve using the extracted rule.

    When you still need training

    * If the latent “algorithm basin” doesn’t exist (the model never learned the invariants), no prompt will conjure it. Prompting can *select* or *stabilize* competence; it can’t mint circuits ex nihilo. That’s when more data/finetuning matters.

    So, to your disjunct: the second half is real. There **is** a phenomenon—call it *contextual grokking*—where better prompting produces a phase-like jump in generalization. It’s not magical: it leverages a structural kinship between SGD-learned meta-skills and the model’s in-context updating. Training sculpts the tools; prompting lays out the workbench so the right tool actually gets used.
  • hypericin
    1.9k
    The goal is neither to reach agreement nor to win, but rather to foster understanding. That doesn't mean either that the debaters should just agree to disagree. They just need to agree to pursue the discussion despite endorsing incompatible goals and premises.Pierre-Normand

    Yup, I didn't mean to imply I didn't enjoy it; despite the frustration it was nonetheless one of my favorites here. The frustration stems less from disagreement, and more from a nagging sense that neither side fully understood the other. It is as if each position is a reflection of a basic intuition which, despite a very exhaustive effort, is not quite communicable verbally. And so both sides were a bit baffled, and were talking past one another. For my part I felt this, and I was ultimately part of the problem. I could never get over an apparent hand-waviness of the direct realist position. From which it is all too easy to conclude that the position is simply vacuous.
  • apokrisis
    7.7k
    There is obviously more (morally) to human life than being maximally healthy and reproductively successful.Pierre-Normand

    How I would put it is that if there are four nested levels of semiosis, then each can be generally the same thing - a modelling relation in which a semiotic Umwelt arises - but each is then also its own Umwelt, its own enactive view of the world.

    So the issue that seems central to you has this extra complexity.

    Thus sure, a person can function as a person completely unconcerned by the “reality” that they are just a dissipative structure, as that is reality as it would be constructed at the level of a rationalising scientist and not the socially constructed emotional world of the average person.

    Likewise a person could be completely unconcerned about reproductive success as for some reason they feel no biological urge or loss. Or they may have a rational argument that supersedes the norms that are socially constructed - a parent’s feelings that it is only natural to repeat what the parent already did, for example.

    So every level could come with its own Umwelt. And evolution might wire in certain imperatives and habits at a genetic and neurobiological level; these may show through at the socio-cultural level, and then they might get a rationalising account at the abstracting intellectual level. And then the different Umwelts might align closely or instead appear to contradict each other. All depending on what works for each level as a way of life, a means of perpetuating a selfhood at each of those levels.

    So of course there might be “moral” imperatives that arise at the sociocultural level that aren’t conscious at the neurobiological level. The point of the sociocultural level is to be able to add new kinds of constraints to the neurobiology so that a new level of socialised and even civilised selfhood can be a reality in the world that it constructs.

    And the same goes for adding a scientific and rationalising level of selfhood on top of the old sociocultural norms. Abstract reasoning about morality might dissolve much of what you would seem to need to believe to support those moral attitudes that were used to shape you as a representative member of some culture or some community.
  • apokrisis
    7.7k
    But the more crucial point concerns what happens during the training process. During pre-training (learning to predict next tokens on vast amounts of text), these models develop latent capabilities: internal representations of concepts, reasoning patterns, world knowledge, and linguistic structures. These capabilities emerge as byproducts of the prediction task itself. Again, as Sutskever and Hinton have argued, accurately predicting the next word in complex texts often requires developing some understanding of what the text is about. Post-training (in order to aim at more appropriate and context sensitive answers) doesn't create new capabilities from scratch. It mobilizes and refines abilities that already emerged during pre-training.Pierre-Normand

    This seems to me a far more pointed argument to be having. It appeals to the power of emergence. But emergence is also the slipperiest of arguments to substantiate.

    So I would tend to dismiss anything “real” about the claimed emergence of some level of understanding. I see no proto-consciousness as I see no real embodiment in the world that the LLM is supposedly discussing with us.

    And yet latent in the training data is in some sense the ghosts of all the clever and useful ideas that we humans could have collectively voiced. So LLMs can seem the voice of reason. And then we have to ask ourselves the degree to which our own reasoning voice is embodied in the world or instead also an emergent and abstracted deal. A conversation about the world rather than a living in that world which spins away in its own more abstracted realm.

    So there is understanding. And it has its four levels. The level that seems to matter the most is the one of our “personal minds”. But that is already some mix of neurobiology and social construction. Life lived as an animal responding to its environment, and life lived thinking as a member of a reality that is now evolving at a cultural level.

    LLMs could then be simply artefacts arising in the reality beyond that - the abstracted and rationalising one of the “scientist”. The self now trained to submerge the subjective in order to be objective. The self that lives in the realm of the theorisable and measurable.

    So do LLMs represent some kind of real understanding and insight at that fourth level of selfhood we humans struggle to attain? Are they a looming improvement that will emerge to completely begin to own that level of semiosis?

    That seems to be a valid question. But it also sharpens the stakes. Wouldn’t we expect these LLMs to start conducting their own research projects? Wouldn’t they be theorising and testing in a way that would meet the usual natural purpose of constructing a self in its world? What point would there ever be to this rather passive thing of “understanding” unless it were already part of a cycle of (en)action?