I see what you mean here. If at any given time the only variable that really is 'varying' in the system is the hidden state, then we can appropriately talk about a direct causal relationship. Like triggering a pinball, the various flippers and pegs are going to be determinative of its path, but they're fixed, so right now its path is directly caused by the strength of the trigger? — Isaac
@Banno (because "seeing as" and "seeing an aspect")
Yeah! That's a good analogy. Translating it back to make sure we're concordant: the priors = flippers, the task parameters = pegs, and the strength of the trigger = hidden states.
So, if we want to answer the question "what are people modelling?", I think the only answer can be 'hidden states'; if it were anything less than that, the whole inference model wouldn't make sense. No-one 'models' an apple - it's already an apple.
I think what I claimed is a bit stronger: it isn't just that the hidden state variables act as a sufficient cause for perceptual features to form (given task parameters and priors), I was also claiming that the value of the hidden states acts as a sufficient cause for the content of those formed perceptual features. So if I touch something at 100 degrees Celsius (hidden state value), it will feel hot (content of perceptual feature).
I think a thesis like that is required for perception to be representational in some regard. Firstly, the process of perceptual feature formation has to represent hidden states in some way; and secondly, for the perceptual features it forms to be fit-for-purpose representations of the hidden states, whatever means of representation there is has to link the hidden state values with the perceptual feature content. If, generically/ceteris paribus, there failed to be a relationship of that character between the hidden states and the perceptual features, perception wouldn't be a pragmatic modelling process.
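To make that 'sufficient cause for content' claim concrete, here's a minimal sketch of the sort of relationship I mean - invented numbers, not anyone's actual perceptual model: a hidden temperature causes a noisy observation, and the content of the resulting percept ("hot"/"not hot") is read off a posterior computed against fixed priors and task parameters. The only point is that, with priors and task parameters held fixed, the percept's content tracks the hidden state value.

```python
import numpy as np

# Minimal illustrative sketch with made-up numbers, not a real perceptual model.
# Fixed "task parameters": sensory noise and the boundary the percept cares about.
SENSOR_NOISE_SD = 5.0      # degrees C of observation noise
HOT_THRESHOLD = 45.0       # the rough boundary the percept "hot" tracks

# Fixed "prior": what temperatures we expect to encounter (Gaussian here).
PRIOR_MEAN, PRIOR_SD = 25.0, 30.0

def perceive(hidden_temperature, rng=np.random.default_rng(0)):
    """Map a hidden state value to the content of the formed percept."""
    # The hidden state causes a noisy observation...
    observation = hidden_temperature + rng.normal(0.0, SENSOR_NOISE_SD)

    # ...which, combined with the fixed prior, yields a posterior estimate of the
    # hidden temperature (standard Gaussian conjugate update).
    prior_prec, like_prec = 1 / PRIOR_SD**2, 1 / SENSOR_NOISE_SD**2
    posterior_mean = (PRIOR_MEAN * prior_prec + observation * like_prec) / (prior_prec + like_prec)

    # The percept's *content* is read off that posterior, so with priors and task
    # parameters held fixed, it is driven by the hidden state's value.
    return "hot" if posterior_mean > HOT_THRESHOLD else "not hot"

print(perceive(100.0))  # -> "hot": a 100 C hidden state yields a "hot" percept
print(perceive(20.0))   # -> "not hot"
```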
I'd agree here. Do you recall our conversation about how the two pathways of perception interact - the 'what' and the 'how' of active inference? I think there's a necessary link between the two, but not at an individual neurological level, rather at a cultural sociological level. All object recognition is culturally mediated to an extent, but that cultural categorising is limited - it has functional constraints. So whilst I don't see anything ontological in hidden states which draws a line between the rabbit and the bit of sky next to it, an object recognition model which treated that particular combination of states as a single object simply wouldn't work; it would be impossible to keep track of it. In that sense, I agree that properties of the hidden states have (given our biological and cultural practices) constrained the choices of public model formation. — Isaac
Just to recap, I understand that paragraph was written in the context of delineating the role language plays in perceptual feature formation. I'll try and rephrase what you wrote in that context, see if I'm keeping up.
Let's take showing someone a picture of a duck. Even if they hadn't seen anything like a duck before, they would be able to demarcate the duck from whatever background it was on and would see roughly the same features; they'd see the wing bits, the bill, the long neck etc. That can be thought of as splitting up patterns of (visual?) stimuli into chunks regardless of whether the chunks are named, interpreted, felt about etc. The evidence for that comes in two parts: firstly, that the parts of the brain known to do abstract language stuff activate later than the object recognition parts that chunk the sensory stimuli up in the first place; and secondly, that it would be an inefficient strategy to require the brain to have a unique "duck" category in order to recognise the duck as a distinct feature of the picture. I.e., it is implausible that seeing a duck as a duck is required to see the object in the picture that others would see as the duck.
Basically, because the dorsal pathway's activities in object manipulation etc. will eventually constrain the ventral pathway's choices in object recognition, but there isn't (as far as we know) a neurological mechanism for them to do so at the time (i.e. in a single perception event).
I think we have to be quite careful here: whatever process creates perceptual features has to produce the formed perceptual features that we actually have - like ducks, and faces. I know the face example, so I'll talk about that. When someone looks at something and sees a static image or a stable object, that's actually produced by constant eye movement and some inferential averaging over what comes into the eyes. When someone sees an image as a whole, they first need to explore it with their eyes. The eyes fixate on salient components of the image at what's called a fixation point, and move between them with long eye movements called saccades. When someone forms a fixation point on a particular part of the image, that part of the image is elicited in more detail and for longer - it has lots of fovea time allocated to it. Even during a fixation event, constant tiny eye movements called microsaccades are made for various purposes. When you put an eye tracker on someone and measure their fixations and saccades over a face, it looks something like this:
(Middle plot is a heat map of fixation time over an image; in the right plot the fixations are the large purple blobs and the purple lines are saccades.)
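As an aside on the mechanics: plots like that are usually derived from raw gaze samples by classifying them into fixations and saccades with a dispersion or velocity threshold. Here's a rough sketch of a dispersion-threshold style classifier - the thresholds and the expected data format are placeholder choices of mine, and real eye-tracking pipelines are more careful:

```python
import numpy as np

def classify_fixations(gaze_xy, sample_hz=500, max_dispersion=1.0, min_duration_s=0.1):
    """Crude dispersion-threshold fixation detector.

    gaze_xy: (N, 2) array of gaze positions in degrees of visual angle.
    A window of samples counts as a fixation if it lasts at least
    `min_duration_s` and its x-range + y-range stays under `max_dispersion`;
    the jumps between consecutive fixations are the saccades.
    """
    min_len = int(min_duration_s * sample_hz)
    fixations, start, n = [], 0, len(gaze_xy)
    while start <= n - min_len:
        window = gaze_xy[start:start + min_len]
        if (window.max(0) - window.min(0)).sum() <= max_dispersion:
            end = start + min_len
            # extend the fixation while the dispersion stays small
            while end < n and (gaze_xy[start:end + 1].max(0)
                               - gaze_xy[start:end + 1].min(0)).sum() <= max_dispersion:
                end += 1
            centroid = gaze_xy[start:end].mean(0)
            fixations.append((start / sample_hz, end / sample_hz, centroid))
            start = end
        else:
            start += 1
    return fixations
```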
But what we see is (roughly) a continuously unchanging image of a face. Different information sources (fixation points, and the jitter around them), of different quality, of different hidden states (light reflected from different facial locations, of different colours and shininess), are aggregated together into a (roughly) unitary, time-stable object. Approximate constancy emerging from radical variation.
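One very schematic way to picture that aggregation: each fixation delivers an estimate of some feature with its own reliability, and the stable percept behaves like a precision-weighted average over them. The numbers below are invented and the computation is just that arithmetic, not a claim about the actual neural implementation:

```python
import numpy as np

# Invented per-fixation estimates of one feature (say, perceived eye position in
# image coordinates), each with its own precision (inverse variance) reflecting
# how much fovea time / how good the viewing conditions were.
fixation_estimates = np.array([[102.0, 55.5],    # long foveal fixation: reliable
                               [ 99.0, 57.0],    # brief glance: noisier
                               [104.5, 53.0]])   # peripheral sample: noisiest
precisions = np.array([10.0, 2.0, 0.5])

# Precision-weighted average: the "stable" estimate barely moves as new,
# low-precision samples arrive, even though the raw samples vary a lot.
stable_estimate = (precisions[:, None] * fixation_estimates).sum(0) / precisions.sum()
print(stable_estimate)  # ~[101.6, 55.6]: approximate constancy from varying samples
```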
That indicates that the elicited data is averaged and modelled somehow, and what we see - the picture - emerges from that ludicrously complicated series of hidden state data (plus priors and task parameters). But what is the duration of a perceptual event of seeing such a face? If it were shorter than the time it takes to form even a brief fixation on the image, we wouldn't see the whole face. Similarly, people forage the face picture for what is expected to be informative new content, based on the fixations they've already made - e.g. if someone sees one eye, they look for another and maybe pass over the nose. So it seems the time period over which the model is updating, eliciting data and promoting new actions is sufficiently short that it does all this within fixations. But that makes the aggregate perceptual feature of the face no longer neatly correspond to a single "global state"/global update of the model - because, as above, the model is updating at least some of its parts during brief fixations, and the information content of those brief fixations is a component part of the aggregate perceptual feature of someone's face.
Notice that within the model update within a fixation, salience is already a generative factor for new eye movements. Someone fixates on an eye and looks toward where another prominent facial feature is expected to be. Salience strongly influences that sense of "prominence", and it's interwoven with the categorisation of the stimulus as a face - the eyes move toward where a "facial feature" would be.
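A toy rendering of that interplay: treat the current categorisation ("this is a face") as supplying a prior over where informative features should be, combine it with bottom-up salience, and pick the next fixation as the most promising unexplored location. The maps, the weighting and the function name are all invented for illustration - it's only meant to show categorisation and salience jointly promoting the next exploratory action:

```python
import numpy as np

def next_fixation(salience_map, feature_prior_map, visited_mask, prior_weight=0.5):
    """Pick the next fixation on a discretised image.

    salience_map:      bottom-up conspicuity of each location (edges, contrast...)
    feature_prior_map: where the current categorisation ("this is a face")
                       expects informative features (eyes, nose, mouth) to be
    visited_mask:      locations already foraged, which offer little new information
    """
    expected_gain = (1 - prior_weight) * salience_map + prior_weight * feature_prior_map
    expected_gain = np.where(visited_mask, 0.0, expected_gain)  # don't revisit
    return tuple(int(i) for i in np.unravel_index(np.argmax(expected_gain),
                                                  expected_gain.shape))

# Tiny invented example: having already fixated the left "eye" (top-left),
# the face prior pulls the next fixation toward where the other eye should be,
# rather than toward the merely salient location at the top-middle.
salience = np.array([[0.2, 0.9, 0.1],
                     [0.3, 0.4, 0.3],
                     [0.1, 0.3, 0.1]])
face_prior = np.array([[0.9, 0.0, 0.9],    # eyes expected top-left / top-right
                       [0.0, 0.5, 0.0],    # nose
                       [0.3, 0.5, 0.3]])   # mouth
visited = np.zeros((3, 3), dtype=bool)
visited[0, 0] = True                       # the left "eye" has been fixated

print(next_fixation(salience, face_prior, visited))  # -> (0, 2): toward the other eye
```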
What that establishes is that salience and the ongoing categorisation of sensory stimuli are highly influential in promoting actions during the environmental exploration that generates the stable features of our perception.
So it seems that the temporal ordering of dorsal and ventral signals doesn't block the influence of salience and categorisation on the promotion of exploratory actions; and even if those signals are ordered in that manner within a single update step, that ordering does not necessarily transfer to an ordering on those signal types within a single perceptual event - there can be feedback between them when there are multiple update steps, and feedforward from previous update steps which have indeed had such cultural influences.
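Put as a loop, purely schematically: one perceptual event contains many update steps, and each step's categorisation and salience estimates feed forward into the action that gathers the next step's data. So whatever shaped those estimates earlier - including culturally acquired categories - can show up downstream even if each single step has a fixed ordering of dorsal and ventral signals. The function names here are placeholders, not a specific model:

```python
# Purely schematic outer loop of a single perceptual event: each update step
# refines beliefs from the newly foraged sample, and those refined beliefs
# (categories, salience) choose the *next* exploratory action.
def perceptual_event(beliefs, sample_at, update_beliefs, choose_next_action, n_steps=6):
    location = choose_next_action(beliefs)           # first action driven by priors
    for _ in range(n_steps):
        sample = sample_at(location)                 # forage: fixate, gather data
        beliefs = update_beliefs(beliefs, sample)    # fast within-fixation update
        location = choose_next_action(beliefs)       # earlier categorisation and
                                                     # salience feed the next action
    return beliefs                                   # the stable percept spans all steps
```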
The extent to which language use influences the emerging perceptual landscape will be at least the extent to which language use modifies and shapes the salience and categorisation components that inform the promotion of exploratory behaviours. What goes into that promotion need not be accrued within the perceptual event or within a single model update. That dependence on priors and task parameters leaves a lot of room for language use (and other cultural effects) to play a strong role in shaping the emergence of perceptual features.