I don't disagree with anything you wrote. However, contemporary versions of direct realism, intentionality theories, and phenomenological theories all explain the same phenomena. Each of these has its own problems, but it doesn't seem readily apparent that some have significantly worse problems than others. The result is that I would tend to say that "indirect realism can be made consistent with the empirical sciences," rather than "the empirical sciences confirm indirect realism," which would seem to imply that we can eliminate competing theories based on the empirical sciences.
Obviously, not all formulations of indirect realism are consistent with the empirical sciences. Older versions tend to work on the assumption of substance dualism, fall into Ryle's regress, and involve Cartesian homunculi or Cartesian theaters. The formulations of indirect realism that are consistent with the sciences are just those that have been tweaked and edited until they conform to the empirical sciences, which is the same thing that can be said for direct realism and other theories.
Unfortunately, a great deal of the literature on the objects of perception spends its time attacking the strawman of "naive realism." Yet showing that this naive realism cannot be the case doesn't really show that one's favored theory is more plausible than any other contemporary competing theory. Plus, it's unclear if such "naive realism" was ever embraced. The pre-Socratics already have formulations akin to indirect realism, direct realism, and phenomenalism in key respects, so the "naive" position seems to be more a phantasm than something that must be torn down.
It still seems to me that significant critiques can be leveled at indirect realism as well. There are the adverbial critiques. There are the accusations of falling into a form of (crypto-)substance dualism, with "mental representations" acting as ontological entities and "agents" only being able to view mental representations.
Here are the big problems:
Concepts - The notion of a "concept" is notoriously muddy. It is not obvious that my conceptual understanding of a concept like "Hegelian dialectic" or "Marxism" is the same sort of thing as the way in which my visual cortex organizes sensory input into the experience of "seeing a flower." The first exists (only?) in recursive self-awareness and can be articulated to other people via words. The second seems impossible to even get into recursive self-awareness, let alone communicate. Neuroscience cannot proceed by my describing how it is I use these unconscious processes to turn visual input into the image of a flower, nor can I communicate how I achieve it. I am unaware of these "concepts." Further, the second sort of concept seems "necessary" for the cognitive acts that give rise to the first. I cannot come up with an articulation of what flowers are if my sensory system cannot distinguish them. Lower animals certainly have the second type of concept, but it seems doubtful they have the first.
Indeed, I am only really aware that I am using the second type of concept when I begin to suffer from agnosia, have a stroke, etc. And even then, the experience that people who suffer from these ailments describe is one of absence; they are not able to diagnose themselves. Whereas if I forget what "Hegelian dialectic" is, I am aware of this inability to recall, or of the fuzziness of the concept. Nor does it seem like I have a "concept" of every particular shade of green, yellow, and brown I see when I look at my lawn in the same way that I have a concept of "the United States." So, to the extent that some forms of indirect realism make their claims about anthropology and perception by conflating these two notions of the word "concept," they seem to be open to attack. And note that the brain areas that appear to be involved in the two notions of the term "concept" appear to be quite different as well.
Phenomenological Inseparability - This leads into another problem, that of the defining feature of indirect realism, the claim that "we experience mental representations." The problem here is well summarized in the Routledge Contemporary Introduction to Phenomenology, which comes up with a comical list of excerpts of philosophers and scientists trying to describe phenomenal awareness without reference to the things being experienced. These invariably degenerate into just describing the things being experienced, "the taste of coffee," or "the red of a balloon floating in my room," or else become unintelligible nonsense like "I am perceiving hotly," and "I am smelling bitterly."
The point intentionalists (and some direct realists) make here is that there seems to be absolutely no daylight between the perception and the objects perceived. We seem perfectly able to communicate our experiences to one another in some ways, but it becomes impossible to do so if we focus on the perception side of "perceiving representations" by itself. It leads to incoherence. And, so they argue, this shows that there is no distinct ontological entity that might be called a "mental representation" that is experienced by a "perceiver who perceives them." Nor are there really good empirical reasons to divorce the two. Where does neuroscience say representation occurs versus the perception of representation? It doesn't say anything about this. It has yet to articulate how this works, but tends to conclude there is no Cartesian theater and that perception and representation are at least not distinct at the level of neuroanatomy (fine-grained analysis is indeterminate on this issue).
(This seems like a good argument in favor of intentionalists)
Supervenience Relations - Finally, we can consider the direct realists' objection, which I think might be the best one. This relies on the notion of supervenience. Supervenience cannot just be defined as "no difference in A (mental phenomena) without a difference in B (physical phenomena)." This turns out to be a wholly inadequate way to frame supervenience.
Such a definition allows, in global supervenience, that a world where Mars has one more molecule of dust can have completely different mental properties from the world without the extra molecule of dust. There is a physical difference between the worlds, so there can be as much mental difference as we like. The same is true for local supervenience. If Sally 2 has one more magnesium atom in her body than Sally 1, she can now have totally different mental properties (we can place the atom in the brain and the same problem remains).
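To make the gap explicit, here is a rough formalization of that naive definition. The notation is my own assumption, not anything standard from the thread: x and y range over individuals (or whole worlds, for the global version), A(x) is the full set of mental properties of x, and B(x) is its full set of physical properties.

```latex
% Naive supervenience of A (mental) on B (physical), in assumed notation:
% physical indiscernibility implies mental indiscernibility.
\[
\forall x\,\forall y\;\bigl(B(x) = B(y) \;\rightarrow\; A(x) = A(y)\bigr)
\]
% Equivalent contrapositive, "no difference in A without a difference in B":
\[
\forall x\,\forall y\;\bigl(A(x) \neq A(y) \;\rightarrow\; B(x) \neq B(y)\bigr)
\]
% When B(x) \neq B(y), the antecedent of the first formula is false,
% so the conditional holds trivially and A(x), A(y) are left unconstrained.
```

The dust molecule and the magnesium atom both fall under that last case: any physical difference at all, however irrelevant, satisfies the definition no matter how much the mental properties differ.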
People have tried to fix this with the idea of P-regions and B-minimal properties. P-regions are just those regions of spacetime that are absolutely essential to the mental phenomena being considered. B-minimal properties are just those physical properties needed to ensure the mental phenomena in question.
It might be thought that these concepts wouldn't cause problems for indirect realism. After all, for any freeze-frame microsecond of perception, we can assume that the relevant P-region is entirely in the brain. Does this not support the assertion that perception must just be "of" things in the brain, representations?
The problem comes when you want to analyze any perception that actually takes a meaningful amount of time. All of a sudden, things outside the body become part of the P-region. If we would not have seen the apple but for the apple being on the table, then the apple, or at least part of it or something with similar B-minimal properties, is required to explain the mental state.
So now the direct realist (along with all the externalists) will say: "hey, the supervenience relationship for perception has to involve the object, it is a necessary physical constituent of perception." Which, while not proving their point, still seems to make it more plausible. If the B-minimal properties of the object perceived cannot be changed one iota without changing the mental experience, then it seems like there is a very "direct" connection between the object and the perception. There is, in this case, no change in the mental representation without a change in the B-minimal properties of the object, and it seems that the "directness" of this relationship is exactly the sort of thing the direct realist is talking about.
Recall, Aristotle (and Aquinas) don't have us perceiving the entire form of an object. Nor do they have us perceiving the form "as it is in itself." This would require our heads turning into apples or something when we see an apple. Rather, a part of the substantial form is directly communicated to sensation. And here, the B-minimal properties of the object that precisely specify the part of subjective experience corresponding to that object seem like a very good candidate for the parts of the object's "form/intelligibility" that are directly communicated. This relation is direct in that there can be no change in A without a change in B, and because B is B-minimal, no change in B without a change in A. This is a one-to-one relationship between part of sensation and an external object, which is what Aristotle wants to communicate even though he is certainly no naive realist. A lot of Catholic philosophers work with this sort of realism and have enhanced it with semiotic explanations, but unfortunately they reside in a bit of a bubble.
(Note, I would think this was a KILLER argument for direct realism BUT for the fact that P-regions and B-minimal properties actually seem to destroy supervenience by making what is considered a relevant physical element dependent on the qualities of mental experience - but that's a whole different thread lol)
Of course, there are similar problems with the other theories. I just wanted to illustrate that these theories all have significant problems AND can be made consistent enough with empirical evidence that none of them is particularly "confirmed" above the others.