But the question is, what are A and B making judgments about?
Frege has a clear answer to that: the proposition, the thought, which is objective. I'll grant that this is basically a posit, but like any posit it serves a purpose. If A and B disagree about whether a proposition is true, they have to assign different truth-values to one and the same thing. That thing cannot be any particular inscription of the proposition; it has to be the proposition itself.
The thought expressed by a sentence is also what Frege says you get when you understand the sentence, and you get it without remainder. It is what is communicated, what is transferred from A to B, what A and B can have different opinions about. This is the idea behind scenario (1), in which there is a single, shared, publicly available box of propositions for A and B. It's what you said would be easy to deny. (Starting to feel a little icky about talking about propositions as if they're objects.)
So the question is whether scenario (3) can be made to work.
As is, it's just an intuition pump, right? I mean, baseball cards are manufactured; they are by design identical. The analogy is going to fail almost immediately. The questions that replace the built-in identity are a little problematic: what would make two utterances instances of the same utterance-type, or two beliefs instances of the same belief-type? What's a type? It feels like you need something from scenario (1) (or nearby) to get this going.
Here's what I'm tempted to do: agree with Grice that this is what happens: to talk about the tree, we each need to have a belief that the object we're looking at is a tree, not the belief. Don't posit, not yet anyway. (The idea is to avoid using Frege's machinery at all.) Accept that what we have here is all we need to talk about the tree. Then look for an explanation for how two numerically distinct beliefs can count as beliefs of the same type right here, in the transaction between two members of a linguistic community. We honestly don't need them to be instances of the same type, not for this part, although it's pretty obvious why that would be helpful. Right now all we need is for A and B to agree to treat their numerically distinct beliefs as instances of a belief-type.
Grice is almost certainly going to get here with a (probably infinite) chain of intentions, so that can get a little weird.
I'd like to come at it sideways, by the comparison with phonemes. How does someone "decide" that the allophone you actually utter will count as a /d/? This is already a little wrong, because the range of allophones is itself already determined by the speech community. It still looks like we're trying to figure out how conventions work.
One shot at this might be this: when you utter a sound, I have to take it as an allophone of some phoneme we use in our speech community or not. If possible, I'll take it as one of ours, because (a) intentions, and (b) why not? I can provisionally, experimentally take the sound as a phoneme. Which one? Again, I have to decide whether that phoneme, with the others around it, makes a morpheme, and again if possible I will, because (a) intentions, and (b) why not? I do that provisionally and experimentally, all the way up to the complete utterance, and see if it seems to work. I'd say there's a tiny bit of evidence we do this in the way we read over typos, mentally substituting the right letter because we're pushing toward taking the utterance as valid. You could think of this as the principle of charity, but you might also wonder what choice we have but to proceed this way.
Does this actually work? Has any of Frege's machinery been smuggled in here anywhere?