Mathematical Conundrum or Not? Number Six

Pierre-Normand

A normal distribution does not have to have a mid of 0, nor do they need negative values. — Jeremiah

I did not say that it has to be centered on zero. Normal distributions are unbounded on both sides, however. They assign positive probability densities to all real values.
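A quick numeric check of this point, as a Python sketch. The parameters (mean 10, SD 5) are arbitrary illustrative values and were not proposed by anyone in the thread. Even with the center two SDs above zero, about 2.3% of the probability falls on negative values.

```python
from math import erf, sqrt

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(Y <= x) for Y ~ Normal(mu, sigma), computed with the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

# Illustrative parameters only: a normal centered at 10 with SD 5.
mu, sigma = 10.0, 5.0
print(f"P(Y < 0) = {normal_cdf(0.0, mu, sigma):.4f}")  # about 0.0228
```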

Jeremiah

↪Pierre-Normand

It'll be fine once we use an MCMC and get the HDI. You forgot the part of my argument where we need a sample distribution.

Pierre-Normand

It'll be fine once we use a MCMC and get the HDI. — Jeremiah

Yes, everything will be fine and dandy. How did I not think of that...

Jeremiah

↪Pierre-Normand

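The HDI (highest density interval) is the range of highest density that holds 95% of the area under the posterior curve, 95% being the usual choice. It is a cutoff range.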
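For readers unfamiliar with the HDI, here is a minimal Python sketch of how it is commonly computed from posterior draws, assuming a unimodal posterior and that the draws already exist (for instance as MCMC output). The function name `hdi` and the exponential stand-in draws are illustrative assumptions, not anything from the linked SAS page. For a skewed posterior the interval hugs the peak rather than cutting 2.5% from each tail.

```python
import math
import random

def hdi(samples, mass=0.95):
    """Sample-based highest density interval: the narrowest interval that
    contains at least `mass` of the draws (assumes a unimodal posterior)."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of draws the interval must cover
    # Slide a window of k consecutive sorted draws and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]

# Stand-in for MCMC output: draws from a skewed distribution.
rng = random.Random(42)
draws = [rng.expovariate(1.0) for _ in range(50_000)]
low, high = hdi(draws)
print(round(low, 3), round(high, 3))
```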

Jeremiah

https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_introbayes_sect009.htm

Highest Posterior Density (HPD) Interval

Jeremiah

My continued interest here is not in debating the different approaches, but in the modeling of X. If I were going to build a theoretical Bayesian model, I would use a normal distribution as a prior, since when dealing with an unknown population of this sort it would be the most robust. This would include a prior mean (mu) and SD.

Srap Tasmaner

↪Jeremiah

It sounds to me like you're trying to figure out what would be a good prior for what amounts to "Pick a number." I mean, you could do research, see what people tend to pick and in what range, but what can you do with just math here?

Jeremiah

↪Srap Tasmaner

I was thinking you could use Y to justify your priors. It is the only information you have. Maybe include a justification based on the empirical rule, which says that approximately 68% of all observations will be within one SD of the mean, 95% within 2 SD, and 99.7% within 3 SD. If I am remembering that right.
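The empirical-rule figures can be checked directly. For any normal distribution, the probability of landing within k SDs of the mean is erf(k/√2), whatever the mean and SD are. A small Python sketch:

```python
from math import erf, sqrt

def within_k_sd(k: float) -> float:
    """P(|Y - mu| <= k*sigma) for any normal Y; independent of mu and sigma."""
    return erf(k / sqrt(2.0))

for k in (1, 2, 3):
    print(f"within {k} SD: {within_k_sd(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```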

Jeremiah

I have to think about it more.

Jeremiah

Here is a link for the empirical rule, which I think makes a normal distribution a better choice for a prior. https://www.investopedia.com/terms/e/empirical-rule.asp

Jeremiah

I am not advocating for this approach but I think this is your best shot at estimating the median of the distribution from a single sample.

People have been assuming a prior that fits their expectations best; however, if you are going to model off such assumptions, you need to at least try to mitigate observational bias, which is why I think you should use a normal distribution. The point is to lessen the impact of false assumptions, and due to the robustness of the normal distribution, I feel this makes it a better choice. Also, because of the empirical rule, one can better estimate the relative position of $Y$ on the distribution, and the normal distribution occurs commonly in nature. For all these reasons I feel it is a good choice for a prior.

If you assume a normal distribution (already transformed by chance mechanism number 2) as the prior, then when you see $Y$ you could say by the empirical rule:

$P(\mu \leq Y \leq \mu+\sigma) \approx 34\%$
$P(\mu-\sigma \leq Y \leq \mu) \approx 34\%$
$P(\mu+\sigma \lt Y \leq \mu+2\sigma) \approx 13.5\%$
$P(\mu-2\sigma \leq Y \lt \mu-\sigma) \approx 13.5\%$
$P(\mu+2\sigma \lt Y \leq \mu+3\sigma) \approx 2.35\%$
$P(\mu-3\sigma \leq Y \lt \mu-2\sigma) \approx 2.35\%$

Here is a picture.

Now you just need an estimator for $\sigma$.
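As a sketch of how the bands work once $\mu$ and $\sigma$ are fixed, the Python below computes the exact mass of each one-SD band and reports which band a given observation falls in. The exact values (about 34.1%, 13.6%, 2.1%) differ slightly from the rounded empirical-rule figures (34%, 13.5%, about 2.35%); the difference is only rounding. The values mu = 20, sigma = 8 and y = 35 are hypothetical, since the thread leaves them unspecified.

```python
from math import erf, floor, sqrt

def phi(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def band_probability(j: int) -> float:
    """Exact mass of the band [mu + j*sigma, mu + (j+1)*sigma)."""
    return phi(j + 1) - phi(j)

def locate(y: float, mu: float, sigma: float) -> int:
    """Index j of the one-SD band containing y: mu + j*sigma <= y < mu + (j+1)*sigma."""
    return floor((y - mu) / sigma)

for j in range(-3, 3):
    print(f"[{j:+d} SD, {j + 1:+d} SD): {band_probability(j):.4f}")

# Hypothetical prior parameters; the thread supplies none.
mu, sigma = 20.0, 8.0
print(locate(35.0, mu, sigma))  # (35 - 20) / 8 = 1.875, so band 1
```

Note that everything here hinges on already knowing $\sigma$, which is exactly the missing estimator.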

Srap Tasmaner

When I was looking for a way to describe "success", I picked the average value as a cutoff; more than that is success. The curious thing is that there is a whole range of values between k and 2k that would work just as well: any of them would pick out "having the larger envelope" in exactly the same cases. (And we could also talk about which other values of k they would work for.) That interval is bounded at one end by k and at the other by 2k. In a way, the "arbitrary cutoff" switching strategy is built into the problem, because the goal of that strategy is to land on a value somewhere in that interval. Any such value would work as a criterion for switching because it would also work as a criterion of success, that is, for not switching.
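A small Python sketch of the observation about cutoffs, with k = 10 chosen arbitrarily: a rule that treats the held amount as "the larger" exactly when it exceeds the cutoff classifies both envelopes correctly for every cutoff c with k ≤ c < 2k (the average 1.5k is one such value). Whether the endpoints count depends on how ties are treated; here the comparison is strict.

```python
def picks_larger(held: float, cutoff: float) -> bool:
    """Cutoff rule: treat the held amount as 'the larger' iff it exceeds the cutoff."""
    return held > cutoff

def cutoff_works(k: float, cutoff: float) -> bool:
    """True if the rule classifies both envelopes (k and 2k) correctly."""
    return (not picks_larger(k, cutoff)) and picks_larger(2 * k, cutoff)

k = 10.0
for c in (5.0, 10.0, 12.5, 15.0, 19.99, 20.0, 25.0):
    print(c, cutoff_works(k, c))
# Only the cutoffs 10.0, 12.5, 15.0 and 19.99 print True.
```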

My description, though, was too general: it described the case where the higher and lower values are separated by an arbitrary interval rather than a prescribed one. Since the interval here is fixed, there is this "bucket" effect, where every amount above k is normalized as, or treated as equivalent to, 2k. And it works in the other direction too. Anyway, there is pressure here to empty the space of all values except k and 2k: it's impossible to be less than k or more than 2k. But in between there's conflict: k wants to treat every value greater than itself as 2k, while 2k wants to treat every value less than itself as k, which means they cannot agree on how to treat the interval between them. The only way to maintain consistency is to empty it completely, so the space becomes as discrete as it possibly can while still presenting two choices: it narrows to a set of 2.

It occurred to me when trying to pick a criterion for success, that the other envelope is it -- and that creates some weirdness about what picking is -- but I shrugged that off on the grounds that knowing its value would merely allow you to deduce the average value. But there's an element of truth to the idea. And I want to say there is something strangely not number-like, or very minimally number-like, going on. In effect, there are only the two elements, and an order defined on the set that says one is greater than the other. But that turns out to be arbitrary. Could be "red" and "blue", or "heads" and "tails", or "0" and "1", so long as one of them is designated as better (and the other as worse).

When I was trying to think of analogies for the "criterion of success" argument I thought of things like picking red and blue marbles, but labeling them in a language you don't know. With my playing cards, there is a standard order which I just tossed out: instead of saying you need to get the high card, you're supposed to get whatever I say. And I want now to say there is something in the problem that encourages this, because when you see the value of your choice you have to make lots of assumptions even to guess whether that's a good value or a bad one, much less know which it is. Numbers don't usually act like that. We usually have some context for saying whether a number is big or small. (Why I had to toss the standard order for playing cards.)

All of which makes doing calculations of any kind quite strange, because there is so much about the problem that makes it impossible to treat what are manifestly numbers as numbers. Once you have complete knowledge, everything becomes normal again and you can say "10 < 20" or "10 > 5". But with anything less than complete knowledge, the game acts like something else entirely.

Just some musing on waking up in the middle of the night. Does this make any sense to anyone else?

Jeremiah

I actually have been thinking about how to model this problem and I believe I will need to include a logical true/false vector.

The problem here is that at the first chance mechanism $x_{i,1}$ is an independent observation. However, when you get to the second chance mechanism, when the envelopes are filled, this could be thought of as another function which transforms the distribution into one where the values have a relationship. It is not as simple as trying to determine where on the distribution I am, as I also have to account for that relationship.

Jeremiah

My simulation can give a visualization of this. The process of filling the envelopes changes the possible outcomes from independent events into outcomes that have a relationship to each other.

Using my simulation from before for an example,

Here is a plot of the values after they have been put into envelopes:

https://ibb.co/nGsEOz

Now for comparison:

Here is the selection before the envelopes are filled.

https://ibb.co/m0uuOz

The second chance mechanism transforms the distribution into one where now we have a relationship to account for.
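For readers without access to the plots, here is a hypothetical Python re-creation of the first two chance mechanisms. It is not the original simulation: the uniform draw on 1..100 is an assumption made only for illustration. It records the kind of logical true/false vector mentioned above (whether the handed-over envelope is the larger one) and shows how filling the envelopes ties the values together. Overall the player holds the larger envelope about half the time, but any amount above 100 must be the larger, and any odd amount must be the smaller.

```python
import random

def simulate(n: int, seed: int = 0):
    """Hypothetical re-creation of the first two chance mechanisms (not the
    original simulation).

    Mechanism 1: draw x independently, here uniform on 1..100 (an assumption).
    Mechanism 2: fill the envelopes with x and 2x and hand one over at random.
    Returns the x draws, the amounts handed over, and a logical vector that is
    True where the handed-over envelope is the larger one.
    """
    rng = random.Random(seed)
    xs, handed, has_larger = [], [], []
    for _ in range(n):
        x = rng.randint(1, 100)
        larger = rng.random() < 0.5
        xs.append(x)
        handed.append(2 * x if larger else x)
        has_larger.append(larger)
    return xs, handed, has_larger

xs, handed, has_larger = simulate(100_000)
print(min(xs), max(xs), min(handed), max(handed))
print(sum(has_larger) / len(has_larger))  # close to 0.5 overall
# ...but the relationship between the envelopes makes it depend on the amount:
print(all(big for h, big in zip(handed, has_larger) if h > 100))  # True
print(not any(big for h, big in zip(handed, has_larger) if h % 2 == 1))  # True
```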

Jeremiah

It may need two logical vectors.

Jeremiah

There is a problem with the third chance mechanism, the one where we get the chance to switch: should it be included in the model at all? It is subjective, and the criteria used will vary from person to person. I could treat it as an arbitrary do/don't-switch logical vector, but that would not really reflect the subjectivity inherent in that final step. It may be better to cut the model off at the point when the envelope is handed to the player.

There are actually two random variables in the OP, which may need two models.

JeffJo

I am not so sure about that. — Pierre-Normand

And I'm sure I intended to have a "not" in there somewhere. I'll fix it.

JeffJo

"A random variable is defined by a real world function"

That's a bit like saying that a geometrical circle is defined by a real world cheese wheel. — Pierre-Normand

I'm not sure what "real world" has to do with anything. But...

Probability theory does not tell us how to define outcomes. The outcomes of a coin toss could be called {"Heads", "Tails", "Edge"} or {"Win", "Lose", "Draw"}. But you can't use those in an expectation calculation, can you?

Measure-theoretic probability theory therefore requires a way to express outcomes as numbers. Its strict definition of a "random variable" is a function whose argument is an outcome, in whatever form you choose to use, but whose result is a number. So G() might be the function for your gain on a bet on Heads: G("Heads")=1, G("Tails")=-1, and G("Edge")=0.
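The G() example can be written out as a Python sketch. The outcome labels stay as strings; only G turns them into numbers, and that is what makes an expectation computable. The probabilities below are hypothetical, since the post does not assign any.

```python
# Outcomes can carry any labels; the random variable G turns them into numbers.
# These probabilities are hypothetical: the post does not assign any.
prob = {"Heads": 0.49, "Tails": 0.49, "Edge": 0.02}

def G(outcome: str) -> int:
    """Gain for a one-unit bet on Heads."""
    return {"Heads": 1, "Tails": -1, "Edge": 0}[outcome]

def expectation(rv, p):
    """E[rv] = sum over outcomes o of rv(o) * P(o)."""
    return sum(rv(o) * po for o, po in p.items())

print(expectation(G, prob))  # 0.0: a fair bet under these made-up numbers
```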

Personally, I prefer to call the qualities I use to describe outcomes "random variables." You can still think of them as functions whose arguments are abstract outcomes, whether or not the results are numbers. Especially in problems like the OP, where there is no need to be so formal, and no "real world" significance whatsoever.

JeffJo

This is the part I'm still struggling with a bit. — Srap Tasmaner

I think this illustrates the issue you are struggling with:

Say you have a perfectly balanced cube with the numbers "1" through "6" painted on the sides. If you roll it, what are the chances that an odd, or even, number ends up on top? Answer: 50% each.
Say you have an unquantifiable blob of plastic, with an indeterminate set of numbers painted in apparently random places. If you roll it, what are the same chances? Answer: "I don't know."

In the second case, if you had to bet on "odd" or "even," you might flip a coin and so have a 50:50 chance of either. That's as good an option as it gets. But that doesn't mean you expect the wager to be fair.

There are still exactly two possibilities, but that doesn't mean the chances are the same for each. The Principle of Indifference requires that you make some assessment about the equivalence of the outcomes.
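A short Python simulation of the blob example may help separate the two "50%" claims. The blob's true chance of landing odd, `p_odd`, is a hypothetical parameter the bettor does not know. Choosing "odd" or "even" by a fair coin wins half the time whatever `p_odd` is, since 0.5·p + 0.5·(1 − p) = 0.5. That 50% is a fact about the bettor's randomization, not an assessment that the blob's two outcomes are equally likely.

```python
import random

def coin_flip_bettor_win_rate(p_odd: float, trials: int, seed: int = 0) -> float:
    """Fraction of wins when 'odd' or 'even' is chosen by a fair coin flip,
    for a blob whose true (unknown to the bettor) chance of odd is p_odd."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        bet_odd = rng.random() < 0.5       # the bettor's coin flip
        rolled_odd = rng.random() < p_odd  # the blob's roll
        wins += bet_odd == rolled_odd
    return wins / trials

for p in (0.5, 0.8, 0.95):
    print(p, round(coin_flip_bettor_win_rate(p, 100_000), 3))  # each near 0.5
```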

If you look in your envelope and see $10, the question in the OP requires you to make an assessment about expectation. You don't have any information that allows you to do so. And it makes no sense to talk about a "prior" (which actually refers to something else) in this case. The only point in doing so is if you have the means to update it.

JeffJo

A normal distribution does not have to have a mid of 0, nor do they need negative values. — Jeremiah

A normal distribution describes a random variable whose range is (-inf, inf) and which is continuous. The first property cannot apply to the TEP, and the second is impractical.

Jeremiah

↪JeffJo

Doesn't matter, I am not using it as a proxy.

Jeremiah

The normal distribution is widely used to make probabilistic claims about unknown populations, which have discrete values and unknown limits. If you want to know how this magical process works, pick up an introductory statistics book.

*Edit, sometimes I use my phone, and it just turns out a mess.

Jeremiah

Also even without a sample distribution a theoretical model can still be set up. That is where my interests are atm. However, sigma is still a problem when going the Bayesian route.
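Why sigma is the sticking point can be seen in the simplest Bayesian setup, the standard normal-normal conjugate update, sketched here in Python. It is an illustration, not the model Jeremiah has in mind. With a normal prior on mu and a single observation y whose SD sigma is treated as known, the posterior for mu has a closed form. A single observation cannot estimate sigma, though, so sigma has to be assumed, and the posterior moves a great deal with that assumption. All numbers (mu0 = 50, tau0 = 20, y = 10, and the sigma values) are hypothetical.

```python
from math import sqrt

def update_mu(mu0: float, tau0: float, y: float, sigma: float):
    """Posterior (mean, sd) of mu after one observation y ~ Normal(mu, sigma),
    with prior mu ~ Normal(mu0, tau0) and sigma treated as KNOWN."""
    prior_prec = 1.0 / tau0**2   # precision = 1 / variance
    data_prec = 1.0 / sigma**2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * mu0 + data_prec * y) / post_prec
    return post_mean, sqrt(1.0 / post_prec)

# Hypothetical numbers: the thread supplies none of mu0, tau0, or sigma.
y = 10.0
for sigma in (1.0, 5.0, 25.0):
    print(sigma, update_mu(mu0=50.0, tau0=20.0, y=y, sigma=sigma))
```

The small-sigma run lands near y, while the large-sigma run stays much closer to the prior mean.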

Pierre-Normand

I'm not sure what "real world" has to do with anything. But... — JeffJo

It has to do with the sorts of inferences that are warranted on the ground of the assumption that the player still "doesn't know" whether her envelope is or isn't the largest one whatever value she finds in it. And this, in turn, depends on how "doesn't know" is meant to be interpreted. Is that meant to imply that the player is entitled to apply the principle of indifference and therefore assign a 50% probability (exactly) to each one of the two possibilities irrespective of the value that she finds in her envelope? This is what I would take to entail that the prior distribution is uniform, and hence to be inconsistent with a prior belief that the amount of money in the universe is finite.

When I say that the prior distribution is uniform, I mean this to represent the player's prior expectation (or credence) that whatever real positive value v it is that she will find in her envelope, it remains equally likely, from her point of view, that the other envelope might contain v/2 or 2*v. This would also entail that the prior expectation that the player would find a value v in her first envelope that is lower than some upper bound M, however large M might be, is infinitesimal. That such uniform and unbounded prior expectations don't apply to rational agents being faced with "real world" problems is what I meant.

JeffJo

Also even without a sample distribution a theoretical model can still be set up. — Jeremiah

The sample distribution of a statistic is the distribution of that statistic, considered as a random variable, when derived from a random sample of size n. Since we have no such sampling, let alone a statistic, there is no sample distribution, or use for one. Period. This just isn't a statistics problem.

A theoretical model of the probability problem can be set up, just not as a statistics problem. But doing so does not address the OP, where we have no information that would allow us to set one up.

The only things we can say about the OP are:

- If you don't look in the envelope, the only valid solutions mentioned in this thread consider one of three functionally equivalent random variables: the smaller value X, the difference D (which is equal to X), or the total T of the two envelopes (which is equal to 3X).
  - A pedant would insist you need to include one probability from the probability distribution of whichever you choose. But it divides out so it isn't necessary in practice.
  - The answer is that the expected value of your envelope is (x)/2 + (2x)/2 = 3x/2, and that of the other is (2x)/2 + (x)/2 = 3x/2. So switching changes nothing.
- If you look and see value v, you need two probabilities from that distribution: Pr(X=v/2) and Pr(X=v).
  - These values are not only completely unknown, they ...
    - ... are beyond the scope of Bayesian inference.
    - ... are beyond the scope of sampling.
    - ... are beyond the scope of anything anybody can contribute here.
    - ... have no "sigma."
  - The only point in mentioning them is that the expectation calculation requires both.
  - The expectation for the other envelope is v*[Pr(X=v/2)/2 + 2*Pr(X=v)]/[Pr(X=v/2) + Pr(X=v)].
  - This is 5v/4 if, and only if, we know that Pr(X=v/2) = Pr(X=v).
  - This must be greater than v for at least one value of v.
  - This must be less than v for at least one value of v.
  - The expectation of this formula, over the range of V, is the same as the expectation of v over that range. See "If you don't look ...".
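The formula for the other envelope's expectation can be checked numerically, as a Python sketch with a hypothetical prior for X (the OP gives none): four equally likely values 1, 2, 4, 8. Exact fractions avoid rounding. With this prior, the expectation is 2v at the smallest possible v, 5v/4 wherever Pr(X=v/2) = Pr(X=v), and v/2 at the largest. Averaged over V, keeping and switching both come to 45/8, which is 3/2 times E[X] = 15/4.

```python
from fractions import Fraction as F

# Hypothetical prior for the smaller amount X; the OP specifies none.
prior_x = {1: F(1, 4), 2: F(1, 4), 4: F(1, 4), 8: F(1, 4)}

def pr_x(x) -> F:
    """Pr(X = x), zero outside the prior's support."""
    return prior_x.get(x, F(0))

def expected_other(v) -> F:
    """v*[Pr(X=v/2)/2 + 2*Pr(X=v)] / [Pr(X=v/2) + Pr(X=v)]; the 1/2 chance of
    holding either envelope cancels between numerator and denominator."""
    lo, hi = pr_x(F(v, 2)), pr_x(v)
    return v * (lo / 2 + 2 * hi) / (lo + hi)

def pr_seen(v) -> F:
    """Pr(V = v): hold the smaller amount (X = v) or the larger (X = v/2)."""
    return (pr_x(v) + pr_x(F(v, 2))) / 2

seen = sorted(set(prior_x) | {2 * x for x in prior_x})  # 1, 2, 4, 8, 16
for v in seen:
    print(v, expected_other(v))  # 2v at the bottom, 5v/4 in the middle, v/2 at the top

e_keep = sum(pr_seen(v) * v for v in seen)
e_switch = sum(pr_seen(v) * expected_other(v) for v in seen)
print(e_keep, e_switch)  # 45/8 45/8
```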

JeffJo

It has to do with the sorts of inferences that are warranted on the ground of the assumption that the player still "doesn't know" whether her envelope is or isn't the largest one whatever value she finds in it. — Pierre-Normand

And since the OP does not include information relating to this, it does not reside in this "real world."

Pierre-Normand

And since the OP does not include information relating to this, it does not reside in this "real world." — JeffJo

That's fine with me. In that case, one must be open to embracing both horns of the dilemma, and realize that there being an unconditional expectation 1.25v for switching, whatever value v one might find in the first envelope, isn't logically inconsistent with there being an unconditional expectation 1.25w for sticking, whatever value w one might be shown to be in the other envelope (as @Srap Tasmaner had suggested, by way of reductio of the unconditional switching strategy). The situation would therefore be analogous to the one that I illustrated with my Hilbert's Infinite Hotel thought experiment.

Pierre-Normand

A pedant would insist you need to include one probability from the probability distribution of whichever you choose. But it divides out so it isn't necessary in practice. — JeffJo

Edited: I had posted an objection that doesn't apply to what you said, because I overlooked that you were considering only the case where both envelopes remain sealed. I agree with your post.

Jeremiah

I am not talking about a model of the probability, I am talking about modeling the problem. We build models for two reasons: to make predictions or to understand relationships. By working through the model we come to a better understanding.

Andrewk had a completely valid point about things being ill-defined, and one that has never been fully addressed. I don't agree with the methods that have been used so far to try to map this out.

This is not well-defined. — andrewk

Let me give you an example of what I mean.

The OP has two random variables that have yet to be defined, $X$ and $Y$.

Since $X$ is undefined, for the moment I will stick with my previous definition, although I am not fully satisfied with it.

For $Y$ maybe something like this:

~~$y_i = \beta_0 +I_1\beta_1A_i+I_2\beta_2B_i+\varepsilon_{ij}$~~

This would just be an example of what I mean, I am still hammering it out. Something to think about during the dull points in my day.

I will work at defining everything as long as my interest holds. Mapping everything out is good practice, and it helps one to understand everything much better.

Jeremiah

The sample distribution of a statistic is the distribution of that statistic, considered as a random variable, when derived from a random sample of size n. Since we have no such sampling, let alone a statistic, there is no sample distribution, or use for one. Period. This just isn't a statistics problem. — JeffJo

How does this relate to what I am doing? As far as I can tell it doesn't.

Welcome to The Philosophy Forum!
