Let me give some simple examples of theorems, taken from http://www.cs.ru.nl/~freek/100/, to illustrate my idea of a "probability measure" for a theorem.
1. Pythagorean Theorem
- Theorem expressed in a formal language: (Theorem number 4 from
https://madiot.fr/coq100/)
Theorem Pythagore :
forall A B C : PO,
orthogonal (vec A B) (vec A C) <->
Rsqr (distance B C) = Rsqr (distance A B) + Rsqr (distance A C) :>R.
From a formal point of view, it is a logical equivalence between the orthogonality of two vectors and an equation between real numbers (the lengths of the triangle's sides).
Should an equivalence of this kind be surprising (or improbable)? The answer is NO: by the law of cosines, whatever angle you fix between the two vectors, there is an induced equation between the side lengths, and vice versa.
So why does the theorem seem surprising when you read it for the first time (assuming you understand it and did not already know it)?
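The point about every angle inducing an equation can be checked numerically. A minimal sketch (the triangle side lengths 3 and 4 are arbitrary examples): the law of cosines gives an equation for every angle, and the right angle is exactly the case where the cross term vanishes, leaving the sum of squares.

```python
import math

# Law of cosines: for a triangle ABC with angle t at A,
# |BC|^2 = |AB|^2 + |AC|^2 - 2*|AB|*|AC|*cos(t).
# Every angle induces *some* equation between the side lengths;
# orthogonality (t = pi/2) kills the cross term, leaving the
# Pythagorean equation |BC|^2 = |AB|^2 + |AC|^2.

def bc_squared(ab, ac, angle):
    """Squared length of BC, given |AB|, |AC| and the angle at A."""
    return ab**2 + ac**2 - 2 * ab * ac * math.cos(angle)

# Generic angle: the induced equation has a nonzero cross term.
print(bc_squared(3, 4, math.pi / 3))   # 13.0, not 3**2 + 4**2

# Right angle: the equation degenerates to the sum of squares.
print(bc_squared(3, 4, math.pi / 2))   # approximately 25.0 = 3**2 + 4**2
```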
Because it's a coincidence between two special cases:
- orthogonality is a special case of angle between vectors (a very particular one: the length of the projection of one vector on the other is zero)
- the sum of squares is a special case of equation: it is very simple, built only from the primitive operations of addition and squaring.
So, could a computer program recognize this theorem as surprising (or improbable)? Yes, but only if you give the program a model in which the coincidence between the two special cases occurs.
A model could, of course, be computer-vision software plus the ability to draw triangles and observe them, but it could also be much simpler linear-algebra software.
The essential feature here is to recognize zero as a special case of real number and addition and squaring as special cases of mathematical functions.
And that is easily achievable if you have an axiomatization of the real numbers and you measure the complexity of a number or function simply by the length of its definition (taken as a Shannon information measure).
In other words, both sides of the logical equivalence have a very small complexity (or information measure) when interpreted in a model given by an axiomatization of the real numbers.
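Here is a toy sketch of that complexity measure. The signature and the one-symbol cost per primitive are hypothetical choices of mine, not a canonical encoding; the point is only that terms like "0" and "sum of two squares" come out very cheap.

```python
# Toy "description length" over a tiny signature of real-number
# primitives. Each symbol costs 1; a compound term costs the sum of
# its parts. The choice of primitives and costs is illustrative only.

PRIMITIVES = {"0", "1", "plus", "times", "square", "sqrt"}

def description_length(term):
    """Count the symbols in a nested-tuple term, e.g. ("plus", "0", "1")."""
    if isinstance(term, str):
        assert term in PRIMITIVES, f"unknown symbol: {term}"
        return 1
    return sum(description_length(t) for t in term)

# Both sides of the Pythagorean equivalence use only cheap terms:
zero = "0"
sum_of_squares = ("plus", ("square", "1"), ("square", "1"))
print(description_length(zero))             # 1
print(description_length(sum_of_squares))   # 5
```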
==========================================================================
2. The Irrationality of the Square Root of 2
- Theorem expressed in a formal language: (Theorem number 1 from
https://madiot.fr/coq100/)
Theorem sqrt2_not_rational : forall p q, q <> 0 -> p * p <> q * q * 2.
From a formal point of view, it says that a particular equation (p * p = q * q * 2) has no solution over the natural numbers.
Should a statement of this kind be surprising (or improbable)? The answer is NO: in fact, most Diophantine equations have no solutions.
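Read as a Diophantine equation, the theorem is easy to probe by brute force; the search bound below is arbitrary:

```python
# Search for small nonzero solutions of p*p = 2*q*q. The theorem says
# the list must be empty, and the search (up to an arbitrary bound)
# agrees.
solutions = [(p, q)
             for p in range(1, 1000)
             for q in range(1, 1000)
             if p * p == 2 * q * q]
print(solutions)   # []
```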
So, why do we find the theorem surprising?
Because the square root of two is the length of the diagonal of a square of side one, and every measurement in the physical world is represented by a fraction (measuring is a finite physical process: you can never obtain the square root of two as the result of a physical measurement).
Then, if we give a computer a "physical" model of geometry (for example, a world made of pixels), it could discover, by experimenting with it, that every length can be measured and that every measurement is in fact a fraction. So it could recognize as "surprising" (or statistically very improbable) a theorem stating that some length is not a fraction.
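The "every measurement is a fraction" observation can be illustrated directly. A sketch, assuming a measuring device that reads k decimal digits: every such reading is a rational number, and none of them squares exactly to 2.

```python
from fractions import Fraction
import math

# A finite-precision measurement of the unit square's diagonal: reading
# k decimal digits off an instrument always yields a rational number.
# Each reading approximates sqrt(2), but none squares exactly to 2.
readings = [Fraction(str(round(math.sqrt(2), k))) for k in range(1, 6)]
for r in readings:
    print(r, r * r == 2)   # always False
```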
==========================================================================
3. Fundamental Theorem of Algebra
- Theorem expressed in a formal language: (Theorem number 2 from
http://www.cse.unsw.edu.au/~kleing/top100/ - the Coq version is very long once the necessary definitions are included)
lemma fundamental_theorem_of_algebra:
assumes nc: "¬ constant (poly p)"
shows "∃z::complex. poly p z = 0"
From a formal point of view, it says that every equation of a certain type (non-constant polynomial) always has a solution in a certain domain (the complex numbers).
Why should this be an exceptional case, compared with the many other types of equations that always have solutions in certain domains?
First of all, a polynomial is a function with a very simple definition (in the sense of low complexity): a polynomial is any function you can build from one variable using only the "times" and "plus" operations. This fact alone makes it an interesting object.
A simple concept (as measured by the complexity of its definition) associated with every function is its inverse. So you quickly discover that not all polynomials are invertible.
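The "only times and plus" characterization is literal: Horner's rule evaluates any polynomial using nothing but those two operations. A minimal sketch:

```python
# A polynomial really is whatever you can build from one variable
# using only "plus" and "times": Horner's rule needs nothing else.

def poly(coeffs):
    """coeffs = [a_n, ..., a_1, a_0]; returns x -> a_n*x^n + ... + a_0."""
    def f(x):
        acc = 0
        for c in coeffs:
            acc = acc * x + c   # only "times" and "plus"
        return acc
    return f

p = poly([1, 0, -2])   # the polynomial x^2 - 2
print(p(3))            # 7
```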
The surprising thing (which makes the theory more interesting), as we know, is the discovery of complex numbers: if you take just one polynomial that has no solution (the simplest one: "X * X = -1") and "pretend" that it has a solution by adding a new symbol "i" to the number system, you discover that now all polynomials have a solution.
Complex numbers then become even more interesting when you discover that they correspond exactly to a SIMPLE geometric model (another coincidence).
Would a computer be able to "discover" complex numbers? I don't think so (at least not with the technologies currently used for artificial intelligence). But I think a computer could be able to RECOGNIZE that complex numbers are an interesting concept, by using a measure of how "improbable" theorems are when interpreted over the right models (that is, the models people have in mind when they say they "understand" a theorem).
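The quadratic case of this "adjoin one root, get all roots" phenomenon is easy to demonstrate: once a square root of -1 exists, the quadratic formula never fails. A sketch for degree two (the general statement is the theorem itself):

```python
import cmath

# Adjoining a single root i of "X * X = -1" already makes every
# quadratic solvable: cmath.sqrt is defined for every complex input,
# so the quadratic formula always returns two roots.

def quadratic_roots(a, b, c):
    d = cmath.sqrt(b * b - 4 * a * c)
    return (-b + d) / (2 * a), (-b - d) / (2 * a)

# x^2 + 1 = 0 had no real solution; over the complex numbers it has two.
r1, r2 = quadratic_roots(1, 0, 1)
print(r1, r2)      # 1j -1j
print(r1 * r1)     # (-1+0j)
```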
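That geometric model is multiplication-as-rotation: multiplying by a unit complex number rotates the plane. A small check (the point 3 + 4i is an arbitrary example):

```python
import cmath, math

# The geometric model of the complex numbers: multiplying by the unit
# complex number cos(t) + i*sin(t) rotates the plane by angle t.
# In particular, multiplying by i is a quarter turn.

z = 3 + 4j
rotated = z * 1j               # rotate the point (3, 4) by 90 degrees
print(rotated)                 # (-4+3j)

# The same rotation for an arbitrary angle, via Euler's formula e^(i*t):
t = math.pi / 2
print(z * cmath.exp(1j * t))   # approximately (-4+3j)
```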