Stochastic processes are sequences of random variables.
Random variables are summaries of stuff that happens; all the baby making in the world produces a total number of babies per year. The hows and whys of reproduction matter little to the summary "total number of babies", because the hows and whys enter the summary only through their effect on the population of babies.
If that seems abstract, it is because it is. More specifically, the context of the random variable "total number of babies" is specified implicitly there. We can ask "what is the total number of babies born in (category X)?". Above it is "the entire world now", but "(category X)" could be "in America", "per American state", "per continent" or "between the years 1995 and 2013 globally".
Random variables can be generated through their interaction with different time points; each time point gets a set of "hows and whys" associated with it; these are the events that drive the formation of the summary. So if we take the sequence of years {1991,1992,1993}, there are random variables "total number of babies born in (category X) at (year)". This is called a (discrete time) stochastic process. An example then would be:
{Total number of babies born in America in 1991, total number of babies born in America in 1992, total number of babies born in America in 1993}.
There (category X) is "America" and year is 1991, 1992 or 1993.
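As a rough sketch of what "a sequence of random variables indexed by year" means in practice, the snippet below draws one realisation of such a process. The function name `sample_births` and the range of counts are invented for illustration; they are not real birth figures.

```python
import random

def sample_births(years, category, rng):
    """Draw one realisation of the process: one random total per year.

    The uniform range below is made up for illustration only; `category`
    labels the context the totals are summarised over.
    """
    return {(category, year): rng.randint(3_500_000, 4_500_000) for year in years}

years = [1991, 1992, 1993]
path = sample_births(years, "America", random.Random(0))
for (category, year), total in path.items():
    print(f"Total number of babies born in {category} in {year}: {total}")
```

Each key plays the role of one random variable in the sequence, and each call with a different seed gives a different realisation of the whole sequence.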
A stochastic process is called "first order Markovian" if the behaviour (in terms of probability) of a time point depends only on its immediately preceding time point. Total births, there, would be first order Markovian if "The total number of babies born in 1993 depends only on the total number of babies born in 1992" (and the same for the other time points). Put differently, once you know the 1992 total, also knowing the 1991 total tells you nothing more about 1993. Prosaically, this gets called "memorylessness"; (first order) Markovian processes are memoryless because they only care about what immediately precedes them. Like how on a pool table the balls don't care how they got to where they are, only where they are.
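One way to make memorylessness concrete is to write the update rule so that it only ever receives the current value. This is a minimal sketch, assuming a hypothetical rule where each year's total is last year's total nudged by up to 5%; the rule, the function names and the numbers are all invented for illustration.

```python
import random

def next_total(current, rng):
    """One step of a first order Markov process.

    Only `current` (last year's total) is passed in, so nothing earlier
    can influence the result. The +/-5% rule is a hypothetical choice.
    """
    return round(current * rng.uniform(0.95, 1.05))

def simulate(start, steps, rng):
    """Build a path by repeatedly stepping from the latest value."""
    path = [start]
    for _ in range(steps):
        path.append(next_total(path[-1], rng))
    return path

# Two histories that differ in the past but agree on the present...
history_a = [3_900_000, 4_100_000, 4_000_000]
history_b = [4_300_000, 3_800_000, 4_000_000]

# ...have the same distribution for the next step. Using identical seeds
# makes that visible as an identical draw.
next_a = next_total(history_a[-1], random.Random(1))
next_b = next_total(history_b[-1], random.Random(1))
print(next_a == next_b)  # True
```

The pool table analogy is in the function signature: `next_total` sees where the ball is, not how it got there.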
The billiard balls there are like the plants forgetting the volcano's rage in the above story; they can forget the eruption but not the soil nutrient distribution. The volcanic soil nutrient distribution mediates the relationship of the jungle growth to the volcano; once the relationship with the volcanic soil is set up, the volcano's rage is forgotten. Mathematically, this causal property is represented by the connectivity of a graph, and the time points above form a sort of graph. How you can travel from one node to another in a graph determines the causal relationships of the things indexed to each node.
Let X(t) be the random variable that maps a year t to the total number of babies born in America in that year. Then the graph:
X(1991) -> X(1992) -> X(1993)
has the first order Markov property. If you follow the arrows, you can only get to X(1993) from X(1991) through X(1992). History is forgotten in the process X because it embeds all of its relevant information in every time step.
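The "you can only get there through the middle node" claim can be checked mechanically on the chain of years 1991 -> 1992 -> 1993. The sketch below uses a plain adjacency dictionary and a hypothetical helper `all_paths`; it is only meant for small graphs without cycles.

```python
def all_paths(graph, start, goal, path=None):
    """Return every directed path from start to goal (graph must be acyclic)."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    paths = []
    for nxt in graph.get(start, []):
        paths.extend(all_paths(graph, nxt, goal, path))
    return paths

# The chain of time points, written as node -> list of successors.
chain = {1991: [1992], 1992: [1993], 1993: []}
paths = all_paths(chain, 1991, 1993)
print(paths)  # [[1991, 1992, 1993]]

# Cutting out the intermediate year leaves no route from 1991 to 1993.
cut = {1991: [], 1993: []}
print(all_paths(cut, 1991, 1993))  # []
```

Every route from 1991 to 1993 passes through 1992, which is the graph-level picture of the intermediate year carrying all the relevant history.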
What this looks like in terms of the jungle-volcano system in the story above is:
E(eruption) -> E(soil) -> E(jungle)
where E is a placeholder for "mean canopy cover of species X" or "total biomass of moss over area X" or some other indexed summary of the jungle, eruption or soil. For example:
volcanic ash composition and distribution -> soil nutrient composition and distribution -> growth rate of trees over space
You cannot 'access' the effects of the past eruption on the jungle growth except through the soil formation (this is not strictly true in general, of course, and soil formation and jungle growth, joined here by a one directional arrow, are actually reciprocally dependent from the perspective of the whole system). And the jungle's growth patterns really do 'forget' the eruption by only interfacing with the eruption's effects on soil nutrients.
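The "forgetting" here is conditional independence: once the soil is known, the ash adds nothing further about growth. The sketch below checks this exactly for a toy version of the chain ash -> soil -> growth. All categories and probabilities are hypothetical, picked only so the arithmetic comes out in clean fractions.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical probabilities, chosen only to make the arithmetic exact.
p_ash = {"thin": F(1, 2), "thick": F(1, 2)}
p_soil_given_ash = {
    "thin": {"poor": F(3, 4), "rich": F(1, 4)},
    "thick": {"poor": F(1, 5), "rich": F(4, 5)},
}
p_growth_given_soil = {
    "poor": {"slow": F(9, 10), "fast": F(1, 10)},
    "rich": {"slow": F(3, 10), "fast": F(7, 10)},
}

# Joint distribution implied by the chain ash -> soil -> growth.
joint = {
    (a, s, g): p_ash[a] * p_soil_given_ash[a][s] * p_growth_given_soil[s][g]
    for a, s, g in product(p_ash, ["poor", "rich"], ["slow", "fast"])
}

def p_growth(g, soil, ash=None):
    """P(growth = g | soil), optionally also conditioning on ash."""
    keep = [k for k in joint if k[1] == soil and (ash is None or k[0] == ash)]
    total = sum(joint[k] for k in keep)
    return sum(joint[k] for k in keep if k[2] == g) / total

# Conditioning on ash as well as soil does not change the answer.
for ash in p_ash:
    print(ash, p_growth("fast", "rich", ash), p_growth("fast", "rich"))
```

In this toy setup the probability of fast growth on rich soil is 7/10 whether the ash was thin or thick, so the ash only matters through the soil it produced.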
The problem here is of course that, considered from the perspective of the jungle-volcano system over time frames in which the volcano will erupt, soil formation and jungle growth are absolutely linked; the potential to accumulate plant biomass more rapidly in the direction of increasing soil nutrients is only there because of the volcano (in reality it can be more related to wide-spread seed dispersal mechanisms from animals' shit, but this won't change the increased fertility I am trying to highlight). From the view of the entire system we see an oscillating series of destruction and regeneration. From the view of the plants within it, they see the soil and how they interact with it!
The eruption as a "cause" of volcanic soil factors out; all that matters for how the jungle behaves is the volcanic soil. The jungle has learned not to care about the volcano because its eruption is usually off on most timescales, and when it is off there is great soil. When it is on, however, the plants nearby are destroyed.
There is no relevant information nature cannot access; nature unfolds according to its own sense of relevance, but its sub-processes learn to contextualise. Perhaps it could even be phrased like this: the origin of a sub-process is a context of development. Like how the canopy trees never become immune to lava. Causal histories get absorbed into intermediaries until they become relevant again.
From the framing device of the story, we know that the jungle growing near the volcano waits with bated breath; happy to grow how it does until it is destroyed by its own ignorance. The internalised processes of jungle growth in the jungle-volcano system do not anticipate this; at least insofar as they cannot interface with the destruction the volcano brings.
Some ways of growing, however, can interface with such destruction and profit from it (next post later).
Citation describing first order Markov models in a molecular evolutionary context. The relevant thing to look for in here is how expanding the 'state space' (available information which is incorporated to process dynamics) can reduce the dependence on the unobserved past (unavailable information that is implicitly unincorporated).
Citation for causal graphs.
Citation for how intermediaries causally isolate nodes.
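The point in the first note about expanding the state space can be sketched with a toy process. Below, a process whose next value depends on its previous two values (so it is not first order Markovian on its own) becomes first order once the state is taken to be the pair (previous, current). The probabilities and function names are hypothetical, chosen only for illustration.

```python
import random

# Hypothetical second order rule on values {0, 1}: the probability that the
# next value is 1 depends on the previous TWO values.
p_one = {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.3, (1, 1): 0.9}

def step_pair(pair, rng):
    """First order step on the expanded state (previous, current)."""
    previous, current = pair
    nxt = 1 if rng.random() < p_one[(previous, current)] else 0
    return (current, nxt)

def simulate_pairs(start, steps, rng):
    """Simulate the expanded process; each step only sees the latest pair."""
    path = [start]
    for _ in range(steps):
        path.append(step_pair(path[-1], rng))
    return path

pairs = simulate_pairs((0, 1), 10, random.Random(0))
# Recover the original sequence of single values from the pairs.
values = [pairs[0][0]] + [pair[1] for pair in pairs]
print(values)
```

The extra memory has not disappeared; it has been moved into the state, which is the sense in which a richer state space reduces dependence on the unobserved past.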