Hyper meta recursion and some thoughts on genetic program encoding
This article deals with maths while not containing any advanced maths so it’s definitely safe even with a basic knowledge. It basically builds up to a short thought that struck me one day, which I hope you’ll find interesting.
When it comes to describing systems, recursion is often an easy and intuitive way to describe a flow of states
F(t+1) = something . F(t)
Where “something .” is some kind of operator that applies “Rules” to the previous state.
The Rules could be anything that you define, like for instance some rules flowing from the free energy principle. The recursion is what matters here, not the details of those rules, so don’t click that link if you’re not interested. Forget about the rules.
Also for now let’s disregard what t+1, t, t-1, … means, it’s not important for gaining new insight at this stage. If you’re thinking of reality you might consider taking “Planck durations”, or seconds, or meters, depending on what you’re doing.
There are many other ways of modelling systems, that don’t use recursion. One of the successful ways to model really complex stuff is defining an integral or differential equation, and finding a solution for it in a function f that you can then use to predict what happens for any value of t.
Now, this is often what happens if you try this and get unlucky:
You end up with a system that is described with a lot of characters. There is a lot of language needed to describe the differential equation, let alone to describe a solution of it— ironically, often you often require recursive techniques to solve it numerically rather than symbolically. Remember this for later.
🐇 all the way down
With recursion, it’s indeed rabbits all the way down though: to understand F(t+1) you need to understand F(t), which means you need to understand F(t-1), which means…
So how do you gain any understanding?
Nature doesn’t have a real problem with this, as F(t) doesn’t need to be understood: it’s right there for the taking, it just is. This is a happy circumstance in reality, things are what they is.
Yet, this is different for a machine, where it needs to conjure F(t) out of a bunch of empty transistor states! In a purely recursive relation, it would need to reconstruct all the states to get to the one it wants.
The trade-off for this complexity however is succinctness: in some cases the rules “something” can be described a lot easier with recursion, than without it. This is important.
The nature of reality is such that it is often difficult to say anything else about what’s happening, apart from this rule of recursion.
In the simulated world, we can more often immediately jump to any point in the sim based on some calculated rules. For instance, in a video game, it is possible to calculate the exact point a bullet will hit a moving character or get intercepted by an object, before any gun is even shot. Without ever needing to simulate any or all of the in-between steps.
In reality, this is harder to do. A seagull may come fly in out of nowhere, or the gun may backfire, quantum mechanics gets a word with you, etc...
Note that in a video game as well, you can get fooled by this “ease” of teleporting through time using simple formulae, and have a bullet fly through a wall because you never hit it, in case you just follow the stepwise rules.
Often to simulate reality accurately you need a hybrid solution that combines the formulae with some analysis of the continuity between stages.
Let’s go back to our recursive relationship
F(t+1) = something . F(t)
This doesn’t necessarily mean F(0) is some kind of Big Bang beginning. I guess it might be that way if you start getting serious about it, but if you want to get all cosmological and serious like that, you’ll find out sometimes F(t-m) can pop up in recursive definitions, as systems that were independent due to their distance (and the limit of information speed wrt “c”) can come back together after a while, and start affecting each other with old things.
There’s a deeper question here on a “non-locality” of Nature there, but it’s interesting enough on its own.
I guess with that in mind Nature also has systems that are better described with sequences as
F(t+1) = something . (F(t), F(t-m), …)
This becomes some kind of semi-continuous infinite vector soup, instead of a discrete list of states to look back to, but the premise is the same.
Note that this is equivalent to the original recursion in a way, it’s just that the rules “something” can look different using this split of old and new. Often it could be a way to make them a lot shorter.
You can this way successfully use the locality of Nature to reduce the rules to what you need to describe the local events. “something .” becomes simpler, by understanding what it can operate on in terms of previous states.
You can define more complex systems with less symbols by breaking up the recursion this way.
Also, as a side-thought, if you manage to split up the rules together with its parameters, you can use parallel computation to run over the recursive states. In a way, the boundaries of information speed serve as a sort of thread safety boundary for Nature, that way.
I’m not sure how quantum entanglement fits in there — guess the phenomenon could be seen as mutex locking the system and imposing some limit of operation to nature similar to Amdahl’s Law.
Now, why am I focusing on the amount of characters you need to describe the “something .”? It’s in the title, I’m getting there!
If you have a more limited set of characters to describe a system, or to encode it, then you can start mutating them and taking genetic optimisations, finding the combination that performs the best according to some fitness function. They are more suitable for genetic programming techniques.
This is a lot slower and more difficult to do if you need tons of signs to characterise a system. This is somewhat what happens through our DNA, which I will get back to in a moment.
Likewise, the amount of “recursive programs” that you can build with f.i. 40 bytes of information, can be counted. And all of these recursive programs will share some of the same structure, modelling a state that depends on a previous state. It’s quite a powerful idea that the model of recursion constrains the diversity of your population of programs.
The limited amount of characters to describe a recursive flow can create a staggering complexity in systems. For instance, the Mandelbrot fractal is defined using the recursive formula:
Where z is a complex number, with z_0 for instance being given by the (x,y) coordinate in an grid. c is the value at the start of the iteration process
“something” is very simple here, you take the square of the last state and add the complex number you started with. That’s all the rules it has.
If you execute this on an area, and track all the points that eventually escape to infinity, for each step of the recursion, you get the Buddhabrot graph, which looks like this (I coloured different ‘depths’ of it in different R,G,B channels to look more cosmical, but didn’t add anything else)
This entire complex thing, is generated from that one recursive formula, nothing else!
Several mysteries about discrete numbers emerge as you let it whir, resulting in this strange fractal behaviour — look for instance at the small Buddha appearing above the larger Buddha. It’s a marvel.
You can easily spend a lifetime studying just this one graphic, trying to unravel its patterns. I’d suggest you don’t do that as you’ll miss out on better things.
However, there’s an even deeper level of complexity to consider, with less grandeur than majestic fractals, but with a lot more insight into how reality works, and especially in how life can be so extremely complex in its small wiggle room.
Let’s go back to our analogies in reality. Consider for instance the fact that DNA is processed in a cell, producing stuff, but the processing of DNA also creates blocks of biological matter that do further DNA processing, which can create new entities that do the same, by reading different parts of the DNA that created them in the first place — or even partially the same parts.
To describe such a complexity abstractly, you could describe it as
F(t+1) = F(F(t))
I skipped on the “rules” here to avoid confusion, they’d basically be interwoven in between the function calls if you need them for understanding what I mean.
This is what I guess you could call a “meta-recursion”: the state is not only determined by the previous state, but the previous state actually determines how much state is “looked back” to model the next state. This recursion can become more “meta”, as in the following:
F(t+1) = F(F(…(F(F(t)))…))
Obviously this will only work if the thing that F accepts also looks like what F outputs. Which is pretty analogous with what you have in nature: old states become the inputs of new states.
One thing that is important and useful to consider is what it really means to perform a function iteratively this way.
Let’s use an example, it’s a famous one from Douglas Hofstadter, called the Hofstadter H sequence
Now, I invite you to think about the first couple of terms before your mind blows up, just to get a “feel” of the build-up of the meta-recursion.
Let’s plot how this function looks like for the first thousands of terms:
It looks extremely ordinary, doesn’t it? It just looks like a straight line, with a slant of about 0.7 — it appears this slant is determined by the root of x³ + x -1 by the way.
But, if you zoom into the graph, you’ll notice little squiggly lines!
Let’s see how much this plot differs from a straight line.
This complex behaviour emerges from a system that doesn’t really have a lot to it: it says to apply a function three times to the previous number, and take the difference with the current number. In a way, it’s even more simple than the Mandelbrot case because it doesn’t involve any multiplications or rules about complex numbers. The something rules are very simple.
One interesting thing we can do now is also play with the amount of meta-recursions.
In our original case it is 3.
H gets called three times upon itself. We could, if we define a new operator, define that as:
H(n) = n - H➜3(n-1)
Which means we apply the function H three times on itself.
What happens if we for instance try all
H(n) = n - H➜M(n-1), M = 3 .. 29
Where M goes from 3 to 29?
Well, you get the following graphs, if you look at the difference of the points with a straight line (which is a sort of a measure of the “squiggliness” of the behaviour).
The “periods” are definitely interesting here, and obviously if you zoom into the different parts, there’s more structure revealed. Just for the record, this shows the difference between the diagonal line, and the trend of the line. That’s why they all come back in harmony again at the end.
Let’s plot this a little more interestingly, by using the y axis as the meta-recursiveness of the function. This shows a bigger detail and it’s actually quite an interesting visualisation if you look through the numbers. The pattern becomes fairly repetitive but in the bottom area, with low meta recursion, there’s a lot of chaos. When the depth of the meta-recursion reaches a certain phase it seems (I did not calculate for insanely large values) the function somewhat “changes” character.
Alternating meta recursiveness
What if we alternate this between two values for the recursion?
That is to say, if n is even, we have H operating a number of times on itself, and if n is odd, it operates a different number of times on itself.
I wanted to see the emerging behaviour, and this is it, for different values of the “even” and “odd” parts.
Note that there are some “stable” configurations and some more chaotic ones. I can’t explain this and I’m not sure if anyone can. Recurrence relations are notoriously difficult to crack, and are related to some age-old number problems. Please appreciate for a moment here that we have not yet figured out how many divisors a number has.
I’ll get back to this “brink of chaos” later, by the way.
Just consider what we did here. We took a very limited amount of characters to describe a system, and it already showed chaotic behaviour. The only thing we did now is change a small part of it and we create a family of stable and even more chaotic (with even less structure to the period growth) from it. However, all of these systems still have similar qualities: there’s a net growth, which stays beneath a certain limit — which is logical considering we subtract a positive value from n.
Consider what this means for genetic programming. Changing a few parameters results in radically different behaviour, but there are still stable configurations. This sounds like something that can rather quickly evolve under a fitness function, and it’s interesting how a part of the underlying structure stays constant throughout the mutations, with elegance.
Hyper meta mega recursion
Now, this deserves its own chapter so I made one for it.
Let’s consider what happens if we let the amount of recursions be determined by the last state. So we get:
F(n+1) = F➜M(n) , where M = F(n)
So, the size of the chain of F’s is determined by a lower order of F itself (which again, uses a chain of itself, please don’t forget that F is defined by the recurrence relation and nothing else).
Now, while this is cool, and I guess you can call it hyper-recursion, it’s not hyper mega cool.
There’s something you can do that makes this more ‘recursive’, because it does feel like you’re grabbing M somewhat out of thin air now.
Can you guess it?
I will call the following “hyper meta mega recursion”, because I watched a lot of Dragon Ball Z as a kid and I think it’s a cool name. Also, it has a lot resemblance to hyperoperators.
In the spirit of hyperoperators, you could create a new operator:
F➜➜(m+1)(n+1) = F➜(F➜(m)(n+1))(n)
where F➜(0)(n+1) = F(n)
Now this should get your brain flaps working. Basically the amount of function calls, the length of the snake of “(..)” brackets, is determined by the result of another hyper mega recursion, of a lower order. The lowest order is then given by a recursive relationship with the function.
This allows you to create operators like F➜➜➜m(n), where the lowest order would be the result of another hyperoperator.
Again, we should not forget that F is only defined by this recurrence relationship so it’s rabbits upon rabbits down the hole.
To be frank, there’s a couple of ways to look at hyper-recursion, and this is just one of them, and I don’t believe it is necessarily the cleanest one. Perhaps I’ll revisit, but it doesn’t really matter for the point I will try to make later.
Let’s adopt this technique on the H-sequence, and see where it goes. So let’s try to plot
H(n) = n- H➜➜3(n-1), with H(0) = 0
It’s not hyper interesting, it just shows a period to it where it either jumps too far back, or follows the line y = x. The “angle” of the graph is somewhat interesting but might be trivially explained.
I also could not find any interesting anomalies with going into higher-order hyper meta mega recursions, except if I change the initial conditions. Still, it’s an object worth discovering, especially with different rules. I put off publishing the article to see if there’s anything more interesting about it, but don’t have the time now to go deeper into it.
Now, why do any of this? Why even think about hyper meta mega recursion? This is exactly what my girlfriend said when I explained what I was thinking about..
Well, what I wanted to think about is Deep Learning, and how the individual neurons are actually quite simple, the bodies of their behaviour are at least often considered to be simple and algebraic. Still, amazing things are donein life with only a few neurons of which we think we understand a lot of how they work individually.
It is the recursive structure around the neurons that often creates complex behaviour, something we mimic by creating deep learning systems that converge into non-linear models that performs certain functions well.
Likewise, in Gödel Escher Bach links are made between the topic of recursion and the concept of consciousness. I lent out my last copy so I can’t check if he did anything with this type of hyper meta mega recursion.
What I wanted to think about, and adventure in just a little, is how, with a very small set of neurons, you can create recursive functions that map to wildly complex behaviour. In the end, what a hyper meta mega recursion really means is that there is some dependency between the neuron’s output and whether or not — and when — it will become its own input again.
With that in mind, genetically mutating those neural networks and finding an optimum in the resulting population might become a tangible way of training them, rather than using stochastic gradient descent. Just because there’s so few neurons.
It’s funny to relate this with some existing philosophies that the mind is constantly operating on the brink of information chaos, and it’s right on this brink where something like consciousness can arise. Perhaps the analogy is stretching it, but I think it’s an interesting one.
Now, it has to be said there are relatively few standard models for neural networks right now that work like this. Recurrent networks exist but I have not seen them in the way described. Perhaps I will as I get the opportunity to study more of them.
I hope if you’re interested in the topic you’ll find new ways to think about the simplicity of encoding recursive models, and the shared attributes in their family of hyper meta recursions. The structure of a unit processing information is obviously important, but the surrounding structure of information flow is sometimes neglected to be responsible for a lot of the complexities in life, and being an important facet of evolution itself.