Nick's Auto-Evo Algorithm Discussion

Sentiant · October 17, 2020, 11:23am

It’s probably most efficient to use the existing population algorithm, since we’d need to know the populations anyway, regardless of whether they’re the sole motivator for auto-evo. Just run the algorithm for a number of timesteps until you’re reasonably sure the population has reached an equilibrium. Then, within the last timestep (or last couple of timesteps), find the fraction of the total cells that died and the fraction of the total cells that reproduced. With a large enough population, these fractions are approximately equal to the per-timestep probabilities of their respective events happening.

Let’s say R = the fraction of cells that reproduced and D = the fraction of cells that died. Making the small simplification that a cell can only reproduce or die (or do neither) in a single timestep, not both, we can list all the ways a cell can reproduce before it dies:

It reproduces immediately, in the first timestep: probability of this happening = R
On the first step it neither reproduces nor dies, in the second it reproduces: probability = R * (1 - R - D)
In the first and second step it neither reproduces nor dies, in the third it reproduces: probability = R * (1 - R - D)^2

Etc.

As you may have noticed, this is a geometric series (Geometric series - Wikipedia) with common ratio r = 1 - R - D, so -1 < r < 1. It goes on into infinity, but it has a finite sum: R / (1 - (1 - R - D)), which simplifies to R / (R + D).

This sum equals the chance of a cell reproducing before it dies. Of course, I am banking on the fact that the steps are short enough that not a lot of cells are dying and reproducing in the same timestep.
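
A quick way to check the sum above is to compare the closed form against a truncated version of the series. This is only a sketch: the function name and the example values of R and D are made up for illustration and are not taken from the game's code.

```python
def reproduce_before_death(R, D, steps=None):
    """Chance that a cell reproduces before it dies.

    R and D are the per-timestep probabilities of reproducing and dying.
    With steps=None the closed form R / (R + D) is used; otherwise the
    first `steps` terms of the geometric series are summed directly.
    """
    if R < 0 or D < 0 or R + D > 1:
        raise ValueError("R and D must be non-negative and sum to at most 1")
    if R + D == 0:
        raise ValueError("a cell that never reproduces or dies has no outcome")
    if steps is None:
        return R / (R + D)
    neither = 1.0 - R - D  # chance of doing nothing in a single timestep
    return sum(R * neither**k for k in range(steps))


# Hypothetical values: 2% of cells reproduce and 1% die per timestep.
closed_form = reproduce_before_death(0.02, 0.01)
partial_sum = reproduce_before_death(0.02, 0.01, steps=5000)
print(closed_form, partial_sum)
```

With enough terms the partial sum settles on the closed form, which for these example values is 2/3.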

hhyyrylainen · October 17, 2020, 11:32am

We currently run 10 timesteps, as it turns out it is very hard to make a stable model. Even in the simple prototypes, an oscillation between different species seems to develop very easily. So I kinda doubt we’ll end up with a nice algorithm that converges to final population numbers.

If you just derive the numbers like that, won’t it have exactly the same effect as comparing (percentage-wise) how much the population increased?

Sentiant · October 17, 2020, 12:52pm

If we can’t get an equilibrium you can still use the last couple timesteps to get an idea of the average of the oscillation.

And yeah, I just realised too: there is no value for R and D where using the probability gives a different outcome auto-evo wise than using the percentage-based population growth (despite them being different numbers). That said the probability is still the proof that percentage-based growth is more accurate than just plain populations.
It also has some other interesting properties. For instance, a probability is always between 0 and 1, so you can multiply it by ten and give the player an x/10 successfulness score for each species. (Or by 5 to give x/5 stars, or by 100 to give a percentage, you get the point).

hhyyrylainen · October 17, 2020, 5:11pm

I said percentage-based population growth because the starting population of the species is the same for all mutation calculations, which means that the highest percentage population gain is also the highest total population.

So after this chain of deductions, the highest ending population for a mutation check is actually the best, and the current implementation is fine.

Sentiant · October 20, 2020, 2:00pm

So we’ve established that population is probably an okay metric to use, but I’m still not entirely satisfied. Am I really meant to believe that every multicellular creature ever failed to get ‘being smaller’ as one of its five auto-evo options every, single, time? Is any species whose population is not in the trillions just the result of unfortunate rng?

I don’t think so. I think auto-evo can only properly create species with lower populations if we make a change even more fundamental to the system than a different metric for evaluation. The problem isn’t that it doesn’t know how to pick the ‘best one’ out of its options, the problem is that it is picking the ‘best one’ at all. Let me explain:

Contrary to what catchphrases like ‘survival of the fittest’ might tell you, evolution doesn’t really select for the optimal species. Rather, it selects for the fit enough, anything that doesn’t die before it reproduces. Multicellular creatures, for instance, are doing worse than bacteria, they just aren’t going extinct either.
To further illustrate this, let’s take a look at a scenario that would be common in the auto-evo. A species of microbes develops five different mutations. Auto-evo runs for all five mutations and for the original species, and checks which one has the most population. The variant that does the best is then selected, and the entire species magically gains that mutation!
In reality, this would never happen. Instead, all six variants of the species would exist concurrently. Then, the ones that don’t make it will go extinct, and the other ones will continue to exist. This means that any species that isn’t heading for extinction will continue to exist, no matter how suboptimal they are.

If we tried to make the auto-evo system account for this, it might look something like this:
Step 1: Make 5 different mutations, as before.
Step 2: In the population algorithm, replace a small fraction (say 1% per mutation) of the species with the mutated variant.
Step 3: Run the population algorithm to find out which variants die off (probably because they couldn’t compete with the original), and which ones don’t. The variants that die get yeeted, the other ones become separate species for the next round.

Now, I have the sneaking suspicion that this system will cause a ton of tiny populations to exist. This could be fixed by moving the threshold slightly above zero, to make those tiny populations count as extinct.
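
The three steps and the extinction threshold could be sketched roughly like this. Everything here is an assumption made for illustration: `step` is a toy stand-in (shared carrying capacity, fixed birth and death rates per variant), not the actual population algorithm, and the names and numbers are hypothetical.

```python
def step(pops, birth, death, capacity):
    """Toy stand-in for one timestep of the population algorithm."""
    total = sum(pops.values())
    crowding = max(0.0, 1.0 - total / capacity)  # births shrink as the patch fills up
    return {
        name: max(0.0, pop * (1.0 + birth[name] * crowding - death[name]))
        for name, pop in pops.items()
    }


def run_round(original_pop, birth, death, capacity,
              seed_fraction=0.01, steps=200, extinction_threshold=1.0):
    """Seed each mutant with a small share of the species, then see who survives."""
    mutants = [name for name in birth if name != "original"]
    # Step 2: replace a small fraction of the species with each mutated variant.
    pops = {name: original_pop * seed_fraction for name in mutants}
    pops["original"] = original_pop * (1.0 - seed_fraction * len(mutants))
    # Step 3: run the population algorithm and see which variants die off.
    for _ in range(steps):
        pops = step(pops, birth, death, capacity)
    # Anything below the threshold counts as extinct.
    return {name: pop for name, pop in pops.items() if pop >= extinction_threshold}


# Hypothetical variants (Step 1 would generate these as mutations).
birth = {"original": 0.10, "better": 0.12, "worse": 0.02}
death = {"original": 0.05, "better": 0.05, "worse": 0.05}
survivors = run_round(1000, birth, death, capacity=10000)
print(sorted(survivors))
```

In this toy run the "worse" variant drops below the threshold and is removed, while "original" and "better" both carry over as separate species. How many variants persist depends heavily on the stand-in model and on how many timesteps are run.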

hhyyrylainen · October 20, 2020, 7:37pm

With really small populations, this is just going to get rounded out.
While the overall steps look like they can work, I don’t really get this point.

How long do you run the simulation for? There has to be an arbitrary cutoff at some point.

Just for fun I calculated that to run through all of the generations, we’d need to run a cool 36.5 billion steps, which is the reason auto-evo is abstracted.

I already added that, as the prototype algorithm ended up almost infinitely carrying around species with a population of just a couple.

Sentiant · October 21, 2020, 8:48am

I didn’t really put a whole lot of thought into that number. The point here is that it has to be small, since species heading for extinction will need to get there quickly instead of having five million population to use as a buffer.

From reading Nick’s thread on the dev forums I was under the impression that most species would die long before that point. Unsuccessful species in that thread appear to die before ~20 months. That seems like a good cut-off point.

Sentiant · January 13, 2021, 3:34pm

Double post because it’s been 3 months and this is about something different.

I was rereading the Auto-Evo thread on the dev forum and found a little error in the math. I know Nick has been in hibernation for the past year but this post will be here when he (hopefully) comes back.

The Distance per Hunt isn’t calculated correctly. In his equation, Nick assumes that adding more creatures would decrease the average distance to a food source, but this is incorrect. Since each creature has to get its calories individually, it does not actually matter how many other creatures there are. If you look at the picture with the triceratopses you’ll find that the distance from a triceratops to a fern is still the same regardless of how many triceratopses there are, even if there were only one triceratops. Only the size of the patch and the number of ferns matter. This is likely part of the reason why time per hunt is so low right now.

Let’s look at the equation Nick gives and how to correct it:

Distance per Hunt = (0.66 * Patch Length) * (1 - (4 * Predator Population Density * Prey Population Density))

Nick expresses doubt about this, but he is right about the (0.66 * Patch Length) part. The rest of the equation just looks like guesswork to me. I have to be honest here: I don’t know how to mathematically prove anything about this either, but I managed to brute force this by using a simulation:

Distance = (0.6617 * Patch Length) / (Food Sources^0.35)

I have verified with the simulation that this is roughly accurate for anywhere from 1 to 5000 food sources.
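
For comparison, both equations can be written out as plain functions. This is a sketch only: the function and parameter names are made up here, and the fitted constants are simply the ones quoted above, not re-derived.

```python
def distance_per_hunt_original(patch_length, predator_density, prey_density):
    """Nick's equation as quoted above."""
    return (0.66 * patch_length) * (1 - (4 * predator_density * prey_density))


def distance_per_hunt_fitted(patch_length, food_sources):
    """Simulation-fitted equation; reported as roughly accurate for 1 to 5000 food sources."""
    if food_sources < 1:
        raise ValueError("needs at least one food source")
    return (0.6617 * patch_length) / (food_sources ** 0.35)


# Hypothetical patch of length 100: more ferns means a shorter trip,
# and the number of triceratopses does not appear anywhere.
for ferns in (1, 10, 100, 5000):
    print(ferns, round(distance_per_hunt_fitted(100, ferns), 2))
```

With a single food source the fitted version reduces to about 0.66 * Patch Length, matching the part of the original equation that was already right.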