How nature innovates - napkin calcs

March 2024

#evolution

Introduction

In Arrival of the Fittest, Andreas Wagner suggests that robustness facilitates evolutionary innovation (the discovery of novel function). His lab studies innovations in metabolism, regulation, and macromolecules and asks what common principles of innovation exist. This includes questions such as: how does nature preserve what works while exploring the new, and how does it discover solutions in such a large solution space?

This notebook explores these claims and ideas.

How long should it take for bacteria to innovate to survive penicillin?

This question helps us understand whether nature is intelligently exploring a solution space or if the structure of that space simply allows effective random exploration. Specifically, we want to estimate how long a population would need to stumble upon one specific sequence, call it mutation X, purely by chance.

In Arrival of the Fittest, Andreas Wagner presents a fresh perspective on evolutionary innovation: robustness—the capacity of biological systems to withstand change—actually facilitates the discovery of novel functions. By conceptualizing genes, proteins, and metabolic pathways as vast “genotype networks,” Wagner shows how populations can drift neutrally through these networks, preserving existing functions while uncovering new adaptive opportunities. This framework explains how complexity and diversity emerge naturally in biological systems without requiring improbable one-step mutations.

Expected Time for Random Discovery

  1. Space size
    \( S = 10^{X} \)

  2. New variants per generation
    \( V = N_{\mathrm{cells}} \times \mu \)

  3. Probability of success per generation
    \( p = \frac{V}{S} \)

  4. Expected generations to discovery
    \( E[G] = \frac{1}{p} = \frac{S}{V} \)

  5. Expected time (hours)
    \( E[T_{\mathrm{hrs}}] = E[G] \times t_{\mathrm{gen}} \)

  6. Expected time (years)
    \( E[T_{\mathrm{yrs}}] = \frac{E[T_{\mathrm{hrs}}]}{24 \times 365} \)

import math

# ------------------------------------------
# PARAMETERS
# ------------------------------------------
X_log10 = 130         # log10 of space size (e.g., proteins ~20^100 ≈10^130)
N_cells = 1e9         # number of bacteria in the culture
mu = 5e-3             # mutation rate per genome per generation
t_gen_hours = 0.5     # time per generation in hours (≈30 minutes)

# ------------------------------------------
# CALCULATIONS
# ------------------------------------------
# 1. Calc solution space size: S = 10^X_log10
S = 10 ** X_log10

# 2. Calc new variants per generation: V = N_cells * mu
V = N_cells * mu

# 3. Prob of success per generation: p = V / S
p = V / S

# 4. Expected num of generations to discovery: E_gen = 1 / p
E_gen = 1 / p

# 5. Expected time (hours): E_time_hours = E_gen * t_gen_hours
E_time_hours = E_gen * t_gen_hours

# 6. And in years that is...: E_time_years = E_time_hours / (24*365)
E_time_years = E_time_hours / (24 * 365)

# ------------------------------------------
# OUTPUT RESULTS
# ------------------------------------------
print(f"So, the expected number of generations to hit one specific sequence in a 10^{X_log10}-sized space is ≈ {E_gen:.2e} generations.")
print(f"So, the expected time is ≈ {E_time_hours:.2e} hours ({E_time_years:.2e} years).")

Output:
So, the expected number of generations to hit one specific sequence in a 10^130-sized space is ≈ 2.00e+123 generations.
So, the expected time is ≈ 1.00e+123 hours (1.14e+119 years).

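Conclusion:
Blind sampling of a \(10^{130}\)-size space is essentially impossible: about \(10^{119}\) years exceeds the age of the universe (roughly \(1.4 \times 10^{10}\) years) by more than 100 orders of magnitude.
Evolution must exploit local structure, moving along connected paths through the space, rather than performing a pure random search.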
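
To make the contrast with local search concrete, here is a minimal sketch that counts how many sequences differ from a single 100-residue protein at exactly k positions. The function name `neighbors_at_distance` is illustrative, not from Wagner. Comparing the count with V (5 × 10^6 new variants per generation, from the parameters above) assumes, generously, that every new mutation in a generation lands in this one protein.

```python
import math

def neighbors_at_distance(length: int, alphabet: int, k: int) -> int:
    """Number of sequences that differ from a given one at exactly k positions."""
    # Choose which k positions change, then one of (alphabet - 1) other letters at each.
    return math.comb(length, k) * (alphabet - 1) ** k

LENGTH = 100      # residues, as in the 20^100 protein space above
ALPHABET = 20     # amino acids
V = 1e9 * 5e-3    # new variants per generation (N_cells * mu), assumed all in this protein

for k in range(1, 5):
    n = neighbors_at_distance(LENGTH, ALPHABET, k)
    print(f"k={k}: {n:.2e} sequences, about {n / V:.2e} generations' worth of new variants")
```

The one- and two-mutation neighborhoods (1,900 and about 1.8 million sequences) are smaller than one generation's supply of new variants, so under these assumptions nearby sequences are well within reach even though the full space is not. The count grows quickly with k.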

Wagner describes library sizes on this scale. The protein figure considers strings of amino acids 100 letters long:

  • 10^130 proteins (library of short proteins)

  • 10^700 regulatory circuits

  • 10^1000 metabolisms
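
The same napkin estimate can be repeated for all three libraries. A Python float cannot hold 10^700 or 10^1000 (floats top out near 1.8 × 10^308), so this sketch works with log10 values throughout. It reuses the parameter values from the calculation above. The function name `log10_expected_years` and the comparison with the age of the universe (about 1.4 × 10^10 years) are additions for illustration, and applying the bacterial parameters to circuits and metabolisms is a simplifying assumption.

```python
import math

def log10_expected_years(x_log10: float, n_cells: float, mu: float, t_gen_hours: float) -> float:
    """log10 of the expected years to hit one specific point in a 10^x_log10 space."""
    log10_v = math.log10(n_cells * mu)                   # V = N_cells * mu
    log10_gens = x_log10 - log10_v                       # E[G] = S / V
    log10_hours = log10_gens + math.log10(t_gen_hours)   # E[T_hrs] = E[G] * t_gen
    return log10_hours - math.log10(24 * 365)            # hours -> years

LIBRARIES = {"proteins": 130, "regulatory circuits": 700, "metabolisms": 1000}
LOG10_AGE_OF_UNIVERSE_YEARS = math.log10(1.4e10)

for name, x in LIBRARIES.items():
    y = log10_expected_years(x, n_cells=1e9, mu=5e-3, t_gen_hours=0.5)
    excess = y - LOG10_AGE_OF_UNIVERSE_YEARS
    print(f"{name}: ~10^{y:.1f} years, about {excess:.0f} orders of magnitude "
          f"beyond the age of the universe")
```

For the protein library this reproduces the 1.14e+119 years computed above; the larger libraries only make blind search more hopeless.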

How is possibility space structured (such that random search can find solutions)?

We know from experimental evolution—often of bacteria—that novel functions can emerge remarkably quickly. To explain this, the underlying “possibility space” (or genotype space) must be richly structured. Andreas Wagner calls this structure genotype networks: vast, interconnected sets of genotypes that all map to the same phenotype.

  1. Synonymous texts as an analogy

    • Imagine every possible amino‐acid sequence as a “text.” Two such texts—a plant globin and an insect globin—both transport O₂, yet differ by roughly 90 % of their residues. Despite this divergence, they fold into virtually identical 3D shapes and bind oxygen in the same way.

    • All sequences that perform the same function form a “neighborhood” in sequence space. Within this neighborhood, single mutations—or a few steps—lead from one functional sequence to another without losing the phenotype.

  2. Sprawling genotype networks

    • Walking through this neighborhood, one can move in many directions while preserving function. Such connectivity means that a population can accumulate mutations, drift across the network, and stumble upon adjacent regions that encode new or improved functions—yet never stray into nonfunctional territory.

  3. Robustness enables innovation

    • Robustness (the ability of a system to tolerate mutations without phenotypic change) creates these expansive networks.

    • Without robustness, single mutations would more often be deleterious, collapsing exploration. With robustness, populations can “scour” large regions of genotype space, making random search surprisingly effective.

  4. Generalizing across biological “libraries”

    • Wagner argues the same principles apply not only to proteins but to metabolic pathways, regulatory circuits, and macromolecular assemblies. In each case, functional diversity arises from richly interconnected genotype networks.

    • Core takeaway: Robust genotype networks—and the diverse neighborhoods they create—are both necessary and sufficient for biological innovation.

  5. Implications for engineered systems

    • Nature’s Lego principle carries over: a small set of building blocks plus simple linking rules can yield an immense repertoire of functional assemblies.

    • Minimal complexity and modularity aren’t obstacles to innovation; they are enablers.
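
The drift argument in points 2 and 3 can be illustrated with a toy simulation. This is a sketch under loud assumptions, not Wagner's model: genotypes are 16-bit strings, the genotype-to-phenotype map is an arbitrary hash (real maps are not random), and roughly half of all genotypes are assigned phenotype 0, standing in for a robust existing function. All names (`phenotype`, `neutral_walk`, and so on) are hypothetical. A walker takes single-bit mutations, keeps only those that preserve phenotype 0, and records which novel phenotypes lie one mutation away along its path.

```python
import hashlib
import random

N_BITS = 16          # toy genotype: a 16-bit string (65,536 genotypes)
N_PHENOTYPES = 1000  # phenotype 0 is the existing function; 1..999 are novel

def phenotype(genotype: int) -> int:
    """Arbitrary but fixed genotype -> phenotype map (a hash, not biology).
    About half of all genotypes map to phenotype 0, which makes it robust."""
    digest = hashlib.sha256(genotype.to_bytes(4, "big")).digest()
    h = int.from_bytes(digest[:8], "big")
    if h % 2 == 0:
        return 0
    return 1 + (h // 2) % (N_PHENOTYPES - 1)

def neighbors(genotype: int) -> list[int]:
    """All genotypes one bit flip (one point mutation) away."""
    return [genotype ^ (1 << i) for i in range(N_BITS)]

def novel_in_neighborhood(genotype: int) -> set[int]:
    """Novel phenotypes reachable from this genotype by a single mutation."""
    return {phenotype(n) for n in neighbors(genotype)} - {0}

def neutral_walk(start: int, steps: int, rng: random.Random) -> tuple[set[int], set[int]]:
    """Drift along the phenotype-0 network.
    Returns (novel phenotypes seen one mutation away, functional genotypes visited)."""
    current = start
    seen = set(novel_in_neighborhood(current))
    visited = {current}
    for _ in range(steps):
        candidate = rng.choice(neighbors(current))
        if phenotype(candidate) == 0:   # neutral mutation: function preserved, accept
            current = candidate
            visited.add(current)
            seen |= novel_in_neighborhood(current)
        # otherwise the mutant loses the function and is rejected
    return seen, visited

start = next(g for g in range(2 ** N_BITS) if phenotype(g) == 0)
static = novel_in_neighborhood(start)
drifted, visited = neutral_walk(start, steps=500, rng=random.Random(0))
print(f"no drift:   {len(static)} novel phenotypes within one mutation")
print(f"with drift: {len(drifted)} novel phenotypes across {len(visited)} functional genotypes")
```

By construction the walker never loses the original function, and the set of novel phenotypes it has access to can only grow as it drifts. How much it grows depends entirely on the toy map chosen here, so the printed numbers illustrate the mechanism rather than estimate anything biological.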

Conclusion:
Nature’s “libraries” are organized so that most functions have many synonymous solutions. Random mutation—what organisms do best—can then efficiently explore these networks, conserving existing functions while discovering new ones.


Sources:

  • Andreas Wagner, Arrival of the Fittest (2014)

  • “Arrival of the Fittest” talk at the Royal Institution, 13 November 2014 (YouTube)