Day 5 - Evolution of Learning in Brains and Machines - Esteban Real and Yigit Demirag

Friday's talks took us on a tour of the evolution of learning, comparing learning in brains and machines. Sketches are provided by Stan Kerstjens.

Esteban Real and Yigit Demirag

(Google DeepMind)

Evolving Intelligence and Algorithmic Discovery


Sketch by Stan Kerstjens

Imagine a remote collection of islands somewhere out in the ocean. These islands are inhabited by very simple beings, all of whom are constantly performing very simple tasks. They walk around the island and name the things they see. If they see a coconut, they have to say "coconut!" If they see a boat on the horizon, they have to say "boat!" You get the idea. 

If they do well, they get to have lots of offspring. Their children are very similar to them, with slight variations resulting from mutation. It is probably obvious that this thought experiment is meant to emulate evolution. 

Experimental Outline: Digital Evolution in Data Centers

Now, to make the metaphor concrete: in the AutoML-Zero framework (Real et al., 2020), this little universe is a data center inhabited by ~10 million programs, and the islands are individual machines in the cluster.

The experiment was set up to investigate the conditions under which learning emerges from large-scale evolutionary search.

The lifecycle is as follows: programs compete based on task performance during short "lifetimes" (e.g., 100 ms), and well-performing programs get to replicate themselves via cloning with mutations. Population size is managed by removing the oldest members of each island.
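
For readers who like code, here is a minimal sketch of that lifecycle on a single island, assuming tournament-style selection as in Real et al. (2020); `evaluate` and `mutate` stand in for the evaluation and mutation operators described below, and `mutate` is assumed to return a mutated copy:

```python
import random
from collections import deque

def evolve_island(population, evaluate, mutate, steps, tournament_size=10):
    """Aging evolution on one island: clone-and-mutate the best of a random
    tournament, then retire the oldest individual to cap the population."""
    population = deque(population)               # oldest members sit on the left
    for _ in range(steps):
        tournament = random.sample(list(population), tournament_size)
        parent = max(tournament, key=evaluate)   # short "lifetime" of evaluation
        population.append(mutate(parent))        # cloning with mutation
        population.popleft()                     # remove the oldest member
    return population
```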

The Anatomy of an Evolving Program

Each program comprises three core, initially empty component functions, all operating on shared memory:

  • Setup
  • Predict
  • Learn
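
As a caricature in Python (in AutoML-Zero the three functions are really lists of primitive instructions over typed scalar/vector/matrix registers, not methods; the linear-SGD contents shown here are the kind of solution evolution tends to rediscover, not part of the empty starting point):

```python
import numpy as np

class Program:
    """Three component functions sharing one memory, AutoML-Zero style."""
    def __init__(self, dim):
        self.mem = {"w": np.zeros(dim), "s": 0.0}   # shared memory

    def setup(self):                     # run once before training
        self.mem["w"][:] = 0.0

    def predict(self, x):                # run on every example
        self.mem["s"] = self.mem["w"] @ x
        return self.mem["s"]

    def learn(self, x, label, lr=0.01):  # run on every training example
        error = label - self.mem["s"]    # uses state left behind by predict()
        self.mem["w"] += lr * error * x  # an SGD-like update evolution can find
```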

Esteban explained fitness evaluation, which typically uses supervised tasks (e.g., CIFAR-10): predict-learn cycles on images (coconut, boat, etc.) during training, followed by evaluating `Predict` accuracy alone on validation data.
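
In the same toy terms, fitness evaluation might look like this (assuming binary labels and a hypothetical `task` object with `train` and `valid` example lists):

```python
def fitness(program, task):
    """Predict-learn cycles on training data, then Predict-only validation."""
    program.setup()
    for x, label in task.train:
        program.predict(x)           # make a prediction first...
        program.learn(x, label)      # ...then take a learning step
    correct = sum((program.predict(x) > 0.5) == label
                  for x, label in task.valid)   # Learn is never called here
    return correct / len(task.valid)
```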

Mutations drive the evolution, randomly inserting/deleting basic mathematical instructions or altering arguments. The key is that no gradient concepts are typically provided a priori.  
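
A hedged sketch of such a mutation operator, assuming a simplified instruction format of (operation, source register, destination register):

```python
import random

OPS = ["add", "sub", "mul", "div", "sin", "cos"]   # basic mathematical primitives
NUM_REGS = 8                                       # addresses in shared memory

def random_instruction():
    return (random.choice(OPS), random.randrange(NUM_REGS), random.randrange(NUM_REGS))

def mutate_function(func):
    """Apply one random edit to a component function (a list of instructions)."""
    roll = random.random()
    if roll < 1/3 and func:                        # delete a random instruction
        func.pop(random.randrange(len(func)))
    elif roll < 2/3:                               # insert a random instruction
        func.insert(random.randrange(len(func) + 1), random_instruction())
    elif func:                                     # alter an instruction's argument
        i = random.randrange(len(func))
        op, src, dst = func[i]
        func[i] = (op, random.randrange(NUM_REGS), dst)
```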

I'll leave the functioning of this program as a little puzzle for motivated readers.

The Emergence of Learning

A central insight of the AutoML-Zero framework relates task unpredictability to the evolution of the `Learn` function:

  • Single Task: Evolution often hardcodes an effective `Predict` function; no learning is required.
  • Multiple/Unpredictable Tasks: Programs that develop functional `Learn` routines gain a significant selective advantage.

Only under unpredictable environmental conditions does the `Learn` function need to be populated. The conclusion is that learning evolves as an adaptive strategy when environments demand more than fixed responses.

Discovering Algorithms: From SGD to Lion

Yigit then went into detail on specific algorithmic discoveries. The AutoML-Zero work demonstrated the rediscovery of algorithms like gradient descent, and even backpropagation for simple neural nets, from scratch.

When programs were evolved directly on tasks like CIFAR-10 variants, techniques such as bilinear interactions, gradient normalization, and weight averaging emerged.
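
To give a flavor of what such motifs look like in code, here is my own illustrative sketch (not the paper's evolved programs) of a learn step combining gradient normalization with weight averaging:

```python
import numpy as np

def learn_step(w, w_avg, grad, lr=0.1, eps=1e-8, decay=0.99):
    """Two discovered motifs: normalize the gradient, average the weights."""
    grad = grad / (np.linalg.norm(grad) + eps)   # gradient normalization
    w = w - lr * grad                            # plain descent step
    w_avg = decay * w_avg + (1 - decay) * w      # running average used at predict time
    return w, w_avg
```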

He then shifted focus to the Lion optimizer, detailed in Chen et al. (2023).

Unlike the pure AutoML-Zero approach, this search didn't start entirely from scratch but was "warm-started" with AdamW, letting evolution radically simplify and modify it. Lion's distinguishing characteristics are its memory savings (it tracks only momentum) and its use of the `sign` function for updates.
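
The resulting update is compact enough to quote; this sketch follows the rule published in Chen et al. (2023):

```python
import numpy as np

def lion_update(w, m, grad, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.01):
    """Lion: take the sign of an interpolated momentum, with decoupled weight
    decay. A single state buffer m is kept, hence the savings over AdamW."""
    update = np.sign(beta1 * m + (1 - beta1) * grad)   # signed update direction
    w = w - lr * (update + wd * w)                     # parameter step + decay
    m = beta2 * m + (1 - beta2) * grad                 # momentum update
    return w, m
```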

Population Dynamics and Design Philosophy

Esteban returned to discuss the benefits of the "island model" structure for balancing exploration and exploitation in the evolutionary search. When the amount of population mixing between islands was varied, the best individuals emerged when migration was allowed at a limited rate, somewhere between the two extremes of homogeneous mixing and complete island isolation.
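
A minimal sketch of that middle ground, assuming each island is a list of individuals and a hypothetical `migration_rate` (0 recovers full isolation; large values approach homogeneous mixing):

```python
import random

def migrate(islands, migration_rate=0.05):
    """Occasionally swap a random individual between two random islands."""
    for island in islands:
        if random.random() < migration_rate:
            other = random.choice([i for i in islands if i is not island])
            a, b = random.randrange(len(island)), random.randrange(len(other))
            island[a], other[b] = other[b], island[a]
```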

Another point was the spectrum between two approaches for obtaining AI models: human-engineered algorithms (often prioritizing interpretability) versus potentially higher-performing, less interpretable evolved solutions. He positioned the AutoML-Zero approach as one that reduces human bias in the search space, à la the Bitter Lesson.

Cost, Future Potential, and Adaptive Robotics

Addressing the computational cost, Esteban suggested it might decrease over time and noted the potential for amortization, where discovery costs are offset by deployment.  

Yigit concluded with teasers, including work on adaptive robotics by Kelly et al. (2023). He described the AutoRobotics-Zero (ARZ) framework, an adaptation of AutoML-Zero for robotics. Using ARZ, interpretable, symbolic policies were evolved for a simulated LaikaGo quadruped that could adapt instantly to sudden "leg breaking". These ARZ policies outperform standard MLP and LSTM baselines and are significantly simpler. 


Vijay Balasubramanian

Vijay started by acknowledging the common focus on backpropagation in large transformer architectures in AI, emphasizing that learning in the brain operates quite differently. He laid out a landscape of mechanisms involved in biological learning:

  • Local Plasticity Rules: Beyond the classic Hebbian rule ("fire together, wire together"), he mentioned Spike-Timing Dependent Plasticity (STDP), where the precise timing of pre- and post-synaptic spikes determines synaptic strength changes (see the toy sketch after this list).
  • Structural Changes: Learning isn't just about changing connection strengths. New synapses form (e.g., dendritic spines seeking connections), and network topology itself changes.
  • Neurogenesis and Neural Death: New neurons are born, particularly in areas like the hippocampus and olfactory system, but programmed cell death (apoptosis) is also crucial, especially during development, pruning connections and potentially aiding forgetting.
  • Neuromodulation: Learning isn't monolithic. Global or semi-global signals like dopamine (often linked to reward) and norepinephrine (linked to surprise) modulate learning processes, signaling when and perhaps how strongly learning should occur.
  • Internal States: He also touched upon the importance of internal neuronal states (down to gene expression and chromatin changes) and overall brain states (mood, attention) influencing learning.
  • Cues: Learning is guided by external cues (success/failure, rewards) and internal drives (like curiosity or epistemic drive, famously explored in AI reinforcement learning).
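
A toy version of the STDP window from the first bullet (the classic pair-based double-exponential form; the parameter values are illustrative, not from the talk):

```python
import numpy as np

def stdp_dw(dt, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Weight change as a function of spike-timing difference dt = t_post - t_pre
    (in ms): pre-before-post (dt > 0) potentiates, post-before-pre depresses."""
    return np.where(dt > 0, a_plus * np.exp(-dt / tau),
                    -a_minus * np.exp(dt / tau))
```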

Sketch by Stan Kerstjens

Vijay noted that while neuroscience often focuses intensely on local plasticity rules due to their tractability, the brain employs this much richer, multi-layered toolkit.

The Two-Way Street: Brains <-> AI

He highlighted the productive, bi-directional exchange between neuroscience and AI. The brain inspired early AI (like the perceptron), and AI concepts provide powerful models for understanding brain function. He cited the Hopfield network, inspired partly by physics models, whose attractor dynamics were later found to be implemented almost textbook-style in systems like the fly's head-direction circuit (referencing work by Vivek Jayaraman).

However, Vijay also critiqued current AI trends, particularly the massive scale, energy consumption, and data requirements of training large language models. He argued this path is unsustainable and perhaps inefficient, predicting a future dominated by smaller, autonomous, adaptive agents (more akin to biological organisms) that need to learn efficiently from limited experience and energy budgets. This motivates understanding more brain-like learning principles.

Different Flavors of Learning

Vijay outlined different kinds of learning tasks, acknowledging the boundaries are fuzzy:

1. Motor Control: Learning precise, repeatable sequences (e.g., birdsong, playing an instrument).

2. Memory: More flexible storage and retrieval of information, potentially modifiable (distinguished from the sequential rigidity of motor control).

3. Inference & Decision Making: Interpreting inputs to infer latent states and make choices (e.g., object classification). There was some discussion here about whether this counts as "learning" if much of it is innate, but the consensus leaned towards including it, especially in the NeuroAI context where AI aims to learn capabilities that might be innate in biology.

Example 1: Birdsong Learning (Zebra Finch)

To illustrate motor sequence learning, Vijay detailed the zebra finch song learning circuit (see image below for clarification):

  • HVC: Acts as the "conductor," firing specific sequences of neural patterns that time the song syllables.
  • RA: The "student" area receiving input from HVC, whose job is to learn the correct association between HVC patterns and motor output.
  • LMAN: The "tutor" area, also receiving HVC input, which sends instructive signals to RA. It also injects variability/noise, crucial for exploration during learning.
  • Area X: Evaluates song quality (comparing to a stored template, likely learned from the father) and sends reinforcement signals (dopamine) back to LMAN, strengthening correct tutor signals.


Illustration of the zebra finch song-learning circuit

The learning happens at the HVC -> RA synapses via heterosynaptic STDP, guided by LMAN's input. Vijay emphasized that the precise template storage mechanism remains unknown. He also mentioned an interesting finding: the optimal teaching strategy (e.g., sequential vs. parallel presentation of song elements) depends on the specific STDP learning rule present at the synapse.
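
Purely as a toy illustration (not a model presented in the talk), "STDP guided by an instructive signal" can be written as a three-factor rule: the STDP window sets an eligibility trace, and a gating signal decides whether it becomes a lasting change:

```python
def gated_weight_change(eligibility, gate, lr=1e-3):
    """Toy three-factor rule for the HVC -> RA synapse: an STDP-derived
    eligibility trace only turns into a lasting weight change when gated
    by an instructive/reinforcement signal (a scalar stand-in here)."""
    return lr * gate * eligibility
```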

Example 2: Hippocampal Memory and Place Cells

Shifting to memory, Vijay discussed the hippocampus. It is famous for the "place cells" discovered by John O'Keefe (Nobel Prize 2014), which fire when an animal is in a specific location, and it has long been theorized (especially area CA3, with its recurrent connections) to be a substrate for associative memory, perhaps like a Hopfield network (though direct proof remains elusive).
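
For readers who have not met it, a minimal Hopfield network stores ±1 patterns with a Hebbian outer-product rule and recalls them by running the dynamics into an attractor (a textbook sketch, not a claim about CA3 specifics):

```python
import numpy as np

def hopfield_store(patterns):
    """Hebbian outer-product storage of +/-1 patterns (rows of `patterns`)."""
    n = patterns.shape[1]
    W = sum(np.outer(p, p) for p in patterns) / n
    np.fill_diagonal(W, 0)                    # no self-connections
    return W

def hopfield_recall(W, state, steps=20):
    """Run the attractor dynamics from a noisy or partial cue."""
    for _ in range(steps):
        state = np.sign(W @ state)            # synchronous sign updates
    return state
```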

He presented a more modern view, suggesting place cells are better understood as context cells, responding conjunctively to place, smells, sounds, social context, etc.

In line with this, a computational model (Wang et al., 2024) treats the hippocampus simply as a memory engine, modeled as an autoencoder of high-dimensional "experience vectors." These vectors represent the multimodal input arriving at the hippocampus as the animal explores, assumed to be complex but relatively smooth functions of the animal's state (position, orientation, internal state, etc.). The autoencoder (trained via backprop in the model) learns to reconstruct the full experience vector from partial or noisy inputs encountered during simulated exploration bouts.
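
A minimal sketch in the spirit of that model (a denoising autoencoder over made-up "experience vectors"; the actual architecture and training in Wang et al. (2024) differ):

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 128, 32                         # experience dimension, bottleneck size
W_enc = rng.normal(0, 0.1, (H, D))     # encoder weights
W_dec = rng.normal(0, 0.1, (D, H))     # decoder weights

def train_step(x, lr=0.01, noise=0.3):
    """Reconstruct the full experience vector from a corrupted version."""
    global W_enc, W_dec
    x_noisy = x + noise * rng.normal(size=x.shape)   # partial/noisy input
    h = np.tanh(W_enc @ x_noisy)                     # compressed code
    x_hat = W_dec @ h                                # reconstruction
    err = x_hat - x                                  # squared-error gradient
    W_dec -= lr * np.outer(err, h)                   # backprop, as in the model
    W_enc -= lr * np.outer((W_dec.T @ err) * (1 - h**2), x_noisy)
    return float(np.mean(err**2))
```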

This simple model successfully reproduces a wide range of observed hippocampal phenomena without explicitly encoding space:

  • Place fields emerge naturally in units of the network.
  • Units develop multiple fields in larger environments, matching recent experimental findings.
  • The network exhibits remapping between different environments (contexts), generating orthogonal representations.
  • It shows robustness across different environment types and even abstract spaces.
  • It displays slow representational drift over long timescales, another recently observed phenomenon.

Vijay argued this demonstrates how a general "memory engine" principle, combined with the statistics of experience, can give rise to seemingly space-specific coding, offering testable mechanistic hypotheses for neuroscientists.

Inference, Learnability, and Bounded Agents

Though time was short for the third example (inference), Vijay made the following points:

  • Sophisticated inference is only useful under specific environmental conditions (not too random, not too stable, not too volatile). Humans adapt their inference strategies based on complexity and potential reward.
  • The tasks animals (and AIs) typically learn are likely a special "learnable" subset of all possible tasks, characterized by having broad support across generic feature spaces. Not everything is learnable given finite resources and fixed sensory inputs.
  • Real agents are bounded. Humans inherently incorporate inductive biases against model complexity, especially with limited data. Machines can be trained without this bias, but humans struggle to discard it.

He concluded by highlighting the importance of considering task structure, learnability, and agent bounds when studying or building learning systems, whether biological or artificial.


References

- Chen, X., Liang, C., Huang, D., Real, E., Wang, K., Pham, H., Dong, X., Luong, T., Hsieh, C. J., Lu, Y., & Le, Q. V. (2023). Symbolic Discovery of Optimization Algorithms. _37th Conference on Neural Information Processing Systems (NeurIPS 2023)_.

- Kelly, S., Park, D. S., Song, X., McIntire, M., Nashikkar, P., Guha, R., Banzhaf, W., Deb, K., Boddeti, V. N., Tan, J., & Real, E. (2023). Discovering Adaptable Symbolic Algorithms from Scratch. _arXiv preprint arXiv:2307.16890v2 [cs.RO]_. (Published at IROS 2023).

- Real, E., Liang, C., So, D. R., & Le, Q. V. (2020). AutoML-Zero: Evolving Machine Learning Algorithms From Scratch. _Proceedings of the 37th International Conference on Machine Learning (ICML)_, PMLR 119, 8007-8019.

- Wang, Z., Di Tullio, R. W., Rooke, S., & Balasubramanian, V. (2024). Time Makes Space: Emergence of Place Fields in Networks Encoding Temporally Continuous Sensory Experiences. _arXiv preprint arXiv:2408.05798 [q-bio.NC]_.
