Day 9 - Bio-Inspired RNNs (Dan Goodman, Panayiota Poirazi, Susan Stepney, Emre Neftci)

The CCW25 T-shirt logo, designed by Kieran Nazareth


In this session, we took a look at the underpinnings of the brain's efficiency, the computational power of physical objects, and a new perspective on the inner workings of transformers.

Freeform Sketches by Stan Kerstjens


Dan Goodman: Brain Efficiency, Communication and Organization



Dan Goodman kicked things off with a tongue-in-cheek provocation: "feel free to ignore everything that neuroscientists tell you about how the brain works." His point wasn't to dismiss neuroscience, but to highlight that our understanding is far from complete. While detailed theories might be flawed, they can still hint at new insights:

  • The Brain's Surprisingly Low Energy Bill:

    The brain seems to average about 0.1 spikes per neuron per second. This implies an incredibly sparse code, far sparser than many theories accommodate. If a theory requires neurons to fire at 100 spikes/second, then only a tiny fraction (0.1%) can be active at any given time (see the quick calculation after this list). This goes against traditional rate-coding ideas and poses a puzzle: how is this energy budget so tightly regulated, seemingly independent of cognitive load (while awake)?

  • Encapsulated RNA: Neurons Sending 1KB Data Packets?

    A recent discovery involving the ARC gene could prove crucial for understanding plasticity. This gene's products can self-assemble into virus-like capsids, each containing about 1 kilobyte of RNA information. These "packets" can then be released by one neuron and taken up by another, where the RNA can be translated in an activity-dependent manner. While the exact function is unknown, the fact that ARC is vital for plasticity means this intercellular RNA messaging is doing something important. Is it a slow, high-bandwidth information channel? A way to synchronize plasticity? This opened the discussion on broader exosome communication and even wilder theories about memory storage and epigenetic inheritance.

  • Split-Brain Patients:

    Dan's final example was the classic split-brain patient. After severing the corpus callosum (often for severe epilepsy), patients recover remarkably quickly and function with a high degree of normalcy, despite their two cortical hemispheres being largely disconnected. Experiments reveal the hemispheres can operate independently, leading to situations where one hand draws one thing while the other draws something else, or one side confabulates reasons for actions initiated by the other. This raises the question: if hemispheres can function so independently, could other brain regions also be more modular and self-sufficient than we typically assume?
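
As a quick sanity check of where that 0.1% figure comes from, using only the two firing rates quoted above:

$$\text{fraction active} \approx \frac{0.1\ \text{spikes/s (population average)}}{100\ \text{spikes/s (per active neuron)}} = 10^{-3} = 0.1\%$$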


Dan concluded by suggesting that perhaps we should think of the brain less like a rigid hierarchy and more like an "anarchist commune": a system of interacting, somewhat autonomous agents.

Jota Poirazi: Dendrites for Efficiency

Jota Poirazi shifted focus to the neuron's input structures, arguing that "dendrites make brains efficient."

  • Packing More In Less Space:

    Dendrites, being thin and extensive, allow the brain to achieve a massive surface area for connectivity (an estimated 66 times more than cell bodies alone would provide) without an unmanageable skull size. This huge dendritic arbor allows a single neuron to receive tens of thousands of inputs.

  • Dendrites as Active Computational Units:

    Far from being passive integrators, dendrites are packed with ion channels that allow them to generate their own spikes. Jota highlighted various types:

    • Sodium spikes: Fast, all-or-none events, good for precise coincidence detection (within milliseconds).
    • NMDA spikes: Slower, longer-lasting events (tens to hundreds of milliseconds), requiring prior depolarization to remove a magnesium block. These broaden the integration window.
    • Other calcium-based events with even longer timescales.

    This rich repertoire of non-linearities means individual dendrites can perform complex computations. Jota described how a neuron can be viewed as a multi-layer network, with dendrites acting as initial processing units whose outputs are then integrated at the soma. Early models suggested a two-layer network, but more recent work accounting for the diverse temporal dynamics might imply something closer to a seven-layer network within a single neuron! (A minimal sketch of the two-layer picture follows at the end of this list.)



  • Efficient Learning and Feature Detection:

    These dendritic non-linearities are also key for plasticity. NMDA spikes, with their associated calcium influx, are heavily implicated. An efficient way to trigger these events is through clustered synaptic input: synapses that are spatially close and temporally correlated. Learning rules like BCM can explain how such clusters form (a rough sketch of such a rule appears at the end of this section). Once formed, these synaptic clusters on a dendrite effectively become feature detectors. A single neuron, with its many dendrites, thereby hosts a multitude of independent feature detectors, greatly increasing its computational capacity.

  • Dendritic Inspiration for ANNs:

    Jota touched on work translating these ideas to artificial neural networks. By structuring ANNs with dendritic-like compartments and local receptive fields, they observed benefits like needing fewer parameters, improved robustness to overfitting and noise, and better performance on complex sequential tasks. He also mentioned tools developed in his lab, DendroTweaks and Dendrify, for exploring these dendritic dynamics.
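
To make the "neuron as a multi-layer network" picture concrete, here is a minimal sketch of the classic two-layer abstraction (our own illustration, not code from the talk; the subunit count, weights, and sigmoid non-linearity are arbitrary choices): each dendrite applies a local non-linearity to its own group of synapses, and the soma then integrates the dendritic outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def two_layer_neuron(x, w_dend, w_soma):
    """Two-layer abstraction of a single neuron.

    x      : (n_dendrites, n_syn_per_dendrite) inputs, grouped by dendrite
    w_dend : (n_dendrites, n_syn_per_dendrite) synaptic weights per dendrite
    w_soma : (n_dendrites,) coupling of each dendritic subunit to the soma

    Layer 1: each dendrite sums its own synapses and applies a local
    non-linearity (standing in for NMDA/Na+ dendritic events).
    Layer 2: the soma sums the dendritic outputs and applies its own
    non-linearity.
    """
    dendritic_out = sigmoid(np.sum(w_dend * x, axis=1))   # one value per dendrite
    return sigmoid(w_soma @ dendritic_out)                # somatic output

# Toy example: 8 dendrites with 50 synapses each (local receptive fields).
n_dend, n_syn = 8, 50
w_dend = rng.normal(0.0, 0.3, size=(n_dend, n_syn))
w_soma = rng.normal(0.0, 1.0, size=n_dend)
x = rng.random((n_dend, n_syn))

print("somatic output:", two_layer_neuron(x, w_dend, w_soma))
```

Adding further branch-specific non-linearities with their own timescales is what pushes the effective depth beyond two layers, in the spirit of the seven-layer estimate above; the dendritic ANNs Jota mentioned build on units structured roughly like this, with local receptive fields per dendritic compartment.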

The discussion touched on the perennial STDP debate (Jota leans towards local depolarization being more fundamental than precise spike timing for some plasticity) and the general ubiquity of these dendritic processing capabilities.
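
Since BCM came up as one way synaptic clusters could form, here is a minimal sketch of a generic rate-based BCM rule (a textbook form rather than the specific model from the talk; the learning rate, threshold time constant, and toy inputs are arbitrary): each weight change depends on presynaptic activity and on postsynaptic activity relative to a sliding threshold, with no reference to precise spike times.

```python
import numpy as np

def bcm_step(w, x, theta, eta=1e-4, tau_theta=100.0):
    """One update of a rate-based BCM rule.

    w     : (n,) synaptic weights
    x     : (n,) presynaptic rates
    theta : sliding modification threshold (scalar)

    Synapses whose input coincides with postsynaptic activity above theta
    are potentiated; coincidence with activity below theta depresses them.
    theta itself tracks a running average of the squared postsynaptic rate.
    """
    y = float(w @ x)                                 # postsynaptic rate (linear unit)
    w_new = w + eta * x * y * (y - theta)            # BCM weight change
    theta_new = theta + (y**2 - theta) / tau_theta   # sliding threshold
    return w_new, theta_new

# Single illustrative update: co-active ("clustered") input on the first five
# synapses, weaker uncorrelated background on the rest.
rng = np.random.default_rng(1)
w, theta = np.full(10, 0.5), 5.0
x = np.concatenate([np.full(5, 2.0), 0.2 * rng.random(5)])
w, theta = bcm_step(w, x, theta)
print(w.round(4), round(theta, 3))
```

Because potentiation versus depression is decided by local postsynaptic activity relative to the threshold rather than by spike order, rules of this family sit naturally on the "local depolarization" side of the STDP debate mentioned above.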

Susan Stepney: You Don't Need a Brain (for Reservoir Computing)

Susan Stepney started out with a declaration: "You don't need a brain!" At least, not for certain kinds of computation. Her focus was on physical reservoir computing.




  • The Reservoir Computing Model (Echo State Networks):

    The basic idea involves a fixed, random recurrent neural network (the "reservoir"). Input signals perturb the reservoir's complex dynamics, and a simple, trainable linear readout layer learns to interpret these dynamics to produce the desired output. Crucially, only the output layer is trained; the reservoir's internal connections remain fixed. (A minimal sketch follows this list.)

  • From Neural Networks to Physical Materials:

    The specific neural structure of the model isn't sacred. Any physical system with sufficiently rich, non-linear dynamics can act as a reservoir. The key is that if you "kick it, it wobbles in an interesting way." Instead of neurons and weights, we think about degrees of freedom and couplings within the material. Examples include: magnetic nano-rings, MEMS devices (like those in accelerometers), and even, as Helmut previously discussed, a silicone octopus arm. The critical aspects are how to get input into the material and how to read out its state.

  • Smart Sensors and Embodied Computation:

    An example was a MEMS-based accelerometer used as a reservoir to analyze human gait. The physical act of walking (the acceleration) directly provided the input to the MEMS reservoir, which then computed gait characteristics. The sensor is the computer, leading to very efficient, low-power "sensor-computing" systems.

  • Feedback for Control and Task Switching:

    Susan highlighted the power of adding feedback: the trained output of the reservoir can be fed back as an additional input. This allows the system to learn control tasks, as Nakajima's lab demonstrated with the synthetic octopus arm learning to control its own movements based on proprioceptive sensor readings from its body.

    Furthermore, by training different output "spectacles" (readout layers) for different tasks, the same physical reservoir, with the same environmental input, could perform various actions. A higher-level, perhaps "emotional," reservoir could then switch between these task-specific readouts, allowing for flexible behavior selection.
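
To ground the echo state picture, here is a minimal sketch (our own illustration; the reservoir size, spectral radius, toy task, and ridge constant are arbitrary choices): a fixed random reservoir is driven by the input, and only the linear readout is trained, here by ridge regression. A physical reservoir would replace the tanh recurrence with the material's own dynamics, plus some way to inject inputs and read out states.

```python
import numpy as np

rng = np.random.default_rng(42)

# --- fixed, random reservoir (never trained) ---
n_res, n_in = 200, 1
W_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
W = rng.normal(0, 1, size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # keep spectral radius below 1

def run_reservoir(u):
    """Drive the reservoir with input sequence u and collect its states."""
    x = np.zeros(n_res)
    states = []
    for u_t in u:
        x = np.tanh(W_in @ np.atleast_1d(u_t) + W @ x)
        states.append(x.copy())
    return np.array(states)

# --- toy task: predict sin(t) half a time unit ahead from its current value ---
t = np.arange(0, 60, 0.1)
u, y_target = np.sin(t), np.sin(t + 0.5)

X = run_reservoir(u)
washout = 100                                     # discard the initial transient
X_tr, y_tr = X[washout:], y_target[washout:]

# --- train only the linear readout (ridge regression) ---
ridge = 1e-6
W_out = np.linalg.solve(X_tr.T @ X_tr + ridge * np.eye(n_res), X_tr.T @ y_tr)

pred = X_tr @ W_out
print("readout MSE:", np.mean((pred - y_tr) ** 2))
```

The feedback trick amounts to appending the trained output to the input at the next time step, and task switching is just a matter of swapping in a different W_out over the same reservoir states.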




Emre Neftci: Transformers as Online Learners

Finally, Emre Neftci offered a new perspective on a ubiquitous AI model: Transformers. The take-home message: "Transformers are another way of learning how to do online learning."

  • Deconstructing Attention:

    The core of a Transformer lies in its attention mechanism (or "token mixer"). Emre used an analogy of writing a book: inputs (X) are like topics, which get projected into Keys (K, the book's index), Queries (Q, what you want to find), and Values (V, the actual content of the pages). The standard attention formula, Y = softmax(QK^T)V, essentially looks up relevant "pages" (Values) based on the "query" and "index".

  • Linear Transformers and Hebbian Learning:

    If you remove the softmax, the attention becomes linear: Y = Q(K^T V). This allows a reformulation: Y = Q A_t, where A_t = K^T V (accumulated up to the current token). This "attention matrix" A_t can be updated recurrently with each new token: A_t = A_{t-1} + k_t v_t^T. This is a Hebbian learning rule! So, in linear transformers, the attention mechanism behaves like a linear layer whose weights (A_t) change dynamically during inference according to a Hebbian update (a small numerical sketch follows this list).

  • Beyond Hebb: State-Space Models and Gradient-Based Updates:

    This update rule for A_t can be generalized, leading to connections with state-space models (like Mamba). Instead of a simple Hebbian update, A_t could be updated via gradient descent on a local loss function, for instance, minimizing the error in reconstructing V given K and A (min_A ||KA − V||^2). The transformer's outer-loop training then becomes a form of meta-learning: it learns to generate the K, Q, V projections such that this inner-loop, online learning process (the updating of A_t) is effective.

  • Tackling Catastrophic Forgetting:

    Emre discussed how this framework can address continual learning challenges. By adding a metaplasticity-inspired term to the local loss that penalizes large changes to "important" past weights in A_t, the system can learn new information without catastrophically forgetting old information. John (in a later session) would elaborate on a Bayesian approach to define these importance weights, which can prevent both forgetting and "catastrophic remembering" (where the system becomes too rigid).
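
To make the "attention as online learning" reading concrete, here is a small numerical sketch (our own illustration; the dimensions, random data, and learning rate are arbitrary): causal linear attention computed in the usual parallel way coincides with a recurrent loop that maintains a fast-weight matrix A_t under the Hebbian update A_t = A_{t-1} + k_t v_t^T, and a delta-rule variant implements one gradient step per token on the local loss ||k_t^T A − v_t||^2. A plain softmax attention function is included for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 4                                    # sequence length, head dimension
Q, K, V = rng.normal(size=(3, T, d))           # toy queries, keys, values

def softmax_attention(Q, K, V):
    """Standard attention, Y = softmax(Q K^T) V (non-causal, no 1/sqrt(d) scaling)."""
    scores = Q @ K.T
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    return (w / w.sum(axis=1, keepdims=True)) @ V

def linear_attention_parallel(Q, K, V):
    """Causal linear attention: y_t = q_t^T (sum over s<=t of k_s v_s^T)."""
    A = np.cumsum(K[:, :, None] * V[:, None, :], axis=0)   # running sum of k_s v_s^T
    return np.einsum("td,tde->te", Q, A)

def linear_attention_recurrent(Q, K, V):
    """Same computation as an online learner with a Hebbian fast-weight matrix A_t."""
    A, ys = np.zeros((d, d)), []
    for q, k, v in zip(Q, K, V):
        A = A + np.outer(k, v)                 # Hebbian update: A_t = A_{t-1} + k_t v_t^T
        ys.append(q @ A)                       # read out with the current query
    return np.array(ys)

def delta_rule_attention(Q, K, V, lr=0.5):
    """Gradient-based variant: one step per token on the local loss ||k_t^T A - v_t||^2."""
    A, ys = np.zeros((d, d)), []
    for q, k, v in zip(Q, K, V):
        A = A - lr * np.outer(k, k @ A - v)    # corrects A rather than only adding to it
        ys.append(q @ A)
    return np.array(ys)

# The parallel and recurrent forms of linear attention are the same computation.
assert np.allclose(linear_attention_parallel(Q, K, V),
                   linear_attention_recurrent(Q, K, V))
print(softmax_attention(Q, K, V).shape, delta_rule_attention(Q, K, V).shape)   # (6, 4) (6, 4)
```

Seen this way, the outer training of the transformer is meta-learning: it shapes the Q/K/V projections so that the inner, per-token update of A_t is a useful learner, and the metaplasticity-style penalty Emre mentioned is an extra term in that local loss discouraging changes to important entries of A_t.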

Emre's presentation sparked discussion on the relationship between these transformer dynamics and those of more traditional RNNs like GRUs. This online learning perspective provides a powerful lens through which to understand, and potentially enhance, these sequence models, even for neuromorphic hardware.

Projects continued throughout the afternoon, as Saturday's deadline draws closer and closer. Only the mandatory afternoon break helped everyone let off some steam, with the volleyball field as popular as ever.

 Shreya, Matthias and the Volleyballers



   Late-night brainstorming


