# Stochastic Automata & Convolutional Networks

Cellular automata (e.g. Conway’s Game of Life) are directly related to Convolutional Neural Networks (CNNs), and each can inform and improve the other. An elaboration of cellular automata, using **stochastic** rules, exhibits more complex behavior than automata with traditional, deterministic rules. Convolutional neural networks can be improved in the same fashion.

**The Game of Life**

Cellular Automata operate like a game. In Conway’s Game of Life, a game board is populated with an arrangement of pieces. Those pieces, and the empty squares around them, are evaluated once per turn. The pieces can either remain (“live”), or disappear (“die”), and the empty squares might have a piece placed upon them (“born”), according to a set of rules.

For the traditional Game of Life, those rules are: a piece “lives” if it has **two or three neighboring pieces**, and it “dies”, otherwise; an empty square with **exactly three neighboring pieces** has a new piece “born” there.
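These rules are simple enough to sketch directly. Here is a minimal Python sketch of one turn (a sparse-set implementation of my own choosing, not from any particular library), representing the board as a set of live coordinates:

```python
from collections import Counter

def life_step(live):
    """One turn of Conway's Game of Life.

    `live` is a set of (x, y) coordinates holding pieces; the board is
    unbounded. A piece survives with 2 or 3 neighbors; a piece is born
    on an empty square with exactly 3 neighbors.
    """
    # Count, for every square adjacent to a piece, how many pieces touch it.
    counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}

# A 'blinker': three pieces in a row oscillate with period 2.
blinker = {(0, 1), (1, 1), (2, 1)}
```

Playing this blinker forward for two turns returns it to its starting arrangement, one of the ‘mobile’ repeating patterns described below.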

By playing through many turns of the Game of Life, patterns are revealed: some arrangements of pieces are ‘stable’, unchanged by the rules; others are ‘mobile’, translated across the board by the rules in a repeated sequence; many arrangements of pieces create a swirl of activity around themselves, like billowing smoke, and their billows can bump into nearby arrangements. This can trigger a cascade of changes, and it is uncertain how many turns are needed before all the billows are snuffed out. Even simple initial arrangements, like the r-pentomino, can take hundreds of turns to stabilize.

**Automata Behavior**

Many varieties of cellular automata have been studied. One uses a game ‘board’ that is only **a single row of squares**, and each successive ‘turn’ is depicted *beneath* the prior turn, to form an image. Stephen Wolfram cribbed and collated work on this kind of automaton in ‘A New Kind of Science’, and presented his exhaustive study of *all the rules possible* for these boards. Examining every variety of rules on these single-row boards, he found four categories of behavior: 1) rules that ‘killed every piece’, 2) rules that ‘did nothing’, 3) rules that created *recursive* patterns, and 4) rules that created a kind of *randomness*.

For Conway’s Game of Life board, there are far too many rules to be examined. A piece may ‘live’ or be ‘born’ for any neighbor count (0, 1, 2, …, 8), **in any combination**. With nine ‘live’ conditions and nine ‘born’ conditions, the number of possible rules is 2 to the 18th power. (That’s **262,144** possible rules!) To evaluate the behavior of each rule, *many initial arrangements of pieces* must be examined, as well. Only a tiny fraction of these rules have been studied thoroughly.

Software exists to visualize different rules as they unfold, letting you ‘paint’ the initial arrangement of pieces; Mirek’s Cellebration is a popular one. You can explore rules that *you specify*. It’s definitely more fun than Minesweeper. :]

The rules available in the Game of Life show the same varieties of behavior as those studied by Wolfram: 1) rules that ‘kill all the pieces’, 2) rules that ‘do nothing’ to most of the board, 3) rules that billow with recursive structures, and 4) rules that bounce around through noisy ‘randomness’. Yet, there are other patterns of behavior, too…

In my own play, I found many rules that cause the pieces to ‘congeal’ into a crystal, bounding a region of the board, and these crystals tiled their interior with a simple pattern. In most cases, that interior tiling quickly became ‘stable’: a static pattern remains, turn after turn. Some crystals, however, would maintain a wobbling ‘fringe’ along their outer edge that refused to settle down. Others developed an interior ‘noisy froth’, or a tiling pattern that ‘seethed’ wherever the tiles were **mismatched**. In general, they exhibited behaviors that were *specific to a local domain*: either at the **boundary** of the crystal, or throughout its **interior**. These behaviors constitute the emergence of a *new rule*, operating on a larger **scale** than the original rule for neighboring pieces.

**Emergent Behaviors in Different Domains**

A small number of these ‘crystallizing’ rules stood out: they supported **multiple** stable interior tilings. You could compare these tilings to the different crystal arrangements seen in minerals: some areas of a crystal may be stacked as a *hexagonal close* packing, while others are a *face-centered cubic* packing. Wherever these different packings meet, there is a *grain boundary*.

In a rare few of these cellular automata, wherever the different interior tilings met, their grain boundaries would ‘froth’, changing each turn. The pattern of these grain-boundary interactions became *its own stable interior*. In essence, the crystals grew to form **multiple** interiors, and the *boundaries* between them were forming *their own kind of crystal*! These were not just ‘recursive’ or ‘noisy’ behaviors; they were **higher forms of emergence**. Each tiling pattern behaved a certain way, and each kind of grain boundary elicited its own new behavior.

Stranger still, as these boundary-crystals were buffeted by the ‘froth’ at the grain boundary, they sometimes ‘exploded’ into new patterns. For one rule in particular, the new pattern would *expand*, and become dominant. The entire crystal underwent a **phase change**, like a spontaneously freezing beer. Another rule produced explosive patterns that would ‘churn’, surviving for hundreds of turns by *migrating* up and down the grain boundary.

Clearly, the Game of Life displays a greater variety of behaviors than Wolfram’s single-row automata. This fact is critical to understanding the sorts of **recognition** possible in Convolutional Neural Networks, because Conway’s Game of Life is *mathematically equivalent to convolution*.

**Convolutional Neural Networks**

A convolutional network receives a field of input pixels, *analogous to Conway’s pieces on a board*. (Each layer of the convolutional network is composed of ‘neurons’ instead of ‘squares’.) And, each neuron receives a signal from a neighborhood of neurons beneath it, *exactly like one of Conway’s squares looking at its neighbors*. The signals flowing through a convolutional network are equivalent to the migration of Conway’s pieces, with **each new convolutional layer** corresponding to **another turn in the Game of Life**.

When back-propagation finds a good convolutional network, it is actually finding *good rules* for a Game of Life automaton. However, the rules available to a convolutional neural network are a bit different from the traditional Game of Life. First, convolutional networks **sum** their neighbors’ activations, instead of **counting** them like Conway does. Second, convolutions’ activations are **monotonic** (‘always increasing’), while Conway’s rules can be **non-monotonic** (a piece might “live” if it has 2 or 5 neighbors, while it “dies” if it has 3 or 4 neighbors, for example). Third, convolutional networks have a *synaptic weight* for each neighbor, while Conway’s Life treats all neighbors equally.
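To make the equivalence concrete, here is a minimal sketch of one Life turn written as a 3×3 convolution followed by a nonlinear rule. The function names are my own, and the zero-padded edge handling is an arbitrary choice for illustration:

```python
def convolve_counts(grid):
    """Convolve a 0/1 grid with the 3x3 neighbor kernel
    [[1,1,1],[1,0,1],[1,1,1]], zero-padded at the edges.
    The result at each cell is its live-neighbor count."""
    h, w = len(grid), len(grid[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = sum(
                grid[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
                if (dy, dx) != (0, 0)
                and 0 <= y + dy < h and 0 <= x + dx < w
            )
    return out

def life_turn(grid):
    """One turn of Life: the convolution above, then a non-monotonic
    'activation' applied to (cell state, neighbor count)."""
    counts = convolve_counts(grid)
    return [[1 if counts[y][x] == 3 or (grid[y][x] and counts[y][x] == 2)
             else 0
             for x in range(len(grid[0]))]
            for y in range(len(grid))]
```

The convolution plays the role of a layer’s weighted sum; the live/die rule plays the role of its activation function.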

This last difference allows convolutional networks to recognize *edges* in an image, because they are sensitive to the *relative arrangement* of pixels (some neighbors have a heavier weight than others). You could improve and generalize Conway’s Life by specifying *the extent to which* neighbors must be present for a piece to “live”; doing so captures convolutions’ capacity for edge detection.

Generalizing Conway’s Life to include these *relative arrangements* of pieces vastly increases the *number of possible rules*. There are 2 to the 9th power (512) possible arrangements of a 3×3 neighborhood, and each arrangement can be assigned one of two outcomes — “live” or “die”. So, the number of possible rules is **2 to the 512th** power! To learn edge detection, a convolutional neural network traverses these possible rules, **seeking an automaton that performs that task**.
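As a sketch of this generalization, an arrangement-sensitive rule can be stored as a 512-entry lookup table over 3×3 neighborhoods. The `edge_rule` below is a hypothetical rule of my own invention (not Conway’s): it fires only when the left neighbor is on and the right neighbor is off, a crude vertical-edge detector:

```python
from itertools import product

# A generalized rule maps each of the 2**9 = 512 possible 3x3 neighborhood
# arrangements (center included) to 'live' (1) or 'die' (0), so there are
# 2**512 possible rules in all.

def pattern_index(neighborhood):
    """Encode a flattened 3x3 tuple of 0/1 cells as an integer in [0, 512)."""
    index = 0
    for bit in neighborhood:
        index = index * 2 + bit
    return index

# Hypothetical, illustrative rule: 'live' only when the left neighbor is on
# and the right neighbor is off. In row-major order, the middle row of the
# 3x3 block occupies flattened positions 3, 4, 5.
edge_rule = [0] * 512
for cells in product((0, 1), repeat=9):
    left, right = cells[3], cells[5]
    if left == 1 and right == 0:
        edge_rule[pattern_index(cells)] = 1
```

Since the other seven cells are free to vary, exactly 2⁷ = 128 of the 512 arrangements map to “live” under this rule.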

**Emergent Behavior**

The rules possible for convolutional networks are actually *more diverse* than “2 to the 512th power”, because each neighbor’s contribution is *weighted along a continuum* (the neurons’ synaptic weights). So, CNNs have a **huge range of possible behavior**. The generalized Game of Life that I mentioned above corresponds to a network where weights **could only be 0 or 1**. In contrast, a CNN could have weights of -2.3 or 100.487 or 0.3… So, the Game of Life’s {0, 1} weights are just a **subset** of the continuous, negative-to-positive weights available in convolutional networks.

Similar to the ‘crystal tilings’ and ‘seething’ grain boundaries that I found in Conway’s Game of Life, I expect that **additional levels of emergence** could be found in convolutional networks, because they weight their neighbors. Unfortunately, convolutional networks are *not currently looking for this emergence*: the Game of Life’s crystal-tiling behavior appeared only over the course of *hundreds of turns*, which is equivalent to a pattern across *hundreds of layers of a CNN*. Current convolutional neural networks, being only a few layers deep, are only able to search for rules that **rapidly congeal to a stable arrangement** (Wolfram’s 2nd type of behavior); they ignore rules that ‘froth’ or ‘seethe’ at the boundary.

**Improving CNNs**

While it may seem that convolutional neural networks *completely include* all possible Game of Life rules (because synaptic weights can exist along a continuum, while the Game of Life has weights 0 or 1 only), there is a way that the Game of Life is vastly more diverse than convolutional networks: it is **not monotonic**.

Convolutional networks’ neurons respond **monotonically** (if the neuron’s *received signals* increase, then that neuron’s *activation* will increase also). The Game of Life’s cells are **non-monotonic**, because their response can go **up or down** as the neighbor count increases. (If 1 is “live” and 0 is “die”, a **monotonic rule** would be “live if surrounded by fewer than 4 neighbors, die otherwise”, while a **non-monotonic rule** would be “live if the number of neighbors is 2, 4, 5, or 7, die otherwise”.)

Looking at the complex behaviors that emerge from the Game of Life’s variations, I suggest that convolutional neural networks adopt **polynomial activation functions**. With a polynomial activation function, an increase to a neuron’s stimuli might **raise** the neuron’s activity, *up to a point*; then, its activity may begin to **decrease**; and *even more* stimulus might cause it to rise again! I expect that **a greater diversity of local phenomena could be captured by a convolutional network with polynomial activation**, even though it may fail to converge on a *global* minimum during training.
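As a sketch, a cubic such as f(x) = x³ − 3x behaves exactly this way; the particular coefficients here are an arbitrary choice for illustration:

```python
def poly_activation(x, coeffs=(1.0, 0.0, -3.0, 0.0)):
    """Evaluate a polynomial activation by Horner's method.
    The default coefficients give f(x) = x**3 - 3x, which is
    non-monotonic: it rises until x = -1, falls until x = 1,
    then rises again -- so more input can mean *less* output."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result
```

For example, as the stimulus climbs from 0 to 1, the activation falls from 0 to −2; pushing the stimulus on to 2 raises the activation back up to 2.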

**Stochastic Automata**

The Game of Life, being non-monotonic, suggests one improvement for convolutional networks. Yet, there is something that they *both* lack, which may increase the diversity of behavior in both: **stochastic** weights. If a Game of Life were stochastic, then there would be *some probability for each neighbor to be ignored*, or each neighbor would be *quieted by a variable amount*. This is similar to Dropout, a commonly used regularizer for training convolutional neural networks. However, Dropout is currently only used during *training*; **I suggest that this stochastic quieting be used during regular operation, as well**.
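A minimal sketch of this ‘stochastic quieting’, applied at inference time rather than only during training (the function name and the keep probability of 0.8 are my own choices, not a standard API):

```python
import random

def quieted_sum(activations, weights, keep_prob=0.8, rng=random):
    """Weighted sum of neighbor activations, where each neighbor is
    independently ignored with probability 1 - keep_prob. Unlike standard
    Dropout, this is meant to stay ON during regular operation, so the
    same input can produce different outputs on different calls."""
    return sum(w * a for w, a in zip(weights, activations)
               if rng.random() < keep_prob)
```

With `keep_prob=1.0` this reduces to an ordinary weighted sum; with anything less, presenting the same neighborhood twice can yield two different totals.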

**Pulling all these strands together:**

- By using **stochastic quieting on a convolutional network**, a neuron’s stimuli would *vary when presented with identical inputs* (an image, presented to the CNN twice, might generate **two different answers**). The neural network would ‘grow up’ listening to these ‘noisier’ inputs, and it would operate day-to-day with that same ‘noise’.
- Neurons’ **activation functions would be polynomials**, to mimic the dynamism and patterns observed in the Game of Life. As a result, the layers of the network would not stabilize as easily; they would *not* be like Wolfram’s 2nd type of automata. Instead, these convolutional networks would continue to generate layer upon layer, to any depth necessary, like the turns of a Game of Life.
- **Compare the behavior across many layers**, to arrive at an ‘output’. This could be done by measuring the *variance* at each neuron, across a set of layers. (Example: the neuron at position (2, 3) changes layer by layer, creating a repeated sequence of activations: (4, 1, 0, 1); its mean activation is 1.5; its *variance is 2.25*.) By measuring each neuron’s **variance** as the network’s ‘output’, you would *distinguish neurons which had stabilized* from neurons that ‘explode’ with *periodic activity*.
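The variance measurement is a one-liner; here is a sketch, using the population variance (dividing by the number of layers):

```python
def layerwise_variance(activations):
    """Population variance of one neuron's activation across layers.
    A stabilized neuron has variance near 0; a neuron 'exploding'
    with periodic activity has a large variance."""
    mean = sum(activations) / len(activations)
    return sum((a - mean) ** 2 for a in activations) / len(activations)
```

For the repeated sequence (4, 1, 0, 1), the mean is 1.5 and the population variance is 2.25; a neuron holding steady at one value scores 0.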

Even better, the ‘output’ of a CNN could be the **coefficients of the first few terms of a Fourier transform** of each neuron’s activations across layers. (In the example with a neuron that *repeats the sequence of activations* (4, 1, 0, 1), one period of the activations is *described by the polynomial* 1(x−3)² + 0, and that neuron could be summarized by the three fitted coefficients (1, −3, 0); the first few Fourier coefficients would compress the sequence in the same spirit.) Fourier terms would reveal the **periodicity** that I observed in those strange variants of the Game of Life. I expect that diverse features can be classified using these ‘crystals’ that ‘froth’ and ‘seethe’ at their boundaries.