Neural Networks


Early Neural Network Modeling


Neurons Are Computational Devices



A Neuron Can Compute Conjunctions and Disjunctions



A Network of Neurons Can Compute Any Boolean Logical Function


Perceptrons Model Sequential and Parallel Computation in the Visual System



Simple and Complex Cells Could Compute Conjunctions and Disjunctions



The Primary Visual Cortex Has Been Modeled As a Multilayer Perceptron



Selectivity and Invariance Must Be Explained by Any Model of Vision



Visual Object Recognition Could Be Accomplished by Iteration of Conjunctions and Disjunctions


Associative Memory Networks Use Hebbian Plasticity to Store and Recall Neural Activity Patterns



Hebbian Plasticity May Store Activity Patterns by Creating Cell Assemblies



Cell Assemblies Can Complete Activity Patterns



Cell Assemblies Can Maintain Persistent Activity Patterns



Interference Between Memories Limits Capacity



Synaptic Loops Can Lead to Multiple Stable States



Symmetric Networks Minimize Energy-Like Functions



Hebbian Plasticity May Create Sequential Synaptic Pathways


An Overall View


BY ITSELF A SINGLE NEURON is not intelligent. But a vast network of neurons can think, feel, remember, perceive, and generate the many remarkable phenomena that are collectively known as “the mind.” How does intelligence emerge from the interactions between neurons? This is the central question motivating the study of neural networks. In this appendix we provide a brief historical review of the field, introduce some key concepts, and discuss two influential models of neural networks, the perceptron and the cell assembly.


 

Starting from the 1940s researchers have proposed and studied many brain models in which sophisticated computations are performed by networks of simple neuron-like elements. Most models are based on two shared principles. First, our immediate experience is rooted in ongoing patterns of action potentials in brain cells. Second, our ability to learn from and remember past experiences is based at least partially on long-lasting modifications of synaptic connections. Although these principles are widely accepted by neuroscientists, they immediately suggest many difficult questions.


For example, to our conscious minds, perceiving an object or moving a limb is experienced as a single, unitary event. But in the brain either act is the collective result of a stupendous number of neural events—the discharge of action potentials or the release of neurotransmitter vesicles—indiscernible by the conscious mind. How are these events united into a coherent perception or movement?


Storage of our immediate experience in long-term memory is presumed to occur with changes in synaptic connections. But how exactly is a memory divided up and distributed across many synapses? If some synapses are used to store more than one memory, how then is interference between memories avoided? When past experiences are recalled from memory, how might synaptic connections evoke a pattern of firing that is similar to a pattern that occurred in the past? Finally, when we reason, daydream, or otherwise float in the stream of consciousness, our mental state is not directly tied to any immediate sensory stimulus or motor output. How do networks of neurons dynamically generate the patterns of activity related to such mental states?


These are profound questions. Many hypothetical answers have been proposed in the form of neural network models, a body of work that spans many decades and which we survey here. Although they are far from being tested conclusively, these hypotheses have influenced the research of a number of experimental neuroscientists and are being developed further today by theoretical neuroscientists.


Early Neural Network Modeling


 

Perhaps the first attempt to explain behavior in terms of synaptic connectivity was Sherrington’s reflex arc. A reflex behavior is defined as a rapid, involuntary, and stereotyped response to a specific stimulus (see Chapter 35). For any reflex behavior one can generally identify a reflex arc, a chain of synapses starting from a sensory neuron and ending with a motor neuron. The sequential activation of neurons in this chain is a series of causes and effects that connect the stimulus to the response. The reflex arc can be regarded as an ancestor of neural network models.


In 1938 Rafael Lorente de Nó, a student of Santiago Ramón y Cajal, argued that synaptic loops (“internuncial chains”) were the basic circuits of the central nervous system. A synaptic loop is a chain of synapses that starts and ends at the same neuron. It is a closed chain, in contrast to the open chain of a reflex arc. Lorente de Nó suggested that the purpose of these loops was to sustain “reverberating” activity patterns. In fact, Sherrington’s student, Graham Brown, in his studies of spinal cord rhythmicity, proposed a related view of the brain, involving intrinsic generation of neural activity rather than stimulus-response relationships. These scientists emphasized that the brain has an intrinsic dynamic richer than that of reflex arcs, which are inactive until stimulated by the outside world.


In an influential book published in 1949, Donald Hebb proposed the idea of a “cell assembly” as a functional unit of the nervous system and discussed the form of synaptic plasticity that would become known as Hebb’s rule. (The rule had previously been formulated by several other thinkers, of whom the earliest was perhaps the philosopher Alexander Bain in 1873.) Hebb argued that repeated synaptic communication between neurons could strengthen the connections between the neurons, creating synaptic loops that were capable of supporting the reverberating activity patterns of Lorente de Nó.


These ideas of Sherrington, Graham Brown, Lorente de Nó, and Hebb were later formalized in mathematical models of neural networks. Two famous classes of models are perceptrons and associative memory networks. Perceptrons have been popular as models of the visual system because they illustrate how recognition of an object can be decomposed into many feature detection events. A perceptron can be organized hierarchically, so that the decomposition process begins with simple features at the bottom of the hierarchy and proceeds to complex features at the top, as is thought to occur in the visual system (see Chapter 28).


Associative memory networks have been used to model how the brain stores and recalls long-term memories. Central to these models is Hebb’s concept of the cell assembly, a group of excitatory neurons mutually coupled by strong synapses. Memory storage occurs with the creation of a cell assembly by Hebbian synaptic plasticity (see Chapter 66), and memory recall occurs when the neurons in a cell assembly are activated by a stimulus.


The perceptron and the cell assembly have very different synaptic connectivities. As in Sherrington’s reflex arc, the polysynaptic pathways in a perceptron all travel in the same overall direction, from the input layer to the output layer. The perceptron generalizes the reflex arc because it allows many synapses to diverge from a single neuron and many to converge onto a single neuron.


The perceptron is a special case of a feed-forward network, defined as one with no synaptic loops. As noted above, a synaptic loop is defined as a polysynaptic pathway that starts and ends at the same neuron. Networks with loops are called recurrent or feedback networks, to distinguish them from feed-forward networks. A cell assembly typically contains loops, and is therefore recurrent.


Lorente de Nó and Hebb postulated that neural activity can persist longer in the brain by circulating through synaptic loops. Thus a cell assembly can maintain a persistent activity pattern resembling patterns observed by neurophysiologists in studies of short-term and working memory. In other words, loops could be important for the generation of persistent mental states in the brain, which are required for behaviors in which stimulus and response are separated by a long time delay. In contrast, the direct pathways of the perceptron are suited for modeling behavioral responses that immediately follow a stimulus.


Only very simple neural networks are described in this appendix. The “neurons” in these models are much simpler than biological neurons, and the “synapses” do not do justice to the intricacies of biological synapses. When modeling a complex system, simplifying its elements helps one to focus on the properties that emerge from the interactions between them. This strategy has historically been used by neural network researchers focusing on emergent properties of brain function. More realistic models of how neurons integrate synaptic inputs are described in Appendix F.


Neurons Are Computational Devices


 

Action potentials and synaptic potentials are dynamic events that involve a complex interplay between the membrane voltage of a neuron and the opening and closing of its ion channels. Computational neuroscientists often ignore these complexities in their thinking and instead rely on the following simplification: A neuron fires an action potential when a sufficiently large number of excitatory synapses onto it are activated simultaneously.


This statement is based on the fact that a single excitatory postsynaptic potential is typically much smaller in amplitude (less than 0.5 mV) than the gap of many millivolts that separates the resting potential from the threshold for an action potential. Therefore, many simultaneous excitatory postsynaptic potentials must sum in the postsynaptic neuron to drive its voltage over the threshold for firing.


The above simplification of the conditions for neuronal firing has inspired a great deal of mathematical formalism. In 1943 Warren McCulloch and Walter Pitts proposed a model of the computation performed by a neuron and the excitatory synapses converging onto it. The McCulloch-Pitts neuron takes multiple inputs and produces a single output. All inputs and the output are binary variables, 0 or 1. The neuron is characterized by a single parameter θ, its threshold. If θ or more of the inputs are equal to 1, then the neuron’s output is 1; otherwise the output is 0.
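This rule is simple enough to state in a few lines of code. The following Python sketch (the function name and the 0/1 encoding are illustrative conventions, not part of the original formulation) captures the model:

```python
def mcculloch_pitts(inputs, threshold):
    """McCulloch-Pitts neuron: binary (0/1) inputs, binary output.

    The neuron fires (returns 1) when the number of active excitatory
    inputs reaches the threshold theta; otherwise it stays silent (0).
    """
    return 1 if sum(inputs) >= threshold else 0

# A neuron with threshold 2 driven by three excitatory inputs.
print(mcculloch_pitts([1, 0, 0], threshold=2))  # 0: one active input is not enough
print(mcculloch_pitts([1, 1, 0], threshold=2))  # 1: two active inputs reach theta
```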


In the biological interpretation of the McCulloch-Pitts model each input variable represents the activation of an excitatory synapse at the neuron. The input is equal to 1 when the excitatory synapse is activated. The parameter θ is used to model the threshold of a biological neuron and is equal to the minimum number of excitatory synapses that must be simultaneously activated to produce an action potential. In this interpretation the McCulloch-Pitts model formalizes the above caricature of a biological neuron.


Two McCulloch-Pitts neurons can be connected so that the output of one neuron is the input of another. This corresponds to the biological fact that excitatory synapses converging onto a neuron are activated by the discharging of the presynaptic neurons. By making many such connections, it is possible to construct a model of a neural network.


In the McCulloch-Pitts model, neurons are either active (“1”) or inactive (“0”). This is admittedly a crude way of describing neural activity, because it does not distinguish between active neurons with different firing rates. But this coarse description is used not only by theorists but also by experimental neurophysiologists, who often speak of active and inactive neurons in the exploratory phases of their experiments before they make precise measurements of firing rates. Although the graded nature of firing rates can be captured using more realistic model neurons (Box E-1), here we will limit ourselves to the McCulloch-Pitts model to minimize the use of mathematical equations.


This simplification also allows the application of ideas from Boolean logic, in which the binary values 0 and 1 correspond to “false” and “true.” Boolean logic, named after the British mathematician George Boole, is a formalization of deductive reasoning that is based on manipulations of binary variables that represent truth values. Boolean logic is the mathematical foundation of digital electronic circuits. Using their model, McCulloch and Pitts argued that the activity of each neuron signifies the truth of some logical proposition. They concluded that neurons (and by extension networks of neurons) perform logical computations.


A Neuron Can Compute Conjunctions and Disjunctions

 

If we accept the idea that biological neurons can perform logical computations, then it is natural to ask what types of computations are possible. We will answer this question by studying the behavior of the McCulloch-Pitts model neuron. Of course, biological neurons are more complex and therefore likely to be more powerful computational devices. But by analyzing the McCulloch-Pitts neuron we can expect to establish lower bounds on the computational power of biological neurons. In other words, if a computation is possible for a McCulloch-Pitts neuron it should be possible for a biological neuron, although the converse is not necessarily true.





Box E-1 Mathematics of Neural Networks



The McCulloch-Pitts neuron is simple enough that its behavior can be described in words. More sophisticated models require the precision of mathematics for a clear formulation.



The linear-threshold (LT) model neuron corrects a shortcoming of the McCulloch-Pitts neuron, in which all excitatory inputs are equally effective in bringing the neuron to its firing threshold: the number of active inputs matters, but their identities do not. For a biological neuron, in which some synapses are stronger than others, such a simplification is not realistic.



To model this aspect of synaptic function, the LT neuron takes the weighted sum of its inputs, where the weights of the sum represent synaptic strengths. If the sum exceeds a threshold, the LT neuron becomes active.



To model a network of LT neurons, assume that their activities at time t are given by the N variables x1(t), x2(t), …, xN(t), which take on the values 0 or 1; that is, each neuron is either active (“1”) or silent (“0”). Then the activities at time t + 1 are given by


$$x_i(t+1) \;=\; H\!\left(\sum_{j=1}^{N} W_{ij}\,x_j(t) \;-\; \theta_i\right) \qquad \text{(E-1)}$$


 

where H is the Heaviside step function defined by H(u) = 1 for u ≥ 0 and H(u) = 0 otherwise, Wij is the strength or weight of the synapse between neuron i and the presynaptic neuron j, and θi is the threshold of neuron i. For a network of N neurons, the synaptic weights Wij form an N × N matrix, and the thresholds θi form an N-dimensional vector.
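Equation E-1 amounts to one line of matrix arithmetic per time step. The following sketch is purely illustrative; the array names are our own:

```python
import numpy as np

def lt_step(x, W, theta):
    """One synchronous update of a linear-threshold network (Equation E-1).

    x     : length-N vector of 0/1 activities at time t
    W     : N x N weight matrix, W[i, j] = synapse from neuron j onto neuron i
    theta : length-N vector of thresholds
    Returns the vector of 0/1 activities at time t + 1.
    """
    return (W @ x - theta >= 0).astype(int)

# Two mutually excitatory neurons: once both are active, they remain active.
W = np.array([[0.0, 1.0],
              [1.0, 0.0]])
theta = np.array([0.5, 0.5])
print(lt_step(np.array([1, 1]), W, theta))  # [1 1]
```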


The LT and McCulloch-Pitts models are equivalent if the synaptic strengths of the LT model satisfy two conditions. First, the strengths of all excitatory synapses must equal one to yield the uniformity of strengths discussed above. Second, each inhibitory synapse must be so strong that activating it is enough to keep the LT neuron below threshold, no matter how many excitatory inputs are active. This second condition is in accord with the behavior of inhibition in the original McCulloch-Pitts neuron and could be regarded as a crude model of shunting inhibition (see Chapter 10).


The LT neuron of Equation E-1 can perform many different types of computation, depending on the choice of synaptic weights and thresholds. By arguments similar to those given in the main text, any Boolean function can be realized by combining LT neurons into a network. A perceptron network can be implemented by a synaptic weight matrix in which certain elements are constrained to be zero. (Such elements would give rise to “backwards” pathways in the perceptron model illustrated in Figure E-1.) An associative memory network can be constructed by choosing Wij to be a correlation matrix (see Box E-3).


The LT neuron is either active or inactive, but the firing rates of biological neurons are continuously graded quantities. This can be modeled by replacing the Heaviside step function H in Equation E-1 with some other function F that has a graded output. Neural activity is then described by continuously graded variables r1, …, rN, interpreted as rates of action-potential firing, rather than by binary variables. Furthermore, time can be treated continuously in a differential equation rather than discretely as in Equation E-1. This type of model is discussed in more detail in Appendix F.


$$\tau \frac{dr_i}{dt} \;=\; -r_i \;+\; F\!\left(\sum_{j=1}^{N} W_{ij}\,r_j \;-\; \theta_i\right) \qquad \text{(E-2)}$$


 

In Equation E-2 the soma of the neuron is regarded as a device that converts input current into the cell’s rate of firing. This point of view is often taken by electrophysiologists, who characterize a neuron by its f-I curve, plotted by injecting current into a neuron and recording the resulting firing rate. The dendrite of the neuron is assumed to linearly combine the currents produced by its synapses, a good approximation in some biological neurons. Each synapse generates a current that is proportional to the firing rate of its presynaptic neuron.
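A model of this kind can be simulated by stepping the differential equation forward in time. The sketch below assumes a rectified-linear gain function in place of a measured f-I curve, with illustrative parameter values:

```python
import numpy as np

def simulate_rate_network(W, r0, theta, tau=10.0, dt=0.1, steps=2000):
    """Euler integration of the firing-rate dynamics sketched in Equation E-2.

    F is taken here to be a rectified-linear ramp, a stand-in for the
    f-I curve measured by injecting current and recording firing rate.
    """
    F = lambda u: np.maximum(u, 0.0)
    r = np.array(r0, dtype=float)
    for _ in range(steps):
        r += (dt / tau) * (-r + F(W @ r - theta))
    return r

# Two neurons coupled by weak recurrent excitation relax back toward rest.
W = np.array([[0.0, 0.5],
              [0.5, 0.0]])
print(simulate_rate_network(W, r0=[1.0, 1.0], theta=np.zeros(2)))
```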


Equation E-2 is still quite crude in its description of neural activity as an overall firing rate. More sophisticated models have differential equations governing voltages and conductances and generate individual action potentials. For example, the voltages in the numerical simulations of Figure E-5 were generated by leaky integrate-and-fire model neurons. More about this and other spiking model neurons can be found in works listed in the bibliography at the end of the appendix, as well as in Appendix F.


 

Suppose that the threshold parameter θ of a McCulloch-Pitts neuron is set at a high value, equal to the total number of inputs. Then the neuron is active if, and only if, all of its synaptic inputs are active. In other words, the output of the neuron is the conjunction of its input variables, which is also known as the logical AND operation. Alternatively, the threshold can be set at a low value, equal to one, such that activation of one or more synaptic inputs is enough to activate the neuron. In this case the output of the neuron is the disjunction of its input variables, which is also known as the logical OR operation.
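In terms of the threshold unit sketched earlier (repeated here so the snippet stands on its own), the two settings of θ look like this:

```python
def mcculloch_pitts(inputs, threshold):
    """Fires (returns 1) when at least `threshold` inputs are active."""
    return 1 if sum(inputs) >= threshold else 0

inputs = [1, 1, 0]  # two of three synaptic inputs are active

# Conjunction (AND): threshold equal to the number of inputs.
print(mcculloch_pitts(inputs, threshold=3))  # 0: not all inputs are active

# Disjunction (OR): threshold of one.
print(mcculloch_pitts(inputs, threshold=1))  # 1: a single active input suffices
```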


Although a McCulloch-Pitts neuron can compute some logical functions, it cannot compute others. A famous example is the exclusive-or (XOR) operation. By definition the XOR operation on two inputs results in “1” if, and only if, exactly one of its inputs is “1.” Thus if both inputs are “1,” the XOR function outputs “0,” while the OR function outputs “1.” Proving that a single McCulloch-Pitts neuron cannot compute the XOR operation is left as an exercise to the reader. However, XOR can be computed by a network of McCulloch-Pitts neurons, as is explained below.


A Network of Neurons Can Compute Any Boolean Logical Function

 

What functions can be computed by a network of McCulloch-Pitts neurons? Conjunctions and disjunctions are basic building blocks of Boolean logic. The original definition of a McCulloch-Pitts neuron included both inhibitory and excitatory synapses. It turns out that synaptic inhibition can be used for the operation of negation (logical NOT).


Consider a neuron that is spontaneously active and receives a single strong inhibitory synapse. When the inhibitory synapse is inactive, the neuron is spontaneously active. But when the inhibitory synapse is active, the neuron is inactive, silenced by inhibition. In other words, the neuron responds with 1 when its input is 0 but with 0 when its input is 1. This is exactly the NOT operation.


It is well known that any function of Boolean logic can be synthesized by combining the AND, OR, and NOT operations. Because McCulloch-Pitts neurons can compute all of these operations, it follows that networks of McCulloch-Pitts neurons can compute any function of Boolean logic, including XOR.
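As an illustration of this point (one possible wiring, not a construction given in the text), a small feed-forward network of three McCulloch-Pitts units, using a strong inhibitory synapse for negation, suffices to compute XOR:

```python
def mp_unit(excitatory, inhibitory, threshold):
    """McCulloch-Pitts unit with absolute inhibition: any active inhibitory
    input silences the neuron, no matter how much excitation it receives."""
    if any(inhibitory):
        return 0
    return 1 if sum(excitatory) >= threshold else 0

def xor(a, b):
    """XOR as a two-layer network: (a OR b) AND NOT (a AND b)."""
    or_ab = mp_unit([a, b], [], threshold=1)    # disjunction of the inputs
    and_ab = mp_unit([a, b], [], threshold=2)   # conjunction of the inputs
    return mp_unit([or_ab], [and_ab], threshold=1)  # fires unless inhibited

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor(a, b))  # 1 only when exactly one input is 1
```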


Why is it important that these models compute Boolean functions? Boolean logic lies at the heart of modern digital computers. The computers on our desktops, and in fact all digital electronic circuits, are designed to implement Boolean logic. When a digital computer runs a software program, it simply executes sequences of logical operations. Thus networks of McCulloch-Pitts neurons can compute the same functions as digital computers.1


These facts about networks of McCulloch-Pitts neurons were discovered in the 1940s and 1950s when neural network models played a role in the formal theory of automata and computation. This line of research showed that neural network models have great computational power in principle. Nevertheless, a difficult question remains: How are computations actually performed by brains? This question cannot be answered by formal arguments alone. It is now being addressed both by theoretical and experimental neuroscientists who try to understand how the brain works, and by computer scientists and engineers who create artificial systems that emulate capabilities of the brain.


The notion that a neuron is a device for computing conjunctions and disjunctions is prominent in the ensuing discussion of neural network models of the visual system.


Perceptrons Model Sequential and Parallel Computation in the Visual System


 

The term perceptron was coined in the 1950s by Frank Rosenblatt to describe his neural network models of visual perception. In a perceptron neurons are organized in layers (Figure E-1).2 The first layer is the input to the network and the last layer the output. Each layer sends synapses only to the next layer, so that information flows in the “forward” direction from the input to the output. Although perceptrons can be constructed from various kinds of model neurons, we will use the simple McCulloch-Pitts neurons.




 

Figure E-1 The perceptron model. A perceptron is a network of idealized neurons arranged in layers with synaptic connections from each layer to the succeeding one. In general, any number of “hidden layers” may intervene between the input and output. Each disk represents a neuron. An arrow pointing from the presynaptic neuron to the postsynaptic neuron represents a synapse. There are no loops in the network.


 

The computations in a perceptron, as in the visual system, occur through both sequential and parallel processing of information. The layers of a perceptron can be regarded as a sequence of steps in a computation. The neurons within each layer perform similar operations that are executed in parallel during a single step of the computation. Because vision is often quite fast compared to other cognitive tasks, it may require only a few sequential steps, but each step involves a large number of operations performed by many neurons working in parallel. It is natural to represent this kind of computation by a perceptron with a small number of layers, each with many neurons.
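This style of computation can be written as a short forward pass in which each layer is one sequential step and every unit within a layer is evaluated in parallel. The weights and thresholds below are chosen only for illustration: the hidden units compute conjunctions of the inputs, and the output unit computes their disjunction.

```python
import numpy as np

def perceptron_forward(x, layers):
    """Forward pass through a feed-forward network of linear-threshold units.

    `layers` is a list of (W, theta) pairs, one per layer. Each layer is a
    single parallel step; the layers are applied in sequence.
    """
    for W, theta in layers:
        x = (W @ x - theta >= 0).astype(int)
    return x

# Hidden layer: two conjunctions of the three inputs; output: their disjunction.
W1, theta1 = np.array([[1, 1, 0], [0, 1, 1]]), np.array([2, 2])
W2, theta2 = np.array([[1, 1]]), np.array([1])

print(perceptron_forward(np.array([1, 1, 0]), [(W1, theta1), (W2, theta2)]))  # [1]
```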


Simple and Complex Cells Could Compute Conjunctions and Disjunctions

 

We shall develop the analogy between perceptrons and the visual system by exploring its implications for primary visual cortex (V1). As discussed in Chapter 27, the “simple cells” of V1 respond selectively to stimuli in the visual field that have a certain spatial orientation. A simple cell responds to a bar of light close to a particular orientation but not to bars with other orientations.


In a classic 1962 paper David Hubel and Torsten Wiesel described this property of orientation selectivity in V1 and also proposed the first model of how it is achieved. They assumed that what they called a “simple” cortical cell receives synaptic inputs from cells in the lateral geniculate nucleus (LGN) and suggested that orientation selectivity of the simple cell in V1 depends on the spatial arrangement of the receptive fields of the LGN cells. Thus, if the center-surround receptive fields of the LGN cells were arranged along a straight line (see Figure 27-3), a bar of light with the same orientation as this line would activate all the LGN inputs of the simple cell simultaneously, driving the cortical simple cell that receives these inputs above the threshold for firing action potentials. Conversely, a bar of light at nonpreferred orientations would stimulate only some of the LGN inputs, leaving that simple cell below threshold for firing.


The preceding model of a simple cell can be interpreted as a McCulloch-Pitts neuron computing an AND operation (Figure E-2A) because a simple cell fires when all of its LGN inputs are activated. Recall that a McCulloch-Pitts neuron computes a conjunction if its threshold is set sufficiently high, and intuitively it makes sense that a high threshold goes along with high selectivity.



Figure E-2 A perceptron implementing conjunction (AND), disjunction (OR), and the Hubel-Wiesel neurobiological model of simple and complex cells in visual cortex. Neurons are represented by disks and synapses by arrows. Active neurons and synapses are colored red.


 


A. A neuron with a high threshold can compute the conjunction of three inputs. The neuron does not respond to only one input (top) or two inputs (not shown). It becomes active only when all three inputs are active (bottom).


 


B. A neuron with a low threshold can compute a disjunction of three inputs. The neuron remains inactive if all of its inputs are inactive (top). It becomes active if a single input neuron is active (bottom) or more than one input neuron is active (not shown).


 


C. In this realization of the Hubel-Wiesel model a disjunction neuron (right) receives inputs from a set of conjunction neurons (middle), which in turn receive inputs from a grid of neurons (left). The neurons in the grid represent lateral geniculate nucleus (LGN) cells, which are assumed to be either all ON-center or OFF-center cells and retinotopically organized so that the location of each cell in the grid corresponds to the location of its receptive field on the retina. A horizontally oriented visual stimulus activates three LGN cells in a row, which activate a “simple cell” (conjunction) that in turn activates a “complex cell” (disjunction). Like actual simple cells of primary visual cortex, each conjunction neuron responds selectively to stimuli with a particular orientation (horizontal in this case) and at a particular location. Likewise, like actual complex cells, the disjunction neuron responds selectively to stimuli with a particular orientation but is invariant to the exact location of the stimulus.


 

In addition to simple cells, V1 also contains “complex” cells, also first described by Hubel and Wiesel. Like simple cells, complex cells are orientation selective, but their responses are not sensitive to the location of the stimulus within the receptive field, whereas simple cells are quite sensitive to the precise alignment of the stimulus within the excitatory subregions of their receptive field.


Hubel and Wiesel proposed that a complex cell receives synaptic input from simple cells with similar orientation selectivity (Figure E-2C). The receptive fields of the simple cells add together to form the receptive field of the complex cell. If a visual stimulus with the preferred orientation activates any one of the simple cells, the complex cell is driven over the threshold for firing. This model is intended to explain why spatial location of the stimulus in the receptive field is not a factor in activating the complex cell.


This model of a complex cell can be interpreted as a McCulloch-Pitts neuron computing an OR operation (Figure E-2B) since a complex cell fires when any of its simple cell inputs is activated. A McCulloch-Pitts neuron computes a disjunction if its threshold is set sufficiently low, and it makes sense that a low threshold is appropriate for nonselective responses.


In effect, Hubel and Wiesel imagined simple and complex cells as McCulloch-Pitts neurons, although they did not use such language. For a McCulloch-Pitts neuron the threshold determines whether responses are selective or invariant. The simple cell’s high threshold is responsible for the cell’s orientation selectivity, while the complex cell’s low threshold accounts for the invariance of its response to the location of the stimulus within its receptive field.
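To make the hierarchy of Figure E-2C concrete, the toy sketch below implements it directly; the grid size, the horizontal tuning, and the function names are our own illustrative assumptions. Each “simple cell” is a conjunction over three horizontally adjacent LGN cells, and the “complex cell” is a disjunction over all of the simple cells.

```python
import numpy as np

def simple_cell_responses(lgn):
    """One 'simple cell' per position: a conjunction (high threshold)
    over three horizontally adjacent LGN cells."""
    rows, cols = lgn.shape
    return [1 if lgn[r, c:c + 3].sum() >= 3 else 0
            for r in range(rows) for c in range(cols - 2)]

def complex_cell(simple_responses):
    """The 'complex cell': a disjunction (low threshold) over all simple
    cells that share the same preferred (horizontal) orientation."""
    return 1 if sum(simple_responses) >= 1 else 0

# A horizontal bar of light at one retinal location activates the complex cell...
lgn = np.zeros((4, 5), dtype=int)
lgn[1, 0:3] = 1
print(complex_cell(simple_cell_responses(lgn)))  # 1

# ...and the same bar shifted to another location still activates it,
# whereas a vertical bar does not.
lgn = np.zeros((4, 5), dtype=int)
lgn[3, 2:5] = 1
print(complex_cell(simple_cell_responses(lgn)))  # 1

lgn = np.zeros((4, 5), dtype=int)
lgn[0:3, 2] = 1
print(complex_cell(simple_cell_responses(lgn)))  # 0
```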


The Primary Visual Cortex Has Been Modeled As a Multilayer Perceptron

 
