Intermediate-Level Visual Processing and Visual Primitives
Internal Models of Object Geometry Help the Brain Analyze Shapes
Depth Perception Helps Segregate Objects from Background
Local Movement Cues Define Object Trajectory and Shape
Context Determines the Perception of Visual Stimuli
Cortical Connections, Functional Architecture, and Perception Are Intimately Related
Perceptual Learning Requires Plasticity in Cortical Connections
Visual Search Relies on the Cortical Representation of Visual Attributes and Shapes
WE HAVE SEEN IN THE PRECEDING chapter that the eye is not a mere camera but instead contains sophisticated retinal circuitry that decomposes the retinal image into signals representing contrast and movement. These data are conveyed through the optic nerve to the primary visual cortex, which uses this information to analyze the shape of objects. It first identifies the boundaries of objects, represented by numerous short line segments, each with a specific orientation. The cortex then integrates this information into a representation of specific objects, a process referred to as contour integration.
These two steps, local analysis of orientation and contour integration, exemplify two distinct stages of visual processing. Computation of local orientation is an example of low-level visual processing, which is concerned with identifying local elements of the light structure of the visual field. Contour integration is an example of intermediate-level visual processing, the first step in generating a representation of the unified visual field. At the earliest stages of analysis in the cerebral cortex these two levels of processing are accomplished together.
A visual scene comprises many thousands of line segments and surfaces. Intermediate-level visual processing is concerned with determining which boundaries and surfaces belong to specific objects and which are part of the background (see Figure 25-4). It is also involved in distinguishing the lightness and color of a surface from the intensity and wavelength of light reflected from that surface. The physical characteristics of reflected light result as much from the intensity and color balance of the light that illuminates a surface as from the color of that surface. Determining the actual surface color of a single object requires comparison of the wavelengths of light reflected from multiple surfaces in a scene.
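The logic of this comparison can be made concrete with a small computational sketch. The Python fragment below is purely illustrative: it uses made-up reflectance values and a simple "gray-world" normalization, which stands in for the brain's far more sophisticated comparison across surfaces. Dividing each surface's reflected light by the scene-wide average yields estimates of surface color that remain stable when the illuminant changes.

import numpy as np

# Hypothetical surface reflectances (rows) in three wavelength bands.
reflectances = np.array([
    [0.8, 0.2, 0.2],   # reddish surface
    [0.2, 0.7, 0.3],   # greenish surface
    [0.3, 0.3, 0.9],   # bluish surface
])

def reflected_light(reflectances, illuminant):
    # Light reaching the eye = surface reflectance x illuminant, band by band.
    return reflectances * illuminant

def gray_world_estimate(image):
    # Discount the scene-wide average, a stand-in for comparing the wavelengths
    # of light reflected from multiple surfaces in the scene.
    illuminant_estimate = image.mean(axis=0)
    return image / illuminant_estimate

for illuminant in (np.array([1.0, 1.0, 1.0]),      # neutral illuminant
                   np.array([1.5, 1.0, 0.6])):     # warm, reddish illuminant
    image = reflected_light(reflectances, illuminant)
    print(np.round(gray_world_estimate(image), 2))
# The raw reflected light differs under the two illuminants, but the normalized
# surface-color estimates are identical.

Under a white and a reddish illuminant the reflected light differs markedly, yet the normalized estimates do not, which captures why surface color cannot be recovered from any single patch of reflected light.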
Intermediate-level visual processing thus involves assembling local elements of an image into a unified percept of objects and background. Although determining which elements belong together in a single object is a highly complex problem with the potential for an astronomical number of solutions, the brain has built-in logic that allows it to make assumptions about the likely spatial relationships between elements. In certain cases these inherent rules can lead to the illusion of contours and surfaces that do not actually exist in the visual field (Figure 27–1).
Figure 27-1 Illusory contours and perceptual fill-in. The visual system uses information about local orientation and contrast to construct the contours and surfaces of objects. This constructive process can lead to the perception of contours and surfaces that do not appear in the visual field, including those seen in illusory figures. In the Kanizsa triangle illusion (top left) one perceives continuous boundaries extending between the apices of a white triangle, even though the only real contour elements are those formed by the Pac-Man–like figures and the acute angles. The inside and outside of the illusory pink square (top right) are the same white color as the page, but a continuous transparent pink surface within the square is perceived. As seen in the lower figures, contour integration and surface segmentation can also occur through occluding surfaces. The irregular shapes on the left appear to be unrelated, but when a partially occluding black area is overlaid on them (right) they are easily seen as fragments of the letter B.
Three factors shape how the brain accomplishes this integration. First, context plays an important role in overcoming ambiguity in the signals from the retina. The way in which a visual feature is perceived depends on everything that surrounds it: The perception of a point or a line depends on how that object is perceptually linked to other visual features. Thus the response of a neuron in the visual cortex is context-dependent; it depends as much on the presence of contours and surfaces outside the cell's receptive field as on the attributes within it. Second, the functional properties of neurons in the visual cortex are highly dynamic and can be altered by visual experience or perceptual learning. Third, visual processing in the cortex is subject to the influence of cognitive functions, specifically attention, expectation, and “perceptual task,” that is, active engagement in visual discrimination or detection. The interaction of visual context, experience-dependent changes in cortical circuitry, and cognitive influences such as expectation is thus vital to the visual system's analysis of complex scenes.
In this chapter we examine how the brain’s analysis of the local features in a visual scene, or visual primitives, proceeds in parallel with the analysis of more global features. Visual primitives include contrast, line orientation, brightness, color, movement, and depth.
Each type of visual primitive is subject to the integrative action of intermediate-level processing. Lines with particular orientations are integrated into object contours, local contrast information into surface lightness, wavelength selectivity into color constancy and surface segmentation, and directional selectivity into object motion. The analysis of visual primitives begins in the retina with the detection of brightness and color and continues in the primary visual cortex with the analysis of orientation, direction of movement, and stereoscopic depth. Properties related to intermediate-level visual processing are analyzed together with visual primitives beginning in the primary visual cortex (V1), which contributes to contour integration and surface segmentation. Other areas of the visual cortex specialize in different aspects of this task: V2 analyzes properties related to object surfaces, V4 integrates information about color and object shape, and V5 (the middle temporal area, or MT) integrates motion signals across space (Figure 27–2).
Figure 27-2 Cortical areas involved in intermediate-level visual processing. Many cortical areas in the macaque monkey, including V1, V2, V3, V4, and the middle temporal area (MT), are involved in integrating local cues to construct contours and surfaces and in segregating foreground from background. The shaded areas extend into the frontal and temporal lobes because cognitive outputs of these areas, including attention, expectation, and perceptual task, contribute to the process of scene segmentation. (AIP, anterior intraparietal cortex; FEF, frontal eye fields; IT, inferior temporal cortex; LIP, lateral intraparietal cortex; MIP, medial intraparietal cortex; MST, medial superior temporal cortex; MT, middle temporal cortex; PF, prefrontal cortex; PMd, dorsal premotor cortex; PMv, ventral premotor cortex; TEO, occipitotemporal cortex; VIP, ventral intraparietal cortex; V1, V2, V3, V4, primary, secondary, third, and fourth visual areas.)
Internal Models of Object Geometry Help the Brain Analyze Shapes
A first step in determining an object’s contour is identification of the orientation of local parts of the contour. This step commences in V1, which plays a critical role in both local and global analysis of form.
Neurons in the visual cortex respond selectively to specific local features of the visual field, including orientation, binocular disparity or depth, and direction of movement, as well as to properties already analyzed in the retina and lateral geniculate nucleus, such as contrast and color. Orientation selectivity, the first emergent property identified in the receptive fields of cortical neurons, was discovered by David Hubel and Torsten Wiesel in 1959.
Neurons in the lateral geniculate nucleus have circular receptive fields with a center-surround organization (see Chapter 25). They respond to the light-dark contrasts of edges or lines in the visual field but are not selective for the orientations of those edges. In the visual cortex, however, neurons respond selectively to lines of particular orientations. Each neuron responds to a narrow range of orientations, approximately 40°, and different neurons respond optimally to distinct orientations. There is now good evidence for the idea, first proposed by Hubel and Wiesel, that this orientation selectivity reflects the arrangement of the inputs from cells in the lateral geniculate nucleus. Each V1 neuron receives input from several neighboring geniculate neurons whose center-surround receptive fields are aligned so as to represent a particular axis of orientation (Figure 27–3).
Figure 27-3 Orientation selectivity and mechanisms.
A. A neuron in the primary visual cortex responds selectively to line segments that fit the orientation of its receptive field. This selectivity is the first step in the brain’s analysis of an object’s form. (Reproduced, with permission, from Hubel and Wiesel 1968.)
B. The orientation of the receptive field is thought to result from the alignment of the circular center-surround receptive fields of several presynaptic cells in the lateral geniculate nucleus. In the monkey, neurons in layer IVCβ of V1 have unoriented receptive fields. However, the projections of neighboring IVCβ cells onto a neuron in layer IIIB create a receptive field with a specific orientation.
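The wiring scheme shown in Figure 27–3B can be caricatured in a few lines of code. The Python sketch below is a toy model with arbitrary filter sizes and spacings: several center-surround filters, standing in for geniculate inputs, are centered along a horizontal row, and their summed, rectified output serves as the model cortical response. A bar aligned with the row of centers drives the model far more strongly than bars at other orientations.

import numpy as np

SIZE = 64  # pixels per side of the model patch of retina

def center_surround(cx, cy, sigma_c=1.5, sigma_s=3.0):
    # Difference-of-Gaussians filter, a stand-in for one ON-center geniculate field.
    y, x = np.mgrid[0:SIZE, 0:SIZE]
    d2 = (x - cx) ** 2 + (y - cy) ** 2
    center = np.exp(-d2 / (2 * sigma_c ** 2)) / (2 * np.pi * sigma_c ** 2)
    surround = np.exp(-d2 / (2 * sigma_s ** 2)) / (2 * np.pi * sigma_s ** 2)
    return center - surround

# The model "simple cell" sums several geniculate fields whose centers fall on one row.
lgn_fields = [center_surround(cx, SIZE // 2) for cx in range(20, 48, 7)]

def bar(angle_deg, width=2.0):
    # Bright bar through the patch center at a given orientation.
    y, x = np.mgrid[0:SIZE, 0:SIZE]
    theta = np.deg2rad(angle_deg)
    # Distance of each pixel from a line through the center at angle theta.
    dist = np.abs(-(x - SIZE / 2) * np.sin(theta) + (y - SIZE / 2) * np.cos(theta))
    return (dist < width).astype(float)

def cortical_response(stimulus):
    # Rectified sum of the aligned geniculate responses.
    drive = sum(np.sum(f * stimulus) for f in lgn_fields)
    return max(drive, 0.0)

for angle in (0, 20, 45, 90):
    print(angle, round(cortical_response(bar(angle)), 3))
# The horizontal bar (0 deg), which matches the row of field centers, gives the largest response.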
Two principal types of orientation-selective neurons have been identified. Simple cells have receptive fields divided into ON and OFF subregions (Figure 27–4). When a visual stimulus such as a bar of light enters the receptive field's ON subregion, the neuron fires; the cell also responds when the bar leaves the OFF subregion. Simple cells therefore discharge briskly when a moving bar of light leaves an OFF region and enters an ON region, and their responses are highly selective for the position of a line or edge in space.
Figure 27-4 Simple and complex cells in the visual cortex. The receptive fields of simple cells are divided into subfields with opposite response properties. In an ON subfield, designated by “+,” the onset of a bar of light triggers a response in the neuron; in an OFF subfield, designated by “−,” the extinction of the bar triggers a response. Complex cells have overlapping ON and OFF regions and respond continuously as a line or edge traverses the receptive field along an axis perpendicular to the receptive-field orientation.
Complex cells, in contrast, are less selective for the position of object boundaries. They lack discrete ON and OFF subregions and respond similarly to light and dark at all locations across their receptive fields. They fire continuously as a line or edge stimulus traverses their receptive fields.
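The distinction between simple and complex cells is often formalized with the so-called energy model, in which a complex cell pools the squared outputs of a quadrature pair of simple-cell-like subunits. The Python sketch below illustrates that abstraction in one spatial dimension with arbitrary parameters; it is not a claim about the underlying circuitry. The simple-cell-like response depends strongly on exactly where a bar of light falls, whereas the pooled response changes little as the bar moves across the receptive field.

import numpy as np

x = np.linspace(-8, 8, 161)  # one spatial dimension across the receptive field

def gabor(phase):
    # Idealized simple-cell receptive-field profile with ON and OFF subregions.
    return np.exp(-x ** 2 / 8.0) * np.cos(0.8 * np.pi * x + phase)

def bar_stimulus(position, width=1.0):
    # Bright bar of light centered at `position`.
    return (np.abs(x - position) < width / 2).astype(float)

def simple_cell(stimulus):
    # Half-rectified linear response: sensitive to where the bar falls.
    return max(np.dot(gabor(0.0), stimulus), 0.0)

def complex_cell(stimulus):
    # Energy model: squared responses of a quadrature pair of subunits are summed,
    # which discards the sign and exact position of the local contrast.
    return np.dot(gabor(0.0), stimulus) ** 2 + np.dot(gabor(np.pi / 2), stimulus) ** 2

for pos in (-1.25, -0.6, 0.0, 0.6, 1.25):
    s = bar_stimulus(pos)
    print(f"bar at {pos:+.2f}: simple {simple_cell(s):6.2f}  complex {complex_cell(s):6.2f}")
# The simple-cell response swings strongly with bar position and falls to zero over
# OFF subregions, while the complex-cell response varies far less across the same positions.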
Moving stimuli are often used to study the receptive fields of visual cortex neurons, not only to simulate the conditions under which an object moving in space is detected but also to simulate the conditions under which stationary objects are tracked by the eyes, which constantly scan the visual environment and therefore move the boundaries of stationary objects across the retina. In fact, visual perception requires eye movement. Visual cortex neurons do not respond to an image that is stabilized on the retina because they require moving or flashing stimuli to be activated: They fire in response to transient stimulation.
Some visual cortex neurons have receptive fields in which an excitatory center is flanked by inhibitory regions. Inhibition at the ends of the receptive field along the axis of orientation, a property known as end-inhibition, restricts a neuron's responses to lines of a certain length (Figure 27–5). End-inhibited neurons respond well to a line that does not extend into the inhibitory flanks but lies entirely within the excitatory part of the receptive field. Because the inhibitory regions share the orientation preference of the central excitatory region, end-inhibited cells are selective for line curvature and also respond well to corners.
Figure 27-5 End-inhibited receptive fields. Some receptive fields have a central excitatory region flanked by inhibitory regions that have the same orientation selectivity. Thus a short line segment or a long curved line will activate the neuron (A and C) but a long straight line will not (B). A neuron with a receptive field that displays only one inhibitory region in addition to the excitatory region can signal the presence of corners (D).
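End-inhibition can be caricatured with an equally simple scheme. In the Python sketch below, which uses hypothetical subunits and tuning widths, a central excitatory subunit and two end-zone subunits share the same preferred orientation, and the cell's output is the central drive minus the end-zone drive. A short bar, a line whose ends curve away from the preferred orientation, or a corner drives the model cell, whereas a long straight line does not.

import numpy as np

PREFERRED = 0.0  # preferred orientation of all three subunits, in degrees

def subunit_drive(segment_orientation, tuning_width=20.0):
    # Response of one oriented subunit to the line segment lying over it.
    # None means no segment covers that subunit.
    if segment_orientation is None:
        return 0.0
    delta = segment_orientation - PREFERRED
    return np.exp(-delta ** 2 / (2 * tuning_width ** 2))

def end_inhibited_cell(center_seg, left_flank_seg, right_flank_seg):
    # Excitatory center minus same-orientation inhibitory end zones, rectified.
    excitation = subunit_drive(center_seg)
    inhibition = subunit_drive(left_flank_seg) + subunit_drive(right_flank_seg)
    return max(excitation - inhibition, 0.0)

# Short bar confined to the center; long straight bar extending into both end zones;
# long line whose ends bend away from the preferred orientation; corner whose second
# arm is turned by 90 degrees over one end zone.
print("short bar   ", end_inhibited_cell(0.0, None, None))
print("long bar    ", end_inhibited_cell(0.0, 0.0, 0.0))
print("curved line ", round(end_inhibited_cell(0.0, 60.0, 60.0), 3))
print("corner      ", round(end_inhibited_cell(0.0, None, 90.0), 3))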
To define the shape of an object as a whole, the visual system must integrate the information about local orientation and curvature into object contours. The way in which the visual system integrates contours reflects the geometrical relationships present in the natural world (Figure 27–6). As originally pointed out by Gestalt psychologists early in the 20th century, contours that are immediately recognizable tend to follow the rule of good continuation: Curved lines maintain a constant radius of curvature and straight lines stay straight. In a complex visual scene such smooth contours tend to “pop out,” whereas more jagged contours are difficult to detect.
Figure 27-6 Contour integration. (Adapted, with permission, from Li W and Gilbert CD 2002.)
A. Contour integration reflects the perceptual rules of proximity and good continuation. Each of the four frames has a straight line at its center, and all four lines have the same oblique orientation. In some frames the line pops out more or less immediately, without searching. Factors that contribute to contour saliency include the number of contour elements (compare the first and second frames), the spacing of the elements (third frame), and the smoothness of the contour (bottom frame). When the spacing between contour elements is too large or the orientation difference between them too great, one must search the image to find the contour.
B. These perceptual properties are reflected in the horizontal connections that link columns of neurons with similar orientation selectivity in the primary visual cortex. As long as the contour elements are spaced sufficiently close together, excitation propagates from cell to cell along these connections, facilitating the responses of V1 neurons: Each neuron in the network augments the responses of its neighbors, and the facilitated responses spread across the network.
The responses of a visual cortex neuron can be modulated by stimuli that lie outside the core of its receptive field and therefore do not by themselves activate the cell. This contextual modulation endows a neuron with selectivity for more complex stimuli than would be predicted by placing the components of a stimulus at different positions in and around the receptive field. The same factors that facilitate the detection of an object in a complex scene (Figure 27-6A) also shape contextual modulation: The properties of perceptible contours are reflected in the responses of neurons in the primary visual cortex, which are sensitive to the global characteristics of contours, even those extending well beyond their receptive fields.
Contextual influences over large regions of visual space are likely mediated by connections between multiple columns of neurons in the visual cortex that share similar orientation selectivity (Figure 27-6B). These connections are formed by pyramidal-cell axons that run parallel to the cortical surface (see Figure 25-16). The extent and orientation dependence of these horizontal connections provide the interactions that could mediate contour saliency (see Figure 25-14).
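The dependence of contour saliency on element spacing and smoothness can be summarized in a toy computation. The Python sketch below uses hypothetical coupling rules and constants rather than measured connectivity: each contour element's response is boosted by coupling with its neighbors, the coupling falls off with spatial separation and with differences in orientation, and a closely spaced, smoothly continuing contour therefore accumulates the largest facilitated response, much as it pops out perceptually.

import numpy as np

def facilitation(e1, e2, space_const=2.0, angle_const=20.0):
    # Strength of the hypothetical horizontal-connection coupling between two contour
    # elements, each given as (x, y, orientation in degrees). Coupling falls off with
    # spatial separation and with the difference in orientation.
    (x1, y1, th1), (x2, y2, th2) = e1, e2
    distance = np.hypot(x2 - x1, y2 - y1)
    angle_diff = abs(th2 - th1)
    return np.exp(-distance / space_const) * np.exp(-angle_diff ** 2 / (2 * angle_const ** 2))

def contour_saliency(elements):
    # Each element's baseline response is boosted by coupling with its neighbors;
    # the contour's saliency is taken as the summed facilitated response.
    responses = []
    for i, e in enumerate(elements):
        neighbors = elements[max(0, i - 1):i] + elements[i + 1:i + 2]
        boost = sum(facilitation(e, n) for n in neighbors)
        responses.append(1.0 + boost)   # 1.0 = response to an isolated element
    return sum(responses)

collinear_close = [(i * 1.0, 0.0, 0.0) for i in range(7)]    # smooth, tightly spaced
collinear_sparse = [(i * 4.0, 0.0, 0.0) for i in range(7)]   # same contour, widely spaced
jagged = [(i * 1.0, 0.0, 0.0 if i % 2 == 0 else 60.0) for i in range(7)]  # large orientation jumps

for name, contour in [("close", collinear_close), ("sparse", collinear_sparse), ("jagged", jagged)]:
    print(name, round(contour_saliency(contour), 2))
# The closely spaced, smoothly continuing contour yields the highest saliency score.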
Depth Perception Helps Segregate Objects from Background
Depth is another key feature in determining the shape of an object. An important cue for the perception of depth is the difference between the two eyes’ views of the world, which must be computed and reconciled by the brain. The integration of binocular input begins in the primary visual cortex, the first level at which individual neurons receive signals from both eyes. The balance of input from the two eyes, a property known as ocular dominance, varies among cells in V1.
These neurons are also selective for depth, which is computed from the relative retinal positions of objects placed at different distances from the observer. An object that lies in the plane of fixation produces images at corresponding positions on the two retinas (Figure 27–7). The images of objects that lie in front of or behind the plane of fixation fall on slightly different, or disparate, locations in the two eyes. Individual visual cortex neurons are selective for a narrow range of such binocular disparities. Some are selective for objects lying on the plane of fixation (tuned excitatory or tuned inhibitory cells), whereas others respond only when objects lie in front of the plane of fixation (near cells) or behind it (far cells).
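The idea that a neuron's preferred disparity could arise from a horizontal offset between its left-eye and right-eye receptive fields can be sketched as follows. The Python fragment below is illustrative only: it uses one spatial dimension, Gaussian receptive fields, and a simple multiplicative combination of the two monocular drives; real disparity-selective neurons are more elaborate. The three model cells respond maximally at zero, crossed (near), and uncrossed (far) disparities, respectively.

import numpy as np

x = np.linspace(-6, 6, 121)  # horizontal position on each model retina

def gaussian(center, sigma):
    return np.exp(-(x - center) ** 2 / (2 * sigma ** 2))

def binocular_neuron(preferred_disparity):
    # Model disparity-selective neuron: its right-eye receptive field is shifted
    # horizontally relative to its left-eye field by the preferred disparity.
    left_rf = gaussian(0.0, 1.0)
    right_rf = gaussian(preferred_disparity, 1.0)

    def respond(stimulus_disparity):
        # A small spot of light is seen at position 0 by the left eye and displaced by
        # the stimulus disparity in the right eye (0 = plane of fixation; by the sign
        # convention used here, negative = nearer than fixation, positive = farther).
        left_image = gaussian(0.0, 0.5)
        right_image = gaussian(stimulus_disparity, 0.5)
        # Multiplying the two monocular drives yields a response that peaks when the
        # stimulus disparity matches the offset between the two receptive fields.
        return np.dot(left_rf, left_image) * np.dot(right_rf, right_image)

    return respond

tuned_zero = binocular_neuron(0.0)    # "tuned" cell: plane of fixation
near_cell = binocular_neuron(-2.0)    # "near" cell: in front of the fixation plane
far_cell = binocular_neuron(+2.0)     # "far" cell: behind the fixation plane

for d in (-3, -2, -1, 0, 1, 2, 3):
    print(f"disparity {d:+d}: tuned {tuned_zero(d):7.2f}  near {near_cell(d):7.2f}  far {far_cell(d):7.2f}")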