© Springer International Publishing AG 2017
Pascual Ángel Gargiulo and Humberto Luis Mesones-Arroyo (eds.), Psychiatry and Neuroscience Update – Vol. II, 10.1007/978-3-319-53126-7_20
20. Eye Movements: Parameters, Mechanisms, and Active Vision
(1)
Department of Psychology, Technische Universität Dresden, Helmholtzstr. 10, Dresden, 01062, Saxony, Germany
(2)
Department of Psychology, Engineering Psychology and Applied Cognitive Research, Technische Universität Dresden, Helmholtzstr. 10, Dresden, 01062, Saxony, Germany
Abstract
Human eye movements are essential for visual perception, as the physiological structure of the eyes limits high acuity and colorful vision to a small fraction of the retina. Measuring the dynamic interplay of fixations (i.e., the eyes are stable relative to an object of interest) and saccades (i.e., the eyes are directed to a new target) makes possible fundamental insights into the organization of vision. A complex interaction of several types of eye movements is required when performing different tasks, such as orienting in space, identifying objects, or interacting with persons. Here, we discuss the characteristics of fixations and saccades in the context of active vision, with particular focus on the relationship between the two parameters. Analyzing the duration of fixations and the amplitude of saccades during everyday activities can reveal insights into the processing of visual information, allowing an understanding of what details of the environment receive attention. In addition, by considering fixations and saccades in combination, it can be determined how such details were processed within the context of ongoing activities.
Keywords
Eye fixations, Saccades, Attention, Cognitive mechanisms, Active vision
Introduction
In this chapter, we review the basics of what is probably the most important human sense: sight. For instance, when asked which modality they would miss most if lost, the majority of people are likely to indicate vision [1, 2]. In addition, when people describe objects, they primarily use adjectives that refer to visual (60%) or tactual (32%) modalities [3]. For visual processing of the environment, the important role of eye movements has been repeatedly emphasized in the literature (e.g., [4, 5]). The main reason for this importance is that the allocation of visual attention mostly corresponds to the direction of the eyes (e.g., [6]). Processing visual information is governed by a multitude of neural structures, both cortical and subcortical. Therefore, eye movements are not only a complex object of study in themselves; measuring them also delivers diagnostic information on different levels. To point the reader to the potential of the methods described here for current psychiatry, we refer to examples from schizophrenia research, where applicable.
Natural sampling of information from the environment during visual perception occurs via “active vision” [5, 7]. Because of the uneven distribution of light-sensitive receptors across the retina, the highest visual acuity is limited to the small foveal area (about 2 degrees of arc). Outside the high-resolution foveal area — in parafoveal and peripheral regions — vision becomes blurred and the perception of color is reduced. Given the constraints on visual acuity, eye movements are mandatory to perceive the environment. Saccades — fast ballistic movements — rotate the foveal region of the eyes from one point to another. The relatively stable periods in-between are called fixations. The intake of visual information occurs within fixations but is largely suppressed during saccades [8]. In most everyday situations, such as reading text or inspecting an image, oculomotor activity can be described as interplay between fixations and saccades.
Following a rather crude classification, three main areas of eye movement research can be identified: (i) analysis of eye movements in order to understand facets of reading, (ii) efforts to investigate gaze behavior during free visual exploration of natural stimuli, and (iii) work that comprises the examination of visual search processes in relation to eye movement strategies (see [9]). This classification provides a general overview but is an over-simplification. For instance, these categories do not consider the long history of eye movement research in clinical settings (for a recent overview in this particular field, see [10]). A closer examination of eye movement research reveals that significant work has been done to combine insights from different research areas, contributing to a more general understanding of common processes during, for example, reading and scene viewing [11]. In recent years, much interdisciplinary work has connected eye movement analyses with research questions from other disciplines. For instance, in the development and design of attention-sensitive interfaces, eye movement registration and analysis have become integral parts (e.g. [12–14]).
Eye movements are often considered “the window to the soul” [15], which can provide access to ongoing information processing. Combining video-based eye tracking technology (the means for non-invasive, extremely accurate, and fast measurements of different activities) with other measurements provides even more advantages. For instance, parallel recording with brain imaging enriches the explanatory power of eye movement data [16, 17]. Furthermore, it allows for a better understanding of brain activity since details about the inspected visual information can be extracted and assigned to the respective brain signals [18]. Human eye movements are a common output of a variety of psychophysiological mechanisms located well beyond the low-level oculomotor nuclei of the brain stem. Some of these mechanisms are hierarchically (or heterarchically; see [19]) organized. This vertical dimension of cognitive organization is an interesting object of scientific investigation in itself, for instance, to examine the correspondence between parameters of eye movements and the relative dominance of one (or several) such mechanisms. The results are expected to provide a better understanding of eye movement behavior and its control mechanisms. Such understanding will enable us to use eye movements as a powerful diagnostic instrument for the real-time measurement of different forms of cognitive activities or their impairments. As a result, eye tracking will become of greater importance in the development of future applications, as well as part of future applications themselves.
Parameters of Eye Movements
Interest in understanding vision and visual perception is long-standing and dates back to classical antiquity. For instance, Plato (427–347 B.C.) developed the extramission theory to explain the process of vision. He imagined that vision occurs when light comes out of the eye and hits objects outside. Objects then release “flame particles” representing different colors [20]. Since these early attempts, many important discoveries have been made about vision in general and the function and purpose of individual parameters (for a review, see [21]). Much of the knowledge about eye movements has been uncovered in laboratory experiments by investigating various eye movement parameters. Nowadays, the interest in eye movement research has shifted to understanding the interaction between parameters and their meaning as a whole in the process of active vision (e.g. [5]).
The neural systems controlling eye movements are of particular interest because they form a network that spans the whole brain. Analysis of eye movements therefore provides convenient access to mechanisms in the active brain. Two different perspectives are possible when considering eye movements. Eye movements can be understood as the result of a highly complex and very precise motor system, but also as part of a sensory system that is instead concerned with where the eyes are directed in space, technically referred to as gaze. For the study of gaze behavior, a functional distinction should be made between the various types of eye movements. Some eye movements are dedicated to gaze-holding and others are responsible for gaze-shifting.
Gaze-holding eye movements produce a stable image on the retina, which is important for perceiving and processing of information, similar to taking a picture with a camera. However, unlike a camera, the eyes can be held stable even if the head or body is moving. As soon as any form of head or body movements occurs, our visual system compensates for these movements. Gaze-holding movements are driven by the balance organs in the inner ear (the vestibular system). Accordingly they are named the vestibulo-ocular reflexes (e.g., [22, 23]). In the case of retinal image motion, for instance while looking out of the window of a moving train, another gaze-holding mechanism becomes activated, referred to as the optokinetic response ([24], see also [25]). Fixating on a moving object in front of a stationary or dynamic background requires a different class of gaze-holding movements to keep the object stable at the foveal region. Therefore the eyes need to be in motion but must also be stable with respect to the object. Such movements are called dynamic fixation or smooth pursuit movements (see, e.g., [26, 27, 28]). Smooth pursuit movements have also been of interest to psychiatric research from the very beginning onwards, as can be seen from the seminal work of Diefendorf and Dodge [29]. Among other tests, “ocular pursuit-reactions” were identified as a promising candidate for the investigation of schizophrenia (or dementia praecox as was the scientific term at that time). A meta-analytic review [30] reported maintenance gain, total saccade rate, and leading saccades to be the most promising specific measures in smooth pursuit research in the context of schizophrenia.
Gaze-shifting eye movements are necessary to redirect the small high-resolution foveal region to the respective points of interest. These saccadic movements are executed whenever a new object needs to be fixated. On average, about three saccades a second are performed. Further coordination is required because we have two eyes which need to be adjusted so that the image of an object falls on exactly the same parts of the two retinae. For distant objects, the two eyes must move in a conjunct manner. If an object comes closer, the eyes must converge to line up in a disjunct manner. This conjunct and disjunct behavior is summarized as vergence movements.
Furthermore, during visual fixations, our eyes make tiny movements (microsaccades, tremor, and drift) of which we are not aware (see, e.g., [31]). The existence of these movements was already mentioned about 150 years ago [32] and they were assumed to support the perception of fine spatial details [33, 34]. Early investigations supported this hypothesis by reporting effects of visual fading if fixational eye movements are eliminated (e.g., [35]). The term microsaccade was introduced by Zuber and Stark [36], designating fixational movements within a range of 2 to 12 min of arc. Around this time, it was hypothesized that microsaccades simply serve to compensate for errors produced by the slow drifts [37], but there was strong experimental evidence against this argument. It was reported that microsaccades can be voluntarily suppressed without indications of visual fading (e.g., [38]). Moreover, it was found that microsaccades disappear during the performance of high-acuity tasks, such as threading a needle [39]. These controversial findings marked the start of a long-lasting debate about the purpose of microsaccades (for overviews, see [40, 41, 42]).
Recent experimental findings have shown that fixational eye movements are important for preventing visual fading [43]. It also has been reported that microsaccades enhance the discrimination of fine spatial details [44] and briefly-presented stimuli [45]. Moreover, it was proposed that microsaccadic activity provides an index for covert attention shifts [46, 47]. Although recent reports emphasize the importance of microsaccades for visual perception [41, 43], the contribution of microsaccades in the process of active vision still remains unclear [48], and their significance during natural viewing is still debated [31]. Recently, microsaccades came into the focus of psychiatric research, too [49]. The authors investigated differences in free-viewing ocular behavior between healthy subjects and schizophrenic patients. Results showed no differences in terms of microsaccades, but overall scanning behavior depended heavily on image content. However, based on the idea of a common mechanism of saccade generation [50], it will be a future task to broaden our understanding of the relationship between saccades and microsaccades in the context of schizophrenia research.
Microsaccades, clearly a facet of gaze-holding movements, can be taken as an example to illustrate that the distinction is somewhat artificial, motivated by the researchers’ interest in classification of phenomena. Both saccades, falling into the gaze-shifting category, and microsaccades can be drawn onto a single continuum of a main sequence, a term which has been adopted from astronomy [51]. Here, the relationship between the peak velocity and the amplitude of the saccadic movement can be mapped. It has been shown that the linearity of the relationship continues from the smallest microsaccades to wide saccades measured in everyday activities such as free inspection of a natural scene [52], providing some evidence for the hypothesis that both kinds of saccades are generated by the very same mechanisms. Therefore, whether gaze is being held or being shifted might be a matter of perspective, or of scale. The distinction, however, is useful to examine underlying processes, and especially to pinpoint the qualitative and functional differences. In the following sections, we will take a closer look at fixations, saccades, and finally at the dynamic interplay of both in active vision.
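To illustrate how such a main-sequence relationship can be examined, the following minimal sketch fits peak velocity against amplitude. It assumes a power-law form, v = k · A^b, which is linear in log-log coordinates; this functional form, the function name `fit_main_sequence`, and the data values are assumptions made for illustration and are not taken from [51] or [52].

```python
import math

def fit_main_sequence(amplitudes_deg, peak_velocities_deg_s):
    """Least-squares fit of log(v) = log(k) + b * log(A).

    Returns (k, b). The power-law form is an assumption of this sketch.
    """
    xs = [math.log(a) for a in amplitudes_deg]
    ys = [math.log(v) for v in peak_velocities_deg_s]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope in log-log space
    k = math.exp(mean_y - b * mean_x)  # intercept, back-transformed
    return k, b

# Synthetic, illustrative values only (not measured data): amplitudes from
# microsaccade size (about 0.1 deg) up to large saccades (20 deg).
amplitudes = [0.1, 0.5, 1.0, 5.0, 10.0, 20.0]
velocities = [50.0 * a ** 0.8 for a in amplitudes]  # hypothetical k and b

k, b = fit_main_sequence(amplitudes, velocities)
```

If microsaccades and larger saccades fall on one continuum, a single pair of parameters should describe both ends of the amplitude range; systematic deviations at either end would argue against a common relationship.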
Characteristics of Fixations
When investigating fixations, different aspects can be considered. First, the duration can be measured, indicating how long the eyes are stabilized with regard to a particular region or object in the environment. Accordingly, the duration of fixations provides a temporal characteristic and can also indicate where our eyes are attracted. The spatial distribution of fixations contains information about which object is fixated on first and where the eye is going with the next saccade. Another important issue is related to the information that is processed within a single fixation. What is the amount of information that can be perceived and processed? One can think about this question in terms of a particular window that might have a certain shape and size. Given a window size of approximately 2 degrees of visual angle, how can two successive fixations that are spatially separated by only 1 degree be resolved? Should these fixations be considered as only one fixation within a region? What are the mechanisms that control for all these different features? Which brain structures are involved in the temporal and spatial control of fixations? Since some of the mechanisms are known already, different models for fixational control have been suggested.
Early eye movement research revealed that fixations vary with regard to their durations [53–55]. Fixation durations can vary from less than 100 ms to several seconds, but the vast majority of fixations are terminated after about 200 to 250 ms [56]. The variation in the duration of fixations has been attributed to different reasons. Evidence has been found that task type and difficulty influence the fixation duration. For instance, in silent reading, the mean fixation durations are shorter (225–250 ms) than in oral reading (275–325 ms; [9]). This difference could be related to the motor component when reading aloud. However, the observation of shorter fixation durations in visual search (180–275 ms) compared to longer fixations in scene perception (260–330 ms) indicates that the nature of the task clearly influences the length of fixations [9]. Furthermore, fixation durations can even be different within the same task. It has been found that inspecting the same visual stimuli under different instructions leads to significant changes in fixation durations [57, 58]. This has been interpreted as evidence for a relationship between fixation duration and the level of information processing, according to Craik and Lockhart [59].
The approaches discussed so far assume a direct connection between the duration of a fixation and the ongoing information processing. These direct control theories are supported by results from the stimulus onset delay paradigm (e.g., [60, 61, 62]). In this paradigm, the stimulus is removed during a saccade and reappears within the next fixation with specified delays. An increase in the fixation duration by the amount of the onset delay provides evidence that fixations are under direct control (e.g., [60]). Changes in the quality of available information can also influence the duration of fixations [63–65]. For example, Mannan and colleagues [64] reported longer fixations for low-pass-filtered than for unfiltered scenes. A prolongation of fixations has also been found when the amount of either foveal or peripheral information was limited by a gaze-contingent mask [66].
In contrast to the direct control assumption, it has also been argued that fixations might be governed indirectly by other factors. These indirect control theories propose that (i) the stimulus processing within a fixation is too slow to have an immediate effect (delayed control), (ii) the global parameters, such as the task, stimulus properties, etc., determine the length of fixations (global parameter control; e.g., [67, 68]), and (iii) there is an internal timing mechanism keeping the eyes moving at a constant rate. Recently, a mixed control model for fixation durations has been suggested [69, 70]. Applying the scene-onset delay paradigm to scene perception resulted in a prolongation of a certain proportion of fixations (supporting direct control) while other fixations remained unaffected by the scene-onset manipulation (supporting indirect control). These findings have resulted in a recent computational model for the control of fixations that accounts for variations in fixation durations in scene viewing [71]. The timing signals (i.e., fixation durations) of the model are based on continuous-time random walks. Furthermore, the level of visual and cognitive processing can modulate the onset of a saccade and thereby determine the length of a fixation.
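The idea of a random-walk timer can be made concrete with a small simulation. The following is a minimal sketch under strong simplifying assumptions, not the implementation of the model in [71]: the walk only steps upward, waiting times between steps are exponentially distributed, and a saccade is triggered once a fixed number of steps is reached. The function name, the parameter values, and the `difficulty` factor (a stand-in for modulation by visual and cognitive processing) are hypothetical.

```python
import random

def simulate_fixation_duration(n_steps=10, mean_duration_ms=250.0,
                               difficulty=1.0, rng=random):
    """Time (ms) a random-walk timer needs to take n_steps upward steps.

    Waiting times between steps are exponentially distributed.
    difficulty > 1 slows the timer; it is a hypothetical stand-in for
    the influence of ongoing processing on saccade onset.
    """
    rate_per_ms = n_steps / (mean_duration_ms * difficulty)
    return sum(rng.expovariate(rate_per_ms) for _ in range(n_steps))

rng = random.Random(1)  # fixed seed so the run is repeatable
easy = [simulate_fixation_duration(rng=rng) for _ in range(5000)]
hard = [simulate_fixation_duration(difficulty=1.3, rng=rng)
        for _ in range(5000)]

mean_easy = sum(easy) / len(easy)
mean_hard = sum(hard) / len(hard)
```

Even this reduced version produces variable fixation durations from a stochastic timer alone, while the rate parameter offers one place where processing demands could lengthen individual fixations.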
One critical limitation of the discussed theories is the missing link to brain structures and their ongoing activity. A lot of information about structures and their functional contribution to the control of fixations has been accumulated during recent decades. However, these findings are mostly excluded from theories developed to explain the control and duration of fixations.
The spatial distribution of fixations across an image or in relation to the environment represents further important information for understanding the nature of visual sampling and processing. The locations of fixations reveal the strong interrelation between fixations and saccades. The spatial distribution of fixations can be examined with regard to the regions and objects, i.e., considering the location of the eyes for a certain period of time. Similarly, it can be explored why a saccade was performed towards a particular location in a scene. Regardless of which approach is taken (direct or indirect control), the eyes remain within a particular region until the feature extraction and information processing is completed. One of the first contributions to this topic was the feature integration theory [72, 73]. The approach was introduced to explain serial and parallel mechanisms in visual search. The key concept is based on the extraction of primary features, such as color, orientation, and shape, which are represented in separate feature maps. These feature maps are integrated in a saliency map that is accessible and used to direct attention to the most conspicuous areas.
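The integration of separate feature maps into a single saliency map can be sketched in a few lines. The example below is a minimal illustration, assuming that each feature map is rescaled to the range 0 to 1 and that the maps are combined by an equal-weight average; both choices, the function names, and the toy values are assumptions and do not reproduce the models cited in the text.

```python
def normalize(feature_map):
    """Rescale a 2-D map (list of rows) to the range [0, 1]."""
    values = [v for row in feature_map for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        return [[0.0 for _ in row] for row in feature_map]
    return [[(v - lo) / (hi - lo) for v in row] for row in feature_map]

def saliency_map(feature_maps):
    """Average the normalized feature maps cell by cell."""
    normed = [normalize(m) for m in feature_maps]
    rows, cols = len(normed[0]), len(normed[0][0])
    return [[sum(m[r][c] for m in normed) / len(normed)
             for c in range(cols)] for r in range(rows)]

def most_salient_location(saliency):
    """Return (row, col) of the maximum saliency value."""
    cells = ((r, c) for r in range(len(saliency))
             for c in range(len(saliency[0])))
    return max(cells, key=lambda rc: saliency[rc[0]][rc[1]])

# Toy 3x3 feature maps with made-up values.
color = [[0, 0, 0], [0, 9, 0], [0, 0, 1]]
orientation = [[0, 0, 0], [0, 2, 0], [0, 0, 4]]
shape = [[1, 0, 0], [0, 3, 0], [0, 0, 0]]

saliency = saliency_map([color, orientation, shape])
target = most_salient_location(saliency)
```

In this toy case the central cell wins because it is conspicuous in all three maps, even though another cell has the strongest orientation signal.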
The concept of the saliency map has become an essential part of computational models of focal visual attention, and thereby for the explanation of eye movement behavior (e.g., [74, 75, 76]). These attempts provide promising results and a first approximation for modelling the spatial distribution of fixations during the inspection of naturalistic stimuli. However, the essential limitation of the saliency approach is due to its exclusive focus on primary physical features of a scene. If the spatial distribution of fixations could be sufficiently explained by the analysis of such simple features, it could be concluded that visual attention is exclusively controlled in a bottom-up manner. Recent evidence revealed that this is not the case; rather, the deployment of visual attention is based on bottom-up as well as top-down influences [77, 78]. Moreover, it has been found that task-demands can override saliency features [78–80]. Thus, it seems that top-down mechanisms (e.g., instructions) dominate gaze behavior during visual tasks (e.g., [54, 80]) and in the performance of visually-guided actions (e.g. [81, 82]). A fairly new and promising approach that tries to overcome the problems of the traditional saliency approach has been suggested by Hwang, Wang, and Pomplun [83]. The authors conducted experiments that combined several interdisciplinary methods in novel ways to examine semantic guidance within a visual scene. This method integrates bottom-up and top-down saliency information, thereby allowing predictions about eye gaze behavior that are presumably closer to the processing mechanisms of the visual system.
In psychiatric research, efforts have been made to investigate fixation distributions over different kinds of stimuli, in order to compare schizophrenic patient groups to healthy subjects (e.g., [84, 85]). Phillips and David [84] were interested in where deluded schizophrenic patients would direct their visual attention when inspecting images of faces, both familiar and unfamiliar. They showed that schizophrenic patients actively avoided informative regions by mostly fixating areas outside the faces; moreover, in conditions when two faces were presented, fixation durations of deluded patients were prolonged as compared to the non-deluded and the healthy control subjects. Sprenger and colleagues [85] showed photographs of everyday situations to schizophrenic patients. In comparison to healthy control subjects, they found fewer fixation clusters, longer fixation durations as well as deviant attentional landscapes and scan paths.
Characteristics of Saccades
Saccades are necessary to direct the fovea from one point to another. In most visual activities, we perform about three saccades a second [86]. During a saccade, the processing of visual information is suppressed because the image is rapidly moving across the retina [8]. The period where information encoding is suppressed starts before the actual saccade and outlasts the saccadic eye movement by about 50–60 ms [87–89]. In contrast to visual perception, cognitive processing seems not to be interrupted during saccades [90]. The high velocity of saccades minimizes the periods in which we are nearly blind.
Different types of saccades are documented in the literature. Saccades can be elicited by the onset or change of a visual stimulus, designated as exogenous, reflexive, or visually-guided saccades. Moving the eyes to a target which is recalled from memory requires the performance of an endogenous, voluntary, or memory-guided saccade. These saccades do not necessarily rely on a visual stimulus. During natural viewing, we perform either visually-guided or memory-guided saccades. Another form of saccade, the so-called antisaccade, is often used in neurophysiological research for diagnostic purposes (e.g., [91, 92]). In the antisaccade task, the eyes have to move away from a visual target appearing on the screen. The accurate performance of an antisaccade requires inhibiting a reflexive saccade to the onset location, together with a voluntary movement of the eye in the opposite direction. The antisaccade task requires cognitive control, evidenced by the fact that observers often have difficulties in suppressing the reflexive saccades in the direction of the target. Programming and performing a correct antisaccade takes longer than programming and performing a visually-guided saccade.
A network of several brain structures is involved in the planning and execution of saccades. Knowledge about the contribution of particular brain structures has been gathered by the investigation of different saccade parameters. In the following, we will discuss commonly analyzed parameters of saccades before briefly reviewing those brain regions which have been identified to significantly contribute to saccadic activity. Saccadic eye movements bring the fovea to the regions of interest, which can vary with regard to the distances in between, requiring the saccade amplitudes to be of different lengths. In everyday tasks, saccade amplitudes vary from a few degrees up to 130 degrees of arc, with an average saccade size of about 18–20 degrees [93, 94]. As a result of the variation in saccade length, there are also differences in saccade durations. During reading, saccade durations are on average 20 to 30 ms but they can last up to about 100 ms. The parameter saccadic peak velocity describes the maximum speed that is reached within a saccade (up to 900 degrees/s) and is almost linearly related to the saccade amplitude. For the detection of saccades, when processing the raw data of an eye-tracking device, the saccade acceleration represents another important parameter (to differentiate saccades from other eye movements, a minimum of 150 degrees/s² is often applied; see, e.g., [95]).
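How amplitude, duration, and peak velocity are obtained from raw gaze samples can be illustrated with a simple detection routine. The sketch below uses a plain velocity threshold on a one-dimensional gaze trace, which is a deliberately simpler criterion than the acceleration-based one mentioned above; the threshold of 30 degrees/s, the function name, and the synthetic trace are assumptions for illustration and do not describe the algorithm of any particular eye tracker.

```python
def detect_saccades(positions_deg, sampling_rate_hz, velocity_threshold=30.0):
    """Return the saccades found in a 1-D gaze trace.

    Each saccade is a dict with amplitude (deg), duration (ms), and
    peak velocity (deg/s). The threshold is a hypothetical default.
    """
    dt = 1.0 / sampling_rate_hz
    # Sample-to-sample velocity in degrees per second.
    velocities = [(b - a) / dt
                  for a, b in zip(positions_deg, positions_deg[1:])]
    saccades, start = [], None
    for i, v in enumerate(velocities + [0.0]):  # sentinel closes a final run
        fast = abs(v) > velocity_threshold
        if fast and start is None:
            start = i                            # first fast sample
        elif not fast and start is not None:
            segment = velocities[start:i]
            saccades.append({
                "amplitude_deg": abs(positions_deg[i] - positions_deg[start]),
                "duration_ms": len(segment) * dt * 1000.0,
                "peak_velocity_deg_s": max(abs(s) for s in segment),
            })
            start = None
    return saccades

# Synthetic trace sampled at 1000 Hz: fixation, a 5-degree saccade lasting
# 25 ms at constant speed, then fixation again.
trace = [0.0] * 50 + [0.2 * (i + 1) for i in range(25)] + [5.0] * 50
found = detect_saccades(trace, sampling_rate_hz=1000)
```

Real recordings are two-dimensional and noisy, so practical pipelines typically smooth the signal and combine several criteria before parameters such as these are computed.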
Another feature is the saccade trajectory. Saccades are rarely straight (e.g., [54]) and most of them show a tendency to curve towards the horizontal meridian [96]. Moreover, other objects within the visual scene have been found to influence the magnitude and direction of the curvature observed. The presentation of a distractor has been found to curve a saccade towards a distractor (e.g., [97]) but also away from a distractor (e.g., [98]). The direction of the curvature, i.e., towards or away from a distractor, appears to depend upon the overall neural activity distribution produced by the target and the nearby distractor. According to the population coding theory proposed by Tipper, Howard, and Houghton [99], possible target objects are represented by large neuronal populations that encode a movement vector aimed at the target. If the target and distractor are nearby, their population codes will be combined into one distribution, resulting in a vector which represents an intermediate location between the objects [100].
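The combination of two population codes into one intermediate vector can be illustrated as a weighted average. This is a minimal sketch of the averaging idea only; the function name, the weights, and the coordinates are hypothetical, and the sketch does not model the neural populations described in [99] or the curvature of the trajectory.

```python
def averaged_movement_vector(target_xy, distractor_xy,
                             target_weight=1.0, distractor_weight=0.5):
    """Weighted mean of two movement vectors (degrees of visual angle).

    The weights stand in for the relative activity of the two
    populations; their values are hypothetical.
    """
    total = target_weight + distractor_weight
    x = (target_weight * target_xy[0]
         + distractor_weight * distractor_xy[0]) / total
    y = (target_weight * target_xy[1]
         + distractor_weight * distractor_xy[1]) / total
    return (x, y)

# Target at (10, 0) and a nearby distractor at (10, 4), equal activity.
intermediate = averaged_movement_vector((10.0, 0.0), (10.0, 4.0),
                                        target_weight=1.0,
                                        distractor_weight=1.0)
```

With equal weights the resulting vector points midway between the two objects; lowering the distractor weight moves the endpoint back toward the target.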
An often-used parameter, which is mainly of importance for laboratory experiments, is the saccade latency. The latency of a saccade describes the time interval between the appearance of a target and the execution of a saccade towards the target. For healthy adults, saccade latencies are reported within a range of 200 to 250 ms. Saccade latency also seems to be related to cognitive development in children, as the latencies of visually-guided saccades in children are longer than in adults [101]. Accordingly, the latency shortens progressively as children grow older. Laboratory experiments have identified a subpopulation of saccades with very short latencies at around 100 ms [102]. The existence and function of these so-called express saccades have been debated (see, e.g., [103]). The latency period is necessary to complete several processes, such as attentional disengagement from the actual fixation position, a shift of visual attention to the new target location, and the computation of saccade metrics. Each of these processes involves activation of different cortical and subcortical areas (see below).
The saccade latency represents a cognitive–physiological parameter, and has been extensively studied with different paradigms. The manipulation of information at the fixation location has been found to substantially influence saccade latency. In its simplest form, this manipulation involves a disappearing fixation target before the onset of the next target. The resulting saccade latency is significantly shortened, a phenomenon which is known as the gap effect [104]. In contrast, an increase in the saccadic latency has been found when two stimuli are shown at the same time, one of which is the target, the other a distractor [105, 106]. This “remote distractor effect” has the strongest influence on the saccade latency when the distractor appears at the fixation location [107]. We will elaborate more on the saccade latency when discussing the relationship between fixations and saccades.
In schizophrenia research, saccadic latency is a prominent parameter [108–110]. Manoach and colleagues [109] investigated microstructural integrity of brain structures related to volitional saccades, i.e., anterior cingulate cortex, frontal eye fields, and right hemisphere parietal cortex, using diffusion tensor imaging. Their results suggest that slower volitional saccades in medicated schizophrenia patients are associated with reduced integrity. The relationship between latency and peak velocity in pro- and antisaccades was investigated, both in groups of healthy subjects and in schizophrenia patients [110]. For both groups, antisaccades had lower peak velocities than prosaccades, and peak velocities of antisaccades were independent of latencies. For prosaccades with long latencies, however, schizophrenia patients showed a significant decrease in peak velocities. The authors explained this effect with a possible decay of the transient visual signal at the saccade target, or a reduction of target-related neural activity in the saccade system. Latencies of saccades are also task-dependent. Schwab and colleagues [108] studied schizophrenia patients and their first-degree relatives as compared to healthy subjects in a low and high demand visual task. Their results showed smaller differences between the tasks for the patients, as compared to the other two groups, possibly reflecting a specific oculomotor attentional dysfunction.