Working memory


1 Working memory


Past, present…and future?




Alan Baddeley


Graham Hitch


As the longest serving residents of the province of working memory, the two of us have been invited to introduce this volume by saying a little about our particular experiences of how it all began.


One of us (AD Baddeley – ADB) had cut his research teeth on the 1960s controversy concerning the nature of memory, whether unitary as proposed by the previously dominant stimulus–response (S–R) associationist view with learning based on the building up of associations and forgetting due to interference (Keppel and Underwood 1962; Melton 1963; Postman 1975), or whether it was necessary to assume two or more separate memory systems, interpreted in terms of the new information-processing concepts that came to be known as cognitive psychology (Broadbent 1958; Brown 1958; Miller 1956; Peterson and Peterson 1959). The Medical Research Council Applied Psychology Unit (APU) in Cambridge lay at the heart of this latter approach and our respective mentors, namely Conrad (ADB) and Broadbent (GJ Hitch – GJH), played central roles in the early development of the concept of a separate short-term memory (STM). During the 1960s the single versus multiple systems controversy became a very hot issue, with a large number of novel paradigms being developed, and what seemed like an equally large number of explanatory models, many of them mathematically based (for 13 such models see Deutsch and Deutsch 1975). However, as Murdock observed, these tended to have a great deal in common, broadly resembling the most influential example, that of Atkinson and Shiffrin (1968), which Murdock dubbed the modal model.


Lacking even moderate mathematical competence, ADB preferred to concentrate on studying phenomena that might help distinguish between the two hypothetical systems, identifying as possible candidates acoustic versus semantic coding (Baddeley 1966a,b) and two-component tasks, where one component is durable and attributable to long-term memory (LTM) and the other a more fragile potential STM component (Baddeley 1968a; Glanzer and Cunitz 1966). A third approach resulted from the opportunity to collaborate with Elizabeth Warrington in the study of a group of dense but very pure amnesic patients, who showed preserved performance on tasks associated with STM, despite their grossly impaired LTM (Baddeley and Warrington 1970). As such these provided a double dissociation when contrasted with patients with a lesion in the left temporoparietal lobe who showed impaired STM and preserved LTM (Shallice and Warrington 1970).


GJH in the meantime was completing a PhD on STM, attempting to investigate the mechanism whereby acoustic similarity of items impairs recall. It was well known that acoustic similarity increases order errors (Wickelgren 1965), but unclear how this effect occurs. Experiments comparing different methods of probing STM for recall of a single item suggested that item and order information were stored separately, order being coded positionally rather than through interitem (i.e. chaining) associations (Hitch 1974). However, it was not possible at that stage to link these observations to an account of the acoustic similarity effect. This had to wait a number of years, as will be described later.


In 1971, GJH moved from Cambridge to Sussex to work on a grant to study the relationship between STM and LTM with ADB, who had just returned from a year with George Mandler and Donald Norman in California.


Between our application and starting the grant, however, the scene had changed markedly, and STM was no longer a fashionable topic. It is unclear whether this was because the plethora of new experimental paradigms and the plethora of models did not match up, or whether people were beginning to think, like Don Norman (personal communication 1971), that we had already solved the problem of STM. It rapidly became clear, however, that Don’s assessment of the situation was a little optimistic.


If we take the Atkinson and Shiffrin (1968) modal model as an example, it was beginning to encounter two major problems. The first of these was the assumption that learning, the transfer of information from the short-term to the long-term store, occurs automatically, with the degree of learning depending on the time in the short-term store (STS). It was becoming increasingly clear that simply maintaining an item in STS does not guarantee learning (Craik and Watkins 1973), consistent with the proposal by Craik and Lockhart (1972) of the Levels of Processing hypothesis, according to which the probability of learning was dependent on depth of processing. Hence with verbal material, requiring the subject to make a judgment about the visual characteristics of the word led to very little learning; judgments of its sound were somewhat better, but not nearly as good as processing in terms of its meaning. This issue rapidly eclipsed studies of STM in popularity, and was indeed sometimes claimed to be a refutation of the STM–LTM distinction (Postman 1975), despite the fact that Craik and Lockhart themselves assumed the importance of a short-term component which they termed primary memory.


A second dramatic change in the intellectual scene was prompted by the PhD dissertation of an MIT computer science student that attempted to develop a language comprehension programme. Such a programme needed to store the meaning of the constituent words, and it did so by proposing a hierarchical storage scheme (Quillian 1969). This led to an experimental test, which appeared to support the hypothesis (Collins and Quillian 1969). Although subsequent research identified flaws with this original study (Conrad 1972), the work stimulated intense interest in the important but neglected topic of how meaning is stored. This interest was intensified and crystallized by Tulving’s (1972) classic paper proposing a distinction between knowledge-based semantic memory, and episodic memory, the capacity to recollect specific experiences. We ourselves seemed to have chosen yesterday’s problem as the focus of our first grant.


Paradoxically, a glimmer of hope was provided by a second problem with the modal model. Atkinson and Shiffrin proposed that their short-term store acted as a working memory, performing the wide range of coding and manipulation activities that are needed for complex cognition. The term working memory appears to have been coined by Miller, Galanter and Pribram (1960), although they said little about its nature. Furthermore, although this was an important feature of the Atkinson and Shiffrin model, it was one they had not explored, concentrating instead on producing a mathematically specified model of the retention of lists of unrelated words. Had they attempted to tackle the broader cognitive capacity of their model, they might well have been perturbed by the fact that the patients with STM deficits studied by Shallice and Warrington (1970) appeared to have few problems in their everyday life: indeed one was a very efficient secretary. If the short-term store acted as a working memory, such patients should have been grossly functionally impaired.


We therefore decided to tackle the question of what function the short-term store might serve, if any. We did not have access to appropriate patients, and instead used dual task methods to turn our student subjects into patient substitutes. We did this by requiring them to perform a range of complex cognitive tasks such as reasoning, learning and comprehension, while simultaneously holding and reciting digit sequences ranging in length from zero to eight items. According to the modal model, this should lead to a cumulative decrement in performance as digit load increased, until performance was virtually obliterated when the digit sequence reached span. In the case of a syntactic reasoning task, for which performance correlates with intelligence (Baddeley 1968b), completion time increased with load, but even with eight digits it was only 50 per cent higher than the control level, while the error rate remained stable at around 5 per cent regardless of concurrent load. We found similar results for learning and comprehension tasks and concluded that the modal model required revision.


We proposed to replace the concept of a single short-term store with a three-component system comprising an attentional controller, the central executive, aided by two subsidiary systems. The first we christened the articulatory loop, since we proposed that rehearsal involved vocal or subvocal articulation. We have subsequently changed the name to phonological loop, transferring emphasis to the storage system, rather than its means of rehearsal. The second system we initially called the visuospatial scratchpad, changing this in turn to sketchpad on the grounds that a scratchpad could be used for making verbal notes as well as visual sketches.


We began by investigating the two peripheral systems, on the grounds that these appeared to offer a more tractable problem than the central executive, especially in the case of the phonological loop for which extensive evidence already existed based on the standard verbal STM literature.


Things were going well, despite disruption from a move from Sussex to Stirling University, at which point we were invited by Gordon Bower to contribute to a prestigious annual review series he was currently editing. We hesitated, since our model was far from complete, but eventually succumbed to the temptation and submitted the chapter that became Baddeley and Hitch (1974).


By the time the paper was published, we had moved again, back to the APU where ADB had succeeded Broadbent as Director. Although we both moved to Cambridge, our research paths diverged at this point.


Baddeley’s programme



The sketchpad


Work on the visuospatial component of the system was initially prompted by a paper by Atwood (1971), who claimed to find marked differential effects of simple verbal and spatial tasks on whether material was held visually or verbally. In common with a number of other investigators, ADB repeatedly failed to replicate the Atwood effect, although convinced from personal experience that such a distinction existed. A series of ingenious studies by Brooks (1967, 1968) provided a much more robust paradigm that not only identified a reliable effect, but also was able to demonstrate that it was spatial rather than visual in origin (Baddeley, Grant, Wight et al. 1973; Baddeley and Lieberman 1980). ADB concluded that the system was spatial in nature, a conclusion that failed to convince a sceptical Scottish postdoctoral colleague, Robert Logie, who went on to demonstrate that the system could store either spatial or visual characteristics such as shape and colour, depending on the task (Logie 1986), a line of research he has continued to develop successfully in subsequent years (Logie 1995). Subsequent research using neuroimaging has provided additional evidence for a fractionation of the sketchpad into at least two and possibly more components. The issue of precisely how to characterize these remains controversial. One view links the components to the spatial and object-based visual processing streams (Smith and Jonides 1997) while another view argues for a later post-LTM interpretation (Della Sala and Logie 2002).



The phonological loop


As mentioned earlier, a good deal of existing evidence could be used to develop a simple model of the phonological loop, comprising a temporary store, together with an articulatory rehearsal system which can be used either to maintain items through repetition or to register visually presented material within the store by a process of articulatory naming. We were unsure quite what to call the store. Conrad (1964) referred to the system as acoustic, but since visually presented material could be stored, the term acoustic did not seem quite appropriate. The term phonemic was then adopted, but subsequently abandoned when it transpired that this had a rather precise linguistic connotation that did not seem justified either. Phonological was the third attempt; it appears that this also can have a more precise connotation than intended, but there is a limit to the number of times one can change terminology without exhausting the patience of one’s colleagues. No doubt one day someone will specify precisely the nature of the code, but given the lack of success of earlier attempts (Hintzman 1967; Wickelgren 1969), this will not be an easy task.


We concentrated instead on attempting to identify robust phenomena that would place clear constraints on any model, arguing that any progress made in this direction would be valuable in providing a foundation for subsequent theories, even if our own rapidly fell by the wayside. The first of these phenomena, the phonological similarity effect, had already been discovered by Conrad (Conrad 1964; Conrad and Hull 1964), and further extended by Baddeley (1966a,b) and Wickelgren (1965). It was assumed to reflect the phonological nature of the short-term store. The articulatory rehearsal system on the other hand was assumed to be reflected in the word length effect, the tendency for immediate serial memory span to decline with number of syllables in the words to be remembered (Baddeley, Thomson and Buchanan 1975).


Our initial word length study was almost abandoned on the grounds that any result would have multiple interpretations, but given the robust and strikingly lawful results we obtained, we were able to rule out at least some of the alternative explanations in terms of the semantic, lexical or frequency characteristics of the constituent words. As my colleague Neil Thomson pointed out, the linear relationship between word length, the speed with which the material could be articulated, and span was consistent with the assumption that the memory trace faded rapidly, becoming inaccessible after about two seconds, unless rehearsed. The subsequent observation by Nicolson (1981) that both memory span and maximal rate of articulation increased systematically with age in young children appeared to offer further support for a simple model involving a fading trace that could be reactivated by subvocal rehearsal, a process that increased in speed and efficiency as the children developed.
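The relationship Thomson pointed out can be put as a simple rule of thumb: span is roughly the number of items that can be articulated before the trace fades. The sketch below only illustrates that reading under stated assumptions. The function name `predicted_span`, its parameters and the example articulation rates are hypothetical and are not taken from the original studies; only the two-second trace duration comes from the text.

```python
def predicted_span(articulation_rate: float, trace_duration: float = 2.0) -> float:
    """Estimate span as the number of items that can be rehearsed before the trace fades.

    articulation_rate: items spoken per second (illustrative input).
    trace_duration: seconds before an unrehearsed trace becomes inaccessible;
        the default of 2.0 follows the "about two seconds" estimate in the text.
    """
    if articulation_rate < 0 or trace_duration < 0:
        raise ValueError("rate and duration must be non-negative")
    # Linear assumption: span grows in direct proportion to articulation rate.
    return articulation_rate * trace_duration


# Made-up articulation rates, chosen only to show the direction of the effect.
example_rates = {"short words": 2.5, "long words": 1.5}
for label, rate in example_rates.items():
    print(f"{label}: about {predicted_span(rate):.1f} items")
```

On this simplified reading, anything that slows articulation (longer words, younger speakers) lowers the predicted span, which is the pattern the word length and developmental findings appeared to show.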


Finally, our evidence seemed to support an interpretation in terms of trace decay rather than interference. Our conclusion came from the observation that when our subjects were required to recall sequences of disyllabic words, performance was better for words with short vowels and rapid pronunciation (e.g. bishop, wicket vs harpoon, Friday). Everything seemed to fit neatly into the simple model of a fading trace and a time-based articulatory rehearsal system. Alas, things were not so simple, and although there is still a strong case for thinking that the initial simple model holds (Baddeley, in press), it has since been under continuous attack.


Some apparent criticisms could reasonably be regarded as clarifications: a good example of this is the demonstration by Cowan, Day, Saults et al. (1992) that much of the forgetting observed in recalling sequences of long words occurs not only during rehearsal, but also because long words take longer to articulate during recall, as indeed would be predicted by the model. In fact both rehearsal and recall effects of word length can be demonstrated (Baddeley, Chincotta, Stafford et al. 2002).


A second line of attack has focused on our long and short vowel result, with subsequent studies able to replicate our own results with our material, but showing different results for other sets of apparently equally appropriate material (see Lovatt and Avons 2001 for a review). However, these studies in turn have been criticized for failing to match the relevant groups in terms of phonological similarity (Baddeley and Andrade 1994), a claim denied by Caplan and Waters (1994). The latest development in this saga is provided by an extremely careful series of experiments by Mueller, Seymour, Kieras et al. (2003), who observed that they could give a good account of the data from all the studies in terms of phonological similarity and spoken duration, finding little or no support for the opposing interpretation in terms of word complexity. A further blow to the complexity hypothesis comes from the observation that when long and short words are mixed, no difference in overall recall between the long and short words was found (Cowan, Baddeley, Elliott et al. 2003; Hulme, Surprenant, Bireta et al. 2004). The results do not however conform simply to a total spoken time hypothesis, suggesting the need to assume factors other than simple decay, possibly reflecting discriminability of the items at retrieval. This seems likely to be an area of active continuing controversy.


A second area that remains controversial is that of the influence of irrelevant sound on immediate serial recall. In the early 1980s, a French visitor, Pierre Salamé, spent a sabbatical year at the APU, tasked with carrying out a one-year programme that was unconstrained, except that it had to involve noise. As this was not an active research area at the APU at that time, it was agreed that noise could include the disruptive effect of irrelevant speech. He and ADB agreed an experiment, but disagreed on their predictions. The experiment involved immediate serial recall of sequences of nine digits under quiet conditions, or when ignoring sequences of meaningful words, or sequences of nonsense syllables. One of us (ADB) predicted no effects; Pierre predicted a disruptive effect of irrelevant meaningful words but not syllables. We were both wrong, as we would have known if we had read an earlier paper by Colle (1980). Irrelevant spoken material disrupts recall, regardless of meaning, whereas white noise of a similar loudness does not. Our experiments (Salamé and Baddeley 1982), and a series of subsequent studies extended Colle’s findings and interpreted them within a phonological loop framework, identifying the effect with storage rather than rehearsal. We found an effect of music and of complex meaningless sounds, but not of noise pulsed within the same sound envelope as the irrelevant speech (Salamé and Baddeley 1987, 1989). At this point, Salamé’s laboratory switched direction requiring him to work first on sleep, then on alcohol, and subsequently on schizophrenia, and ADB’s direct involvement in the area paused until reactivated by a new collaboration (Larsen, Baddeley and Andrade 2000; Larsen and Baddeley 2003).


Research on irrelevant sound was carried on by Dylan Jones and his group, who substantially extended the investigation of the acoustic characteristics needed for an irrelevant sound to disrupt serial verbal recall, leading to the development of the changing state hypothesis (Jones 1993), which proposed that only sounds that change will disrupt memory. Jones (1993) further proposed that serial recall depends on the representation of a chain of items within a multimodal space. The irrelevant sound disrupts recall by providing a competing chain, the object-oriented episodic record (O-OER) hypothesis (Jones 1993). While there is no doubt that the careful study of the nature of the sounds needed to disrupt performance by Jones and his group has made a major contribution, many features of his memory model are questionable, including his assumption of a common verbal/visuospatial system underlying his results. The O-OER model’s assumption of a single chaining mechanism for serial recall is also problematic (see Henson et al. 1996), as is its failure to address neuropsychological and neuroimaging evidence (see Baddeley, in press). Neath (2000) provides further discussion.



Articulatory suppression


When subjects are required to repeatedly utter an irrelevant word such as “the”, memory span is reduced, and with visual presentation, the phonological similarity effect is removed (Murray 1968). The latter may be attributed to suppression preventing the subvocal naming and recoding of the visually presented material, as when the material is presented auditorily the similarity effect returns (Baddeley, Lewis and Vallar 1984). Regardless of presentation modality, suppression removes the word length effect, presumably because it prevents the articulatory rehearsal that is assumed to be the basis of the effect (Baddeley et al. 1975). Jones also interprets this effect in terms of a changing state hypothesis, although it is not clear why hearing a single repeated word does not substantially disrupt recall (allegedly because it does not form a changing state) whereas repeatedly saying a single word has a major effect (for further discussion see Baddeley, in press; Larsen and Baddeley 2003).


Perhaps the major problem with the initial phonological loop model was the absence of a precise specification, and in particular the absence of a mechanism for storing and retrieving serial order. Fortunately, as GJH describes, others have done an excellent job in filling this gap.


Hence, although the concept of a phonological loop has engendered continued controversy, in our opinion it remains in good health. The question still arises, however, as to whether it is a sufficiently important component to justify the considerable efforts put into its investigation. Is it anything other than a way of keeping cognitive psychologists busy? We argue that it is, and that it has evolved as a mechanism to facilitate the acquisition of language, applying it both to first language and to later language learning, a position discussed extensively by Baddeley, Gathercole and Papagno (1998), who present evidence for impaired vocabulary learning when the phonological loop is limited either neuropsychologically (Baddeley, Papagno and Vallar 1988), developmentally (Gathercole and Baddeley 1989), by the nature of the material (Papagno and Vallar 1992), or by concurrent articulatory suppression (Papagno, Valentine and Baddeley 1991). It is also becoming clear that the loop can play a valuable role in the control of behaviour, providing a reliable, attentionally undemanding if limited source of self-instruction (Baddeley, Chincotta and Adlam 2001; Luria 1959a,b; Miyake, Emerson, Padilla et al. 2004; Saeki and Saito 2004).



The central executive


Some ten years after the initial Baddeley and Hitch paper, ADB began a monograph that attempted to pull together the work that had been done in that decade (Baddeley 1986). On nearing the end of the book, it became embarrassingly clear that he had made no progress on the all-important central executive, other than to use it as a convenient homunculus that could take care of all the complicated issues that appeared to be beyond the remit of the two subsystems. Like Attneave (1960), we think that homunculi can be useful chaps, but only if kept in their place. They can be valuable markers for problems that are yet to be solved, but this is only helpful if the problems are identified and in due course tackled, gradually reducing the remit of the homunculus, who it is hoped will eventually retire. In an attempt to identify the jobs being performed by the homunculus, a chapter on the central executive was added.


An obvious place to search for inspiration for a possible model was in the literature on attention. Although this was a well-developed field, the vast majority of work concerned the role of selective attention in perception. At the time, only one model appeared to tackle the role of attention in the control of action, that of Norman and Shallice (1986). This approach divided the control of action into two distinct mechanisms, one of which was automatic and dependent on existing habits and schemata, while the other was an attentionally demanding system for modifying and controlling action when the habit-based system failed, either through overload, or from encountering a novel problem. The latter system, which they labelled the Supervisory Attentional System (SAS), was adopted as a possible model for the central executive component of working memory (Baddeley 1986). It still gives a very good general account of the two principal ways in which action is controlled (Baddeley 2007).


Analysing the SAS is therefore dependent on analysing attentional control, no mean task. One way into the problem is to attempt to go back to first principles, deciding what characteristics virtually any attentional control system would require, and then searching for evidence of these in working memory. Baddeley (1996) proposed four basic functions – the need to focus attention, to divide attention across different sources, to switch attention between tasks, and to use attention to link working memory with LTM.


Evidence of the focusing of attention is abundant. One example is provided by the task of retaining a chess position, or selecting the optimal next move. Whereas articulatory suppression has no effect on performance, suggesting no verbal contribution, a concurrent attentionally demanding task such as random generation impairs positional memory and has an even greater effect on move selection (Robbins, Anderson, Barker et al. 1996; Saariluoma 1995).


Divided attention was studied principally through research on Alzheimer’s disease, where patients proved to have disproportionate difficulty in dividing attention between tasks, a deficit not present in normal elderly people (Baddeley, Bressi, Della Sala et al. 1991; Logie, Cocchini, Della Sala et al. 2004). Evidence for a specific capacity to switch attention between two tasks proved more elusive. One series of studies (Baddeley, Chincotta and Adlam 2001) found that the phonological loop was playing a greater role than the executive in switching in the case of one task, while switching may even enhance performance under certain conditions. For example, repeatedly writing a single letter (aaaa…) is slower than writing a series of different letters (abcabc…), as found by Nohara (1965) and replicated by Wing, Lewis and Baddeley (1979). It seems likely that switching may be achieved in a number of different ways, only some of which will depend on executive resources (Schumacher, Seymour, Glass et al. 2001).



The episodic buffer


The search for the process whereby working memory and LTM interact also proved complex. The situation was further complicated by the attempt to treat the executive as a purely attentional system with no storage capacity (Baddeley and Logie 1999). This resulted in a range of problems, issues that could not be handled readily within the existing three-component model. The issue was brought to a head by the attempt to account for the memory performance of a single patient, a densely amnesic but intellectually well-preserved man who had a capacity to perform normally on the immediate recall of a prose passage comprising some 25 different idea units, greatly exceeding the assumed capacity of any of the components of working memory (Wilson and Baddeley 1988), followed by the discovery of a small number of similar patients who appeared to be able to use a high level of preserved intelligence to temporarily maintain a prose paragraph (Baddeley and Wilson 2002). This highlighted the fact that the three-component system also failed to give a good account of recall of prose in normal subjects, and of the crucial phenomenon of chunking, in which existing knowledge is used to increase span (Miller 1956). In order to cope with these and a range of other problems, a fourth component of working memory was proposed: the episodic buffer (Baddeley 2000).


The episodic buffer was assumed to be a limited capacity attentional storage system based on a multidimensional code. Control is exercised by the central executive and retrieval via conscious awareness. It is episodic in that it stores episodes of information that are chunked into higher-order representations. It is a buffer in that it is a limited-capacity system which by its multidimensional nature is capable of providing a link between the more specialized subsystems of working memory.


In the few years since the concept was proposed, it appears to have evoked a reasonable amount of interest. It certainly has the advantage of providing a bridge between the European approach to working memory which has typically been bottom-up, attempting to begin by studying the simpler subsystems, and the predominant North American approach, which tends towards a more top-down emphasis on executive control. The crucial question remains however, whether the concept can really earn its living by generating fruitful experiments that genuinely enhance our knowledge of working memory, or whether it will simply become a handy method of placating editors who want more theoretical speculation in the discussion section of empirical papers.
